[developers] Non-deterministic tokenisation with REPP
Berthold Crysmann
crysmann at ifk.uni-bonn.de
Fri Mar 11 01:56:57 CET 2011
Hi all,
Is it currently possible to create alternate tokenisations with REPP?
With PET chart mapping this is possible, so what I am looking for is an
LKB solution for the following problem: I need to combine adjacent
tokens into one but preserve the original tokenisation as well, in case
the adjacent tokens turn out to be unrelated items.
Here's a concrete example: Hausa orthography separates off pronominal
affixes of verbs but not of nouns. To arrive at a sounder treatment
of pronominal affixes, I'd like to join putative pronominal affixes with
the words preceding them and let the grammar sort out the rest.
Unfortunately, I also have to preserve the original tokenisation for
homographs...
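
To make the desired output concrete, here is a minimal sketch in plain
Python of the kind of token lattice I have in mind. It is not REPP, LKB, or
PET code: the function name `token_lattice`, the edge format, and the
placeholder tokens ("stem", "pro", "word") are all hypothetical and only
illustrate keeping the original tokens while adding a joined alternative.

```python
from typing import Callable, List, Tuple

# An edge spans two chart vertices and carries a surface form.
Edge = Tuple[int, int, str]  # (start vertex, end vertex, form)


def token_lattice(tokens: List[str], is_affix: Callable[[str], bool]) -> List[Edge]:
    """Return the original tokens plus joined alternatives as lattice edges.

    `is_affix` is a hypothetical predicate that flags putative pronominal
    affixes; how to recognise them is left to the caller.
    """
    # Original tokenisation: one edge per token, always preserved.
    edges: List[Edge] = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    # Alternative tokenisation: join a putative affix with the preceding token.
    for i in range(len(tokens) - 1):
        if is_affix(tokens[i + 1]):
            edges.append((i, i + 2, tokens[i] + tokens[i + 1]))
    return edges


if __name__ == "__main__":
    # Placeholder forms, not real Hausa data.
    for edge in token_lattice(["stem", "pro", "word"], lambda t: t == "pro"):
        print(edge)
```

Both the separate edges for "stem" and "pro" and the joined edge spanning
vertices 0 to 2 end up in the lattice, so the grammar can pick whichever
analysis fits.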
I vaguely remember that something along these lines was possible at some
point earlier, so I'd be happy about any pointers.
BTW: what is the current status of CM in LKB?
Cheers,
Berthold