[developers] Non-deterministic tokenisation with REPP

Berthold Crysmann crysmann at ifk.uni-bonn.de
Fri Mar 11 01:56:57 CET 2011


Hi all, 

is it currently possible to create alternate tokenisations with REPP?
This is possible with PET chart mapping, so what I am looking for is an
LKB solution to the following problem: I need to combine adjacent
tokens into one but also preserve the original tokenisation, in case
I am dealing with unrelated items.

Here's a concrete example: Hausa orthography separates off pronominal
affixes of verbs but not of nouns. To arrive at a sounder treatment
of pronominal affixes, I'd like to join putative pronominal affixes with
the words preceding them and let the grammar sort out the rest.
Unfortunately, I also have to preserve the original tokenisation for
homographs...
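The desired result is a token lattice in which both segmentations survive as alternative paths. The following is a minimal sketch under stated assumptions: it is plain Python, not REPP rule syntax or PET chart-mapping rules, and the token strings ("verb", "aff", "noun") and the `is_affix` test are hypothetical placeholders, not Hausa data. Every original token keeps its own edge, and wherever a putative affix follows another token, one extra merged edge spans the same two vertices, so the grammar can choose between the two readings.

```python
from typing import Callable, List, Tuple

# (start vertex, end vertex, token form)
Edge = Tuple[int, int, str]


def build_lattice(tokens: List[str], is_affix: Callable[[str], bool]) -> List[Edge]:
    """Keep every original token and add a merged alternative edge
    wherever a putative affix (hypothetical test) follows another token."""
    edges: List[Edge] = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    for i in range(1, len(tokens)):
        if is_affix(tokens[i]):
            # The merged token spans the same vertices as the pair it joins,
            # so the original two-token path remains available.
            edges.append((i - 1, i + 1, tokens[i - 1] + tokens[i]))
    return edges


def tokenisations(edges: List[Edge], n: int) -> List[List[str]]:
    """Enumerate every path from vertex 0 to vertex n through the lattice."""
    def walk(v: int) -> List[List[str]]:
        if v == n:
            return [[]]
        paths: List[List[str]] = []
        for start, end, form in edges:
            if start == v:
                for rest in walk(end):
                    paths.append([form] + rest)
        return paths

    return walk(0)


if __name__ == "__main__":
    toks = ["verb", "aff", "noun"]  # hypothetical placeholder tokens
    lattice = build_lattice(toks, lambda t: t == "aff")
    print(tokenisations(lattice, len(toks)))
    # prints [['verb', 'aff', 'noun'], ['verbaff', 'noun']]
```

Leaving both paths in the chart, rather than committing to the merged form, is what allows homographs of the affixes to be analysed with the original tokenisation.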

I vaguely remember that something along these lines was possible at some
point earlier, so I'd be grateful for any pointers.

BTW: what is the current status of chart mapping (CM) in the LKB?

Cheers, 

Berthold
