[developers] Non-deterministic tokenisation with REPP

Berthold Crysmann crysmann at ifk.uni-bonn.de
Fri Mar 11 14:18:35 CET 2011

On Fri, 2011-03-11 at 01:56 +0100, Berthold Crysmann wrote:
> Hi all, 
> is it currently possible to create alternate tokenisations with REPP?
> With PET chart mapping this is possible, so what I am looking for is an
> LKB solution for the following problem: I need to combine adjacent
> tokens into one but preserve the original tokenisation as well, in case
> I am dealing with unrelated items. 
> Here's a concrete example: Hausa orthography separates off pronominal
> affixes of verbs but not of nouns. To arrive at a more sound treatment
> of pronominal affixes, I'd like to join putative pronominal affixes with
> the words preceding them and let the grammar sort out the rest. But
> unfortunately, I do also have to preserve the original tokenisation for
> homographs... 
> I vaguely remember that something along these lines was possible at some
> point earlier, so I'd be happy about any pointers. 

I had a look at the code in repp.lsp: around line 151, the + (augment)
operator is replaced with ! (substitute), so augmenting rules apparently
end up behaving like plain substitutions.

Is there any way to get the x-preprocessor behaviour in REPP? Or do I
need to switch back to that older preprocessor?
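
To make the request concrete, here is a rough Python sketch of the kind
of output being asked for. It is not REPP, LKB or PET code: the function
name token_lattice, the edge representation and the placeholder tokens
are all invented for illustration, and no real Hausa forms are used. It
keeps every original token and adds a merged alternative wherever a
putative affix follows a word, which is how augmenting (as opposed to
substituting) is understood here.

```python
from typing import List, Set, Tuple

# An edge is (start vertex, end vertex, surface form).
Edge = Tuple[int, int, str]


def token_lattice(tokens: List[str], affixes: Set[str]) -> List[Edge]:
    """Return lattice edges with optional word+affix merges.

    Each original token i spans vertices (i, i + 1). Whenever token i
    is a putative affix, one extra edge spanning (i - 1, i + 1) joins
    it to the preceding token. The original edges are always kept, so
    homographs can still be analysed as separate words.
    """
    # Original tokenisation: one edge per token.
    edges: List[Edge] = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    # Augment: add a merged alternative next to the original edges.
    for i in range(1, len(tokens)):
        if tokens[i] in affixes:
            edges.append((i - 1, i + 1, tokens[i - 1] + tokens[i]))
    return sorted(edges)


# Placeholder input: "pro" stands in for a putative pronominal affix.
for edge in token_lattice(["stem", "pro", "word"], {"pro"}):
    print(edge)
# (0, 1, 'stem')
# (0, 2, 'stempro')
# (1, 2, 'pro')
# (2, 3, 'word')
```

A substitute-style rule would keep only the (0, 2) edge and lose the
separate reading; the grammar would then never see the homograph case.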

Thanks for any advice.


> BTW: what is the current status of CM in LKB?
> Cheers, 
> Berthold
