[developers] Non-deterministic tokenisation with REPP
crysmann at ifk.uni-bonn.de
Fri Mar 11 14:18:35 CET 2011
On Fri, 2011-03-11 at 01:56 +0100, Berthold Crysmann wrote:
> Hi all,
> is it currently possible to create alternate tokenisations with REPP?
> With PET chart mapping this is possible, so what I am looking for is an
> LKB solution for the following problem: I need to combine adjacent
> tokens into one but preserve the original tokenisation as well, in case
> I am dealing with unrelated items.
> Here's a concrete example: Hausa orthography separates off pronominal
> affixes of verbs but not of nouns. To arrive at a more sound treatment
> of pronominal affixes, I'd like to join putative pronominal affixes with
> the words preceding them and let the grammar sort out the rest. But
> unfortunately, I do also have to preserve the original tokenisation for
> I vaguely remember that something along these lines was possible at some
> point earlier, so I'd be happy about any pointers.
I had a look at the code in repp.lsp: around line 151, the + (augment)
operator is replaced with ! (substitute), so augment rules seem to end
up as plain substitutions.
Is there any way to get the x-preprocessor behaviour? Or do I need
to switch back to that older preprocessor?
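To make the request concrete, here is a minimal Python sketch of the
behaviour I am after. It is not REPP or LKB code: the function names, the
edge representation, and the placeholder tokens ("stem", "pro", "other")
are all made up for illustration. The idea is simply that every original
token is kept, and a merged token is added alongside whenever a putative
affix follows, so the grammar can choose between the alternatives.

```python
from typing import List, Set, Tuple

# An edge in the token lattice: (start vertex, end vertex, surface form).
Edge = Tuple[int, int, str]


def augment_lattice(tokens: List[str], affixes: Set[str]) -> List[Edge]:
    """Keep every original token as an edge and add a merged edge
    wherever a token is directly followed by a putative affix."""
    # Original tokenisation: one edge per token.
    edges = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    # Alternative tokenisation: token + following affix as a single edge.
    for i in range(len(tokens) - 1):
        if tokens[i + 1] in affixes:
            edges.append((i, i + 2, tokens[i] + tokens[i + 1]))
    return edges


def readings(edges: List[Edge], start: int, end: int) -> List[List[str]]:
    """Enumerate all token sequences (paths) from start to end."""
    if start == end:
        return [[]]
    paths = []
    for s, e, form in edges:
        if s == start:
            paths.extend([form] + rest for rest in readings(edges, e, end))
    return paths


if __name__ == "__main__":
    toks = ["stem", "pro", "other"]          # hypothetical input tokens
    lattice = augment_lattice(toks, {"pro"})  # "pro" = putative affix
    for reading in readings(lattice, 0, len(toks)):
        print(reading)
```

With substitution only, the first of the two readings would be lost; with
augmentation, both the separated and the joined tokenisation survive.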
Thanks for any advice.
> BTW: what is the current status of CM in LKB????