[developers] Non-deterministic tokenisation with REPP

Berthold Crysmann crysmann at ifk.uni-bonn.de
Mon Mar 14 13:15:54 CET 2011


On Sun, 2011-03-13 at 12:52 +0100, Berthold Crysmann wrote:
> Hi Stephan
> 
> thanks for your reply.
> 
> On Sat, 2011-03-12 at 19:28 +0100, Stephan Oepen wrote:
> > hi berthold,
> > 
> > REPP was designed to simplify the initial layer of tokenization,  
> > aiming for compatibility with PTB conventions.  hence, token-level  
> > ambiguity is reserved for the token mapping phase.  which, i am  
> > afraid, to date remains available only in PET.
> > 
> 
> I feared as much. 
> 
> > bad news from the HaG point of view, i realize.  in principle, for the  
> > LKB, you could go back to SPPP (which had some support for  
> > tokenization ambiguity); but that is officially unsupported and should  
> > eventually be purged from the code base.  
> 
> Ok. Would it be possible not to purge that code until we have a
> replacement, i.e. chart mapping in the LKB?  
> 
> That'll give me a chance to make sure that development and runtime
> platform are able to process roughly the same kind of input. 
> 

It looks like SPPP depends on an external tokenizer, right? 

I am now wondering what the state of affairs is with FSPP. When I load an
old version of GG, I do not get any chart entries for any input, and the
token chart is also empty. 

The rules appear to load, but do not seem to work as expected. Am I
missing something? 

Cheers, 

B    

> > or you could ‘micro-tokenize’ and simulate multi-token combination
> > effects in phrase structure rules; but i suspect that might be
> > inadequate for your orthographemic needs?
> > 
> 
> It is indeed. To my mind, the chart-mapping formalism we have now makes
> it possible to develop grammars which are much closer to one's
> linguistic theory, and at the same time to avoid clumsy helper features
> whose only job is to control competition between morphological and
> pseudo-morphological expressions.   
> 
> > from my point of view, we should look for someone to implement chart  
> > mapping in the LKB, e.g. an MSc student.  students at UiO actually  
> > know Lisp (and some even the LKB), but this year none of them was  
> > interested in this specific project.  in case there were possible  
> > candidates elsewhere, it would seem reverse engineering the chart  
> > mapping formalism is doable, in principle, without insider access.  
> 
> That would certainly be best, since it would let me get rid of the
> tone rules, which I keep only for the LKB's purposes, and there only in
> parsing. In PET, conversion of diacritics to autosegmental
> representation is done entirely by means of the CM mechanism. For
> generation, I have some functions that map internal tonal
> representations directly to diacritics in the output.  
> 
> Cheers, 
> 
> Berthold
> 
> > i have heard rumours about no less than two proprietary implementations
> > of the DELPH-IN formalism which appear to have done it.
> > 
> 
> 
> > best, oe
> > 
> > 
> > On 11. mars 2011, at 01.56, Berthold Crysmann <crysmann at ifk.uni-bonn.de> wrote:
> > 
> > > Hi all,
> > >
> > > is it currently possible to create alternate tokenisations with REPP?
> > > With PET chart mapping this is possible, so what I am looking for is an
> > > LKB solution for the following problem: I need to combine adjacent
> > > tokens into one but preserve the original tokenisation as well, in case
> > > I am dealing with unrelated items.
> > >
> > > Here's a concrete example: Hausa orthography separates off pronominal
> > > affixes of verbs but not of nouns. To arrive at a sounder treatment of
> > > pronominal affixes, I'd like to join putative pronominal affixes with
> > > the words preceding them and let the grammar sort out the rest.
> > > Unfortunately, though, I also have to preserve the original
> > > tokenisation for homographs...
> > >
> > > I vaguely remember that something along these lines was possible at some
> > > point earlier, so I'd be grateful for any pointers.
> > >
> > > BTW: what is the current status of CM (chart mapping) in the LKB?
> > >
> > > Cheers,
> > >
> > > Berthold
> > >
> > >
> > >
> > >
> 
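The tokenisation requirement in the original question (join a putative pronominal affix with the word before it, but keep the original split for homographs) amounts to producing a token lattice rather than a single token sequence. Below is a minimal Python sketch under stated assumptions. The edge format, the function names `build_lattice` and `paths`, and the placeholder tokens and affix set are all hypothetical and only illustrate the idea. This is not how REPP, SPPP, FSPP or PET chart mapping are implemented, and it makes no claim about Hausa morphology.

```python
from typing import List, Set, Tuple

# Hypothetical edge format: (start vertex, end vertex, surface form).
Edge = Tuple[int, int, str]


def build_lattice(tokens: List[str], affixes: Set[str]) -> List[Edge]:
    """Keep every original token as its own edge, and add one extra edge
    joining a non-affix token with an immediately following putative
    affix. The original split is never removed, so homographs survive."""
    edges: List[Edge] = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    for i in range(len(tokens) - 1):
        if tokens[i] not in affixes and tokens[i + 1] in affixes:
            # Combined token spans two original vertices: [i, i + 2).
            edges.append((i, i + 2, tokens[i] + tokens[i + 1]))
    return edges


def paths(edges: List[Edge], start: int, end: int) -> List[List[str]]:
    """Enumerate every tokenisation, i.e. every path from start to end."""
    if start == end:
        return [[]]
    result: List[List[str]] = []
    for s, e, form in edges:
        if s == start:
            for rest in paths(edges, e, end):
                result.append([form] + rest)
    return result


if __name__ == "__main__":
    # Placeholder tokens; "aff" stands in for a putative pronominal affix.
    lattice = build_lattice(["w1", "aff", "w2"], {"aff"})
    for tokenisation in paths(lattice, 0, 3):
        print(tokenisation)
```

Run on the placeholder input, this yields both the original tokenisation `['w1', 'aff', 'w2']` and the joined one `['w1aff', 'w2']`, leaving it to the grammar to decide which path survives, which is the division of labour asked for above.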




