[pet] problem with generic entries, suffixes and punctuation

Rebecca Dridan bec.dridan at gmail.com
Mon Oct 15 17:19:33 CEST 2007


Hi all,

I found a problem this week with the cheap add_generics() code and the 
fsr tokenisation method. Not sure why I haven't noticed before, except 
that the trigger is not that frequent...

Using -tok=fsr, the last token of a sentence I'm parsing is "retirees.",
with the period included. When I use those same tokens, but add POS tags
to get the default lexical entries (les), a $generic_pl_noun item is not
created, because the suffix check fails: the last character is "."
rather than "s". I'm not sure where the fix belongs, in the suffix
checking code or elsewhere, but perhaps someone can take a look at it?
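
As a rough illustration only (this is a Python sketch, not the actual
PET code, and both function names are hypothetical), the failure and
one possible suffix-side workaround look something like this:

```python
import string


def suffix_matches(token: str, suffix: str) -> bool:
    """Naive check: compare the suffix against the raw token."""
    return token.endswith(suffix)


def suffix_matches_ignoring_punct(token: str, suffix: str) -> bool:
    """Assumed workaround: drop trailing punctuation before comparing."""
    return token.rstrip(string.punctuation).endswith(suffix)


token = "retirees."  # last token as described above, period attached

# The raw token ends in ".", so the plural-noun suffix "s" is not found.
print(suffix_matches(token, "s"))                 # False

# With trailing punctuation stripped, the suffix is found.
print(suffix_matches_ignoring_punct(token, "s"))  # True
```

Whether stripping punctuation inside the suffix check is the right
place for the fix, rather than in the tokeniser, is exactly the open
question.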

Thanks,

Rebecca


