[developers] New ERG with improved tokenization/preprocessing for PET

Timothy Baldwin tim at csse.unimelb.edu.au
Tue May 26 01:26:42 CEST 2009

<SNIP> (that's a big snip!)

> > i think an actual solution would require a tool like morpha (which is
> > part of pre-processing in RASP, i believe), adapted for PTB tags and
> > american english.  one could argue this /should/ be part of our input
> > pre-processing prior to parsing, but that is not an option right now.
> Why is it not an option?  I thought Tim has something like this
> already.  If not we could make one, or maybe see if the RASP project
> has one squirreled away somewhere.  If we can't find anything better,
> I volunteer to make one: under the assumption that irregular cases
> should go in the ERG proper, I believe it would be reasonably cheap to
> build.

Tim does indeed have such a thing if needed. Or were you referring to more
technical reasons for it not being an option, Stephan?

