[developers] pet-input-chart punctuation-characters

Francis Bond fcbond at gmail.com
Fri Feb 10 13:04:31 CET 2006


G'day,

> It's correct that PIC does not check for punctuation characters. This
> is because PIC was meant for external preprocessing, while the
> punctuation-chars are used in the internal (simple) tokenizer. If there
> are tokens meant to be ignored by cheap in the PIC, the 'constant'
> value of the corresponding 'w' XML item should get the value "yes",
> which will instruct cheap to skip it.

The problem was that the internal processor was not skipping
punctuation if it came in through a PIC (or YY mode). However, the
knowledge of which punctuation the grammar knows about and should parse
rather than skip is grammar-internal, and changes from version to
version. Therefore we don't want to have to make the decision in the
module that creates the PIC.
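A minimal Python sketch of the behaviour argued for here, assuming the skip decision is made inside the parser from the grammar's own punctuation-characters setting, so the same filter applies whether tokens come from the internal tokenizer, a PIC, or YY mode. `Token`, `skip_punctuation` and the character set below are hypothetical illustrations, not cheap's actual code or configuration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token record: surface string plus character span.
    surface: str
    start: int
    end: int


def skip_punctuation(tokens, punctuation_chars):
    """Drop tokens made up only of characters the grammar treats as punctuation.

    punctuation_chars comes from the grammar (hypothetical setting), so the
    module that built the tokens (e.g. a PIC producer) never has to know it.
    """
    punct = set(punctuation_chars)
    return [t for t in tokens if not (t.surface and set(t.surface) <= punct)]


# Hypothetical punctuation-characters value; it may differ between grammar versions.
PUNCTUATION = "!\"(),-./:;?"

tokens = [Token("Kim", 0, 3), Token("sleeps", 4, 10), Token(".", 10, 11)]
print([t.surface for t in skip_punctuation(tokens, PUNCTUATION)])
```

Keeping the character set as a parameter rather than baking it into the token producer reflects the point above: when the grammar changes its punctuation handling, only the grammar's setting changes.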

> Since I hope we will agree on a unified input scheme in Spain, I hope
> this inconvenience will go away soon.

That will be great.

--
Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Machine Translation Research Group



