[pet] punctuation in chart mapping

Stephan Oepen oe at ifi.uio.no
Sat Aug 14 11:20:49 CEST 2010


hi xuchen,

please see the page ErgTokenization on the DELPH-IN wiki.  our
recommended strategy is to actually use the REPP preprocessing layer
that is provided by the grammar; as of today, that is easiest to apply
when combining PET with [incr tsdb()], which is available ready-to-run
in the LOGON tree, for example.
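
as a rough illustration of what a regular-expression preprocessing
layer does, ordered rewrite rules can separate quotes and final
punctuation before the string is split on whitespace.  this is only a
sketch: the rules and the name tokenize below are made up for the
example and are not the REPP rule set that ships with the grammar.

```python
import re

# Hypothetical rules for illustration only; the grammar ships its own
# REPP rule files, which may treat quotes quite differently.
RULES = [
    # opening quote: a " at the start or after whitespace, before a word
    (re.compile(r'(^|\s)"(?=\S)'), r'\1`` '),
    # closing quote: a " directly after a non-space character
    (re.compile(r'(?<=\S)"'), " ''"),
    # split off punctuation that is followed by whitespace or end of input
    (re.compile(r'([.,!?])(?=\s|$)'), r' \1'),
]


def tokenize(text):
    """Apply the rewrite rules in order, then split on whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.split()


print(tokenize('Rome was ruled by a council called the "Senate".'))
```

the actual rules in the grammar decide how quotes are treated, so the
output of this sketch should not be taken as the tokenization that PET
expects.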

best wishes, oe



On 14. aug. 2010, at 11.10, Xuchen Yao <xuchen at coli.uni-saarland.de>  
wrote:

> Hi,
>
> The FSC input page does not say how to deal with punctuation:
>
> http://wiki.delph-in.net/moin/PetInputFsc
>
> For a sample sentence, such as:
>
> Rome was ruled by a council called the "Senate".
>
> If "Senate" (including quotes) is tokenized/tagged as follows:
>
> "    Senate     "
> ``    NNP    '' (two ' characters, rather than a single ")
>
> Then PET complains that there are no lexicon entries for "." [. 1]
> "Senate" [NNP 1] """ [" 1] etc.
>
> If it is tokenized/tagged as follows:
>
> "    Senate"
> ``    NNP
>
> PET complains that:
>
> Chart is not well-formed after chart mapping. This is probably a bug  
> in the grammar.
>
> If it is tokenized/tagged as follows:
>
> "Senate    "
> NNP    ''
>
> Then PET parses.
>
> I'm a bit confused about how to deal with double quotes. It seems the
> easiest way is to just remove them. But is this the correct way?
> Thanks in advance for explaining this!
>
> Xuchen



