[pet] punctuation in chart mapping
Stephan Oepen
oe at ifi.uio.no
Sat Aug 14 11:20:49 CEST 2010
hi xuchen,
please see the ErgTokenization page on the DELPH-IN wiki. our
recommended strategy is to use the REPP preprocessing layer that is
provided by the grammar; as of today, that is easiest to apply when
combining PET with [incr tsdb()], which is available ready-to-run in
the LOGON tree, for example.
best wishes, oe
On 14. aug. 2010, at 11.10, Xuchen Yao <xuchen at coli.uni-saarland.de>
wrote:
> Hi,
>
> The FSC input page does not state how to deal with punctuation:
>
> http://wiki.delph-in.net/moin/PetInputFsc
>
> For a sample sentence, such as:
>
> Rome was ruled by a council called the "Senate".
>
> If "Senate" (including quotes) is tokenized/tagged as follows:
>
> " Senate "
> `` NNP '' (two single quotes ', rather than one double quote ")
>
> Then PET complains that there are no lexicon entries for "." [. 1]
> "Senate" [NNP 1] """ [" 1] etc.
>
> If it is tokenized/tagged as follows:
>
> " Senate"
> `` NNP
>
> PET complains that:
>
> Chart is not well-formed after chart mapping. This is probably a bug
> in the grammar.
>
> If it is tokenized/tagged as follows:
>
> "Senate "
> NNP ''
>
> Then PET parses.
>
> I'm a bit confused about how to deal with double quotes. It seems the
> easiest way is to just remove them. But is that the correct way?
> Thanks in advance for explaining this!
>
> Xuchen
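
To make the first tokenization in the quoted message concrete (each
double quote as its own token, tagged `` and ''), here is a minimal
Python sketch. It is not PET, the FSC format, or REPP: the function
names split_quotes and quote_tags are hypothetical, only straight
double quotes are handled, and the sketch says nothing about which
variant the grammar expects.

```python
import re


def split_quotes(sentence):
    """Split on whitespace and detach every straight double quote.

    Hypothetical helper for illustration; other punctuation is left
    attached to its neighbouring word.
    """
    # A token is either one double quote or a run of characters that
    # are neither whitespace nor a double quote.
    return re.findall(r'"|[^\s"]+', sentence)


def quote_tags(tokens):
    """Return `` or '' for quote tokens and None for all other tokens.

    Assumes quotes are balanced and not nested, so they simply
    alternate between opening and closing.
    """
    tags = []
    opening = True
    for token in tokens:
        if token == '"':
            tags.append("``" if opening else "''")
            opening = not opening
        else:
            tags.append(None)
    return tags


sentence = 'Rome was ruled by a council called the "Senate".'
tokens = split_quotes(sentence)
tags = quote_tags(tokens)
```

The remaining tags (NNP and so on) would still have to come from a
separate POS tagger; the sketch only covers the quote tokens.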