[developers] How do I make PET deal with unknown lexical items?

Thu Jul 31 20:20:50 CEST 2008

Thanks.  I managed to get SMAF working in PET with the example sentence on
the Wiki, but for the time being I'm just looking for the easiest way to get
robustly from text to RMRS, so I'll take a look at HoG.

On Wed, Jul 30, 2008 at 8:28 AM, R. Bergmair <rbergmair at acm.org> wrote:

> On Tue, 29 Jul 2008, Bill McNeill (UW) wrote:
>
>> no lexicon entries for:
>>   "faq" [NP1 ]
>>   "." [. ]
>>
>
> In the ERG directory there is a file called pet/common.set,
> which contains a setting "posmapping". Make sure that there
> is an entry mapping "NP1" to a generic type, for example
> $genericname. This is commented out in the default
> configuration.
>
> You generally might want to look into using SMAF as an
> input format. I'm not sure what the policy of the PET
> maintainers is, regarding the YY input format. I
> understand SMAF is supposed to be the "new thing" :)
>
> Concerning the ".", this might be because the example on
> YY input dates from what Dan calls the "pre-punctuation era" :)
> Previously, the ERG stripped out punctuation in a
> preprocessing step, so input could be tokenized like
> this: |Incidentally|,|Xavier|is|tall|.|
> Now it should be like this: |Incidentally,|Xavier|is|tall.|
>
> Generally, part of the problem of giving preprocessed input
> to PET is that you have to know the tokenization expected
> by the ERG, and you have to have tags to correspond to that
> tokenization, i.e. what you want to do is to run the FSPP
> preprocessor in a separate step and use a POS tagger trained
> on that tokenization, or at least you will have to map stuff
> to the right tokenization.
>
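
To make the tokenization point concrete, here is a minimal sketch of mapping
the old punctuation-separated tokens onto the punctuation-attached ones shown
above. The function name and the two punctuation sets are my own assumptions
for illustration; this is not the FSPP preprocessor and not part of PET or the
ERG, and it does nothing about remapping POS tags.

```python
# Hypothetical helper: neither the name nor the punctuation sets come from PET/ERG.
# Assumption: closing punctuation attaches to the preceding token,
# opening punctuation attaches to the following token.
CLOSING = set(".,;:!?)]}")
OPENING = set("([{")


def reattach_punctuation(tokens):
    """Merge stand-alone punctuation tokens into their neighbouring words."""
    out = []
    pending_prefix = ""  # opening punctuation waiting for the next word
    for tok in tokens:
        is_closing = bool(tok) and all(ch in CLOSING for ch in tok)
        is_opening = bool(tok) and all(ch in OPENING for ch in tok)
        if is_closing and out and not pending_prefix:
            # e.g. "tall" followed by "." becomes "tall."
            out[-1] += tok
        elif is_opening:
            pending_prefix += tok
        else:
            out.append(pending_prefix + tok)
            pending_prefix = ""
    if pending_prefix:
        # Dangling opening punctuation at the end of input: keep it as a token.
        out.append(pending_prefix)
    return out


old_style = ["Incidentally", ",", "Xavier", "is", "tall", "."]
new_style = reattach_punctuation(old_style)
print("|" + "|".join(new_style) + "|")
```

If the input comes with one POS tag per old-style token, the tags of the merged
punctuation tokens would have to be dropped or folded in separately, which this
sketch leaves out.
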
> An alternative is to use Yi Zhang's type prediction code.
> You will need a specifically instrumented version of the ERG
> for this. This is all kind of unofficial and experimental,
> though. You'll find a grammar at
>
>  http://www.coli.uni-saarland.de/~yzhang/files/erg-cvs20080417.tar.bz2
>
> and you can then use the undocumented "-predict-les" option
> to cheap.
>
> If you need to have control over such preprocessing steps,
> playing around with this sort of stuff is fair enough, but
> if all you're trying to do is to robustly parse text and
> obtain RMRSes, you probably want to look into using HoG as
> a middleware to handle these things for you. If you're in
> the mood for experimenting with some undocumented code, I
> can also send you my "PyRMRS" python library which can also
> handle this stuff.
>
>
> regards,
>
> Richard
>

-- 
Bill McNeill
http://staff.washington.edu/billmcn/index.shtml