[pet] handling of unknown lexical items

Stephan Oepen oe at ifi.uio.no
Wed Oct 26 23:13:38 CEST 2011


TnT and two methods of interfacing to PET are pre-packaged in the so-called LOGON tree; please see LogonTop on the wiki.  i'd also recommend you upgrade to the latest ERG release (1010).  in the LOGON environment (off the SVN trunk), adding -tagger to the cheap command line should enable the same unknown word handling you see in the on-line demo.

rebecca raises an important point: handling unknown words without PoS tagging prior to parsing is not really a configuration to be recommended.

cheers, oe


On 26. okt. 2011, at 22:42, John Stewart <cane.cubo at gmail.com> wrote:

> Rebecca,
> 
> Thank you, that is helpful.  I'm not using a tagger, and I have cheap
> 0.99.14svn_cm and the ERG (1004) grammar.  I see from
> http://www.coli.uni-saarland.de/~thorsten/tnt/ that TnT has a more
> restrictive license.  Would any off-the-shelf tagger that produces
> Penn tags work fine?
> 
> Best,
> 
> jds
> 
> On Wed, Oct 26, 2011 at 3:12 PM, Rebecca Dridan <bec.dridan at gmail.com> wrote:
>> Are you using any sort of POS tagger to annotate the input to PET? I assume
>> the online demo is using the TnT tagger, which is the default.  How you feed
>> those into the parser depends a bit on which version of the parser and the
>> grammar you are using, but you'll definitely want POS-tagged input to get
>> decent unknown word handling.
>> 
>> Rebecca
>> 
>> On 26/10/11 8:48 PM, John Stewart wrote:
>>> 
>>> Hello,
>>> 
>>> I am trying to reproduce the behaviour of the online demo using the
>>> command-line PET + ERG system, but having various troubles.  One is
>>> with unknown words.  For the sentence
>>> 
>>> (1)  ugo kissed pilar
>>> 
>>> The online demo returns _ugo/nn_u_unknown and _pilar/nn_u_unknown ,
>>> which is correct.
>>> 
>>> Using the command-line tool as follows:
>>> 
>>>> cheap -default-les=all -verbose=3 -mrs english.grm
>>> 
>>> I get the surprising output:
>>> 
>>> (1011 np_frg_c 0 0 3 [root_frag]
>>>   (1007 hdn_bnp_c 0 0 3
>>>     (1003 n-hdn_cpd_c 0 0 3
>>>       (5 gen_generic_noun/n_-_mc-ns-g_le 0 0 1 []
>>>         (1 "ugo" 0 0 1<0:1>))
>>>       (1000 hdn-n_prnth_c 0 1 3
>>>         (610 generic_pl_noun/n_-_c-pl-unk_le 0 1 2 []
>>>           (2 "kissed" 0 1 2<1:2>))
>>>         (865 generic_pl_noun_ne/n_-_c-pl-gen_le 0 2 3 []
>>>           (3 "pilar" 0 2 3<2:3>))))))
>>> 
>>> So an NP fragment.  Incidentally, I'm unsure how to read the leaf
>>> types, as the format, with "/", does not seem to match the templates
>>> documented at http://moin.delph-in.net/ErgLeTypes .  But in any case,
>>> plural nouns are an incorrect default (I get worse results with
>>> -default-les=traditional).  Are there cheap switches that will yield
>>> the better output given by the online demo?
>>> 
>>> Thanks for any suggestions.
>>> 
>>> jds
>>> 
>> 
>> 
> 
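
For anyone wanting to inspect which generic entries cheap picked for each token in a derivation like the one quoted above, here is a minimal Python sketch.  It is not part of PET.  The function name `leaves` and the regular expression are made up for illustration, and the reading of the "/" labels (entry name before the slash, lexical type after it) is an assumption suggested by the `_le` suffix, not something confirmed in this thread.

```python
import re

# Derivation as printed by cheap in the quoted message (copied verbatim).
DERIVATION = '''(1011 np_frg_c 0 0 3 [root_frag]
  (1007 hdn_bnp_c 0 0 3
    (1003 n-hdn_cpd_c 0 0 3
      (5 gen_generic_noun/n_-_mc-ns-g_le 0 0 1 []
        (1 "ugo" 0 0 1<0:1>))
      (1000 hdn-n_prnth_c 0 1 3
        (610 generic_pl_noun/n_-_c-pl-unk_le 0 1 2 []
          (2 "kissed" 0 1 2<1:2>))
        (865 generic_pl_noun_ne/n_-_c-pl-gen_le 0 2 3 []
          (3 "pilar" 0 2 3<2:3>))))))'''

# Hypothetical pattern: a node of the form "(id label score start end [...]"
# that is immediately followed by a token node "(id "surface form"".
PRETERMINAL = re.compile(
    r'\(\d+\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\[[^\]]*\]\s*\(\d+\s+"([^"]*)"'
)


def leaves(derivation):
    """Return (form, part_before_slash, part_after_slash) per token.

    Assumption: the part before "/" names the lexical entry and the part
    after it names the lexical type.  Labels without "/" yield an empty
    third element.
    """
    result = []
    for label, form in PRETERMINAL.findall(derivation):
        entry, _, le_type = label.partition("/")
        result.append((form, entry, le_type))
    return result


if __name__ == "__main__":
    for form, entry, le_type in leaves(DERIVATION):
        print(form, entry, le_type, sep="\t")
```

Run on the derivation above, this lists "kissed" under generic_pl_noun, which makes the mis-analysis without PoS tags easy to spot.  The pattern only covers the layout shown in this message; other cheap versions or output options may print derivations differently.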



