[developers] basic question on unknown words

paul at haleyai.com
Thu Dec 21 21:47:19 CET 2017


Thank you, Stephan.  Sending in tags (e.g., in FSC input) works great.  For some reason, I thought that at one time you could get unknown words handled without tags (e.g., using -default-les=traditional).

Before sending my email I reread http://moin.delph-in.net/PetInput (the basics and chart-mapping sections, with regard to unknown word handling), but I did not come away with the crystal-clear understanding you have now given me.

Thank you,
Paul

-----Original Message-----
From: Stephan Oepen [mailto:oe at ifi.uio.no] 
Sent: Thursday, December 21, 2017 3:33 PM
To: Paul Haley <paul at haleyai.com>
Cc: developers at delph-in.net
Subject: Re: [developers] basic question on unknown words

hi paul,

the ERG distinguishes two types of unknown input tokens: named entities and PoS-activated generic lexical entries.  for the latter to work, you need to invoke a PoS tagger (note the -tagger option in the second call), e.g.

0 oe at mv (~/src/logon) 11 $ echo "Haley and 4711 are unknown named entities." | ./bin/cheap -repp -cm -default-les=all -packing -nsolutions=1 lingo/erg/english.grm
reading `lingo/erg/pet/english.set'... including `lingo/erg/pet/common.set'... including `lingo/erg/pet/global.set'...
including `lingo/erg/pet/repp.set'... including `lingo/erg/pet/mrs.set'... loading `lingo/erg/english.grm'
(ERG (1214)) reading ME model `lingo/erg/redwoods.mem'... [3643349 features]
read-vpm(): reading file `semi.vpm'.
95873 types in 11 s

(1) `haley and 4711 are unknown named entities.' [0] --- 1
(0.10|0.10s) <114:524> (50802.9K) [0.1s]
0 oe at mv (~/src/logon) 12 $ echo "there is an unknown wordd." | ./bin/cheap -repp -cm -tagger -default-les=all -packing -nsolutions=1 lingo/erg/english.grm
reading `lingo/erg/pet/english.set'... including `lingo/erg/pet/common.set'... including `lingo/erg/pet/global.set'...
including `lingo/erg/pet/repp.set'... including `lingo/erg/pet/mrs.set'... loading `lingo/erg/english.grm'
(ERG (1214)) reading ME model `lingo/erg/redwoods.mem'... [3643349 features]
read-vpm(): reading file `semi.vpm'.
95873 types in 11 s

(1) `there is an unknown wordd.' [0] --- 1
(0.07|0.08s) <85:279> (37428.2K) [0.0s]
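
as a rough sketch of the difference between the two calls above, the snippet below assembles the same argument list with or without -tagger.  the flags, the `./bin/cheap' binary and the grammar path are copied from the transcripts; the function names (cheap_command, shell_line, parse) are made up for illustration, and the paths will differ outside the LOGON tree.

```python
import shlex
import subprocess

# Flags taken verbatim from the two invocations in this message.
BASE_FLAGS = ["-repp", "-cm", "-default-les=all", "-packing", "-nsolutions=1"]


def cheap_command(grammar="lingo/erg/english.grm", cheap="./bin/cheap",
                  use_tagger=True):
    """Return the argv list for cheap (hypothetical helper).

    With use_tagger=True the -tagger flag is added, which is what lets
    PoS-activated generic lexical entries apply to unknown words.
    """
    flags = list(BASE_FLAGS)
    if use_tagger:
        flags.insert(2, "-tagger")  # same position as in the second call
    return [cheap, *flags, grammar]


def shell_line(sentence, **kwargs):
    """Render the equivalent `echo ... | cheap ...' shell line."""
    argv = " ".join(shlex.quote(arg) for arg in cheap_command(**kwargs))
    return "echo {} | {}".format(shlex.quote(sentence), argv)


def parse(sentence, **kwargs):
    """Pipe one sentence into cheap on stdin; requires a working install."""
    return subprocess.run(cheap_command(**kwargs), input=sentence + "\n",
                          capture_output=True, text=True)
```

calling shell_line("there is an unknown wordd.") reproduces the second command line; passing use_tagger=False gives the flag set of the first, which suffices for named entities only.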

in case you are not familiar with the tagger integration, please have a look at the comments around the ‘taggers’ configuration in ‘common.set’.  TnT (bundled in redistributable binary form with the LOGON tree, but not really open source) has been the default tagger for a decade or so.  it is straightforward to substitute another tagger (e.g. HunPoS or GENIATagger) and, if need be, to make it mimic the tab-separated TnT input and output protocol by means of a wrapper script.
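
a wrapper of that kind might look roughly like the sketch below.  everything about the protocol here is an assumption made for illustration: one token per input line, an empty line ending a sentence, and one `token<TAB>tag' line per token on output.  the columns PET actually expects are described in the comments in ‘common.set’, and tag_sentence() is a hypothetical stand-in for a call into the real tagger.

```python
import sys


def tag_sentence(tokens):
    """Hypothetical stand-in for a real tagger; tags every token as NN."""
    return [(token, "NN") for token in tokens]


def format_tab_separated(tagged):
    """One 'token<TAB>tag' line per token, then an empty separator line."""
    return ["{}\t{}".format(token, tag) for token, tag in tagged] + [""]


def wrap(lines, tagger=tag_sentence):
    """Group input lines into sentences and yield tagged output lines.

    Assumed input: one token per line, an empty line ends a sentence.
    """
    sentence = []
    for line in lines:
        token = line.rstrip("\n")
        if token.strip():
            sentence.append(token)
        elif sentence:
            yield from format_tab_separated(tagger(sentence))
            sentence = []
    if sentence:  # flush a final sentence that lacks a trailing empty line
        yield from format_tab_separated(tagger(sentence))


def main():
    """Filter stdin to stdout; flush per line so the caller is not blocked."""
    for out in wrap(sys.stdin):
        sys.stdout.write(out + "\n")
        sys.stdout.flush()
```

to use this as a filter, call main() from the script entry point and replace tag_sentence() with code that talks to the tagger of your choice.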

best wishes, oe


On Thu, Dec 21, 2017 at 8:54 PM,  <paul at haleyai.com> wrote:
> I’ve reviewed the documentation but just can’t seem to get unknown
> words out of PET using the ERG.
>
> Does anyone know the proper incantation?
>
> Thank you, and,
>
> Season’s Greetings!
>
> Paul
>
> root at c248d57fdceb:/ERG# cheap -nsolutions=1 -packing -cm -default-les=all -repp english.grm
> reading `pet/english.set'... including `pet/common.set'... including
> `pet/global.set'... including `pet/repp.set'... including `pet/mrs.set'...
> loading `english.grm'
> (ERG (trunk)) reading ME model `redwoods.mem'... [3720311 features]
> 99034 types in 7.1 s
>
> this is an unknown wordd
> no lexicon entries for:
>         "wordd" []




