[developers] basic question on unknown words

Stephan Oepen oe at ifi.uio.no
Thu Dec 21 21:33:13 CET 2017


hi paul,

the ERG distinguishes two types of unknown input tokens: named
entities and PoS-activated generic lexical entries.  for the latter to
work, you need to invoke a PoS tagger (note the added ‘-tagger’ option
in the second command), e.g.

0 oe at mv (~/src/logon) 11 $ echo "Haley and 4711 are unknown named
entities." | ./bin/cheap -repp -cm -default-les=all -packing
-nsolutions=1 lingo/erg/english.grm
reading `lingo/erg/pet/english.set'... including
`lingo/erg/pet/common.set'... including `lingo/erg/pet/global.set'...
including `lingo/erg/pet/repp.set'... including
`lingo/erg/pet/mrs.set'... loading `lingo/erg/english.grm'
(ERG (1214)) reading ME model `lingo/erg/redwoods.mem'... [3643349 features]
read-vpm(): reading file `semi.vpm'.
95873 types in 11 s

(1) `haley and 4711 are unknown named entities.' [0] --- 1
(0.10|0.10s) <114:524> (50802.9K) [0.1s]
0 oe at mv (~/src/logon) 12 $ echo "there is an unknown wordd." |
./bin/cheap -repp -cm -tagger -default-les=all -packing -nsolutions=1
lingo/erg/english.grm
reading `lingo/erg/pet/english.set'... including
`lingo/erg/pet/common.set'... including `lingo/erg/pet/global.set'...
including `lingo/erg/pet/repp.set'... including
`lingo/erg/pet/mrs.set'... loading `lingo/erg/english.grm'
(ERG (1214)) reading ME model `lingo/erg/redwoods.mem'... [3643349 features]
read-vpm(): reading file `semi.vpm'.
95873 types in 11 s

(1) `there is an unknown wordd.' [0] --- 1 (0.07|0.08s) <85:279>
(37428.2K) [0.0s]

in case you are not familiar with the tagger integration, please have
a look at the comments around the ‘taggers’ configuration in
‘common.set’.  while TnT (bundled in redistributable binary form with
the LOGON tree but not really open source) has been the default tagger
for a decade or so, it is straightforward to substitute another tagger
(e.g. HunPoS or GENIATagger) and, if need be, make it mimic the
tab-separated TnT input and output protocol by means of a wrapper
script.
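
as a rough illustration of the wrapper idea, here is a minimal sketch in
python.  it assumes the protocol is one token per line on input and
`token<TAB>tag' per line on output; the exact format PET expects should
be checked against the comments in ‘common.set’.  the names
dummy_tagger(), format_tab_separated(), and wrap() are hypothetical, and
dummy_tagger() is only a placeholder for a call into a real tagger.

```python
from typing import Callable, Iterable, List, Tuple

TaggedToken = Tuple[str, str]


def dummy_tagger(tokens: List[str]) -> List[TaggedToken]:
    """Hypothetical stand-in for a real tagger: digits get CD, all else NN."""
    return [(tok, "CD" if tok.isdigit() else "NN") for tok in tokens]


def format_tab_separated(pairs: Iterable[TaggedToken]) -> List[str]:
    """Render (token, tag) pairs as one `token<TAB>tag' line each (assumed format)."""
    return [f"{tok}\t{tag}" for tok, tag in pairs]


def wrap(lines: Iterable[str],
         tagger: Callable[[List[str]], List[TaggedToken]] = dummy_tagger) -> List[str]:
    """Read one token per line, skip blank lines, tag, and emit tab-separated lines."""
    tokens = [line.strip() for line in lines if line.strip()]
    return format_tab_separated(tagger(tokens))
```

a real wrapper would pass sys.stdin to wrap(), print each returned line,
and replace dummy_tagger() with whatever converts the substitute
tagger's native output into (token, tag) pairs.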

best wishes, oe


On Thu, Dec 21, 2017 at 8:54 PM,  <paul at haleyai.com> wrote:
> I’ve reviewed the documentation but just can’t seem to get unknown words out
> of PET using the ERG.
>
> Does anyone know the proper incantation?
>
> Thank you, and,
>
> Season’s Greetings!
>
> Paul
>
> root at c248d57fdceb:/ERG# cheap -nsolutions=1 -packing -cm -default-les=all
> -repp english.grm
> reading `pet/english.set'... including `pet/common.set'... including
> `pet/global.set'... including `pet/repp.set'... including `pet/mrs.set'...
> loading `english.grm'
> (ERG (trunk)) reading ME model `redwoods.mem'... [3720311 features]
> 99034 types in 7.1 s
>
> this is an unknown wordd
> no lexicon entries for:
>         "wordd" []


