[erg] New release of the ERG
danf at csli.stanford.edu
Thu Nov 9 20:34:30 CET 2006
Hi colleagues -
I'm pleased to announce another release of the ERG, this time
simultaneously for DELPH-IN and for LOGON internally. The most
visible change is a modest but useful increase in the size of the
lexicon, including the addition of several thousand nouns and
adjectives (mostly) which appear with high frequency in the BNC
(100 times or more). These additions were guided by very helpful
word lists supplied by Yi Zhang of Saarbruecken, and he reports
that with these additions we now see roughly 50% coverage of
those items in the BNC with 20 or fewer tokens, without having
to resort to unknown-word guessing.
Also thanks to Yi Zhang for implementing in PET the same selective
unpacking algorithm used in the LKB. This gives dramatically
improved parse selection accuracy and also a welcome improvement
in parsing efficiency on longer sentences, even when recording
as many as a thousand candidate analyses.
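For readers unfamiliar with the technique: selective unpacking extracts the n best readings from a packed parse forest in score order without exhaustively enumerating every analysis. The sketch below is only a simplified, eager Python variant under assumed data structures (a forest as a dict of alternative local derivations with additive scores); the actual LKB/PET implementation is lazier and works over real chart edges.

```python
import heapq
import itertools

def nbest_trees(forest, node, n, _cache=None):
    """Return up to n (score, tree) pairs for `node`, best first.

    `forest` maps each node label to a list of alternative local
    derivations (local_score, (child, child, ...)); a "packed" node
    is simply one with more than one alternative.  Tree scores are
    sums of local scores, as in a typical MaxEnt-ranked forest.
    """
    if _cache is None:
        _cache = {}
    if node in _cache:
        return _cache[node]
    results = []
    for local_score, children in forest[node]:
        if not children:
            # leaf alternative: the tree is just the node itself
            results.append((local_score, (node,)))
            continue
        # recursively get each child's n-best lists, then combine
        child_lists = [nbest_trees(forest, c, n, _cache) for c in children]
        for combo in itertools.product(*child_lists):
            score = local_score + sum(s for s, _ in combo)
            tree = (node,) + tuple(t for _, t in combo)
            results.append((score, tree))
    # keep only the n highest-scoring hypotheses per node
    results = heapq.nlargest(n, results, key=lambda x: x[0])
    _cache[node] = results
    return results
```

For example, with a root "S" over a packed "NP" (alternatives scored 1.0 and 0.5) and a packed "VP" (2.0 and 1.0), asking for the 2-best yields the readings scored 3.0 and 2.5 without ranking all four combinations for the caller. The per-node truncation to n hypotheses is what keeps the work bounded even when the forest packs thousands of analyses.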
The latest lexicon also contains some additional entries to help
with coverage on a small sample of chemistry abstract data from the
Cambridge SciBorg project. These specialized entries are tagged
as such in the lexical database, but are included in the regular
lexicon.
Also, Ben Waldron has made some improvements in the treatment of
"ersatz" lexical entries (substitutions made by the finite-state
preprocessor) so more information is preserved in the MRSs via the
CARG value of the EP for such entries.
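To make the CARG mechanism concrete, here is a hedged sketch in Python using a made-up dict encoding of an MRS (not the actual MRS serialization the LKB or PET emits): the elementary predication for a token the preprocessor substituted keeps the original surface string as its CARG value, so downstream consumers can recover it.

```python
def carg_strings(mrs):
    """Collect the surface strings preserved in CARG features.

    `mrs` is a hypothetical, simplified dict representation with a
    "rels" list of elementary predications (EPs); real ERG output
    uses the MRS serialization, not this form.
    """
    return [ep["CARG"] for ep in mrs["rels"] if "CARG" in ep]

mrs = {"rels": [
    {"pred": "proper_q_rel"},
    # EP for an "ersatz" entry the finite-state preprocessor
    # substituted; CARG preserves the original surface string
    {"pred": "named_rel", "CARG": "SciBorg"},
]}
```

Calling carg_strings(mrs) on the example above recovers ["SciBorg"], illustrating the point of the change: the substitution no longer loses the original token.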
Note that you'll need to update your LKB and PET in order to use
this version of the ERG, since there have been several enhancements
in platform functionality that this version takes advantage of.
If you forget to update, you may well see complaints in your LKB
about a couple of files that end in ".vpm", or complaints in your
PET about trying to read a ".mem" file (the maxent model used for
the improved parse selection).
As always, feedback is welcome.