[erg] jul-07 release of ERG
danf at csli.stanford.edu
Tue Jul 17 00:27:41 CEST 2007
Dear colleagues -
I am pleased to announce the release of the "jul-07" version of the ERG,
a minor update to the first message-free version released in March, with
some bug fixes and expanded lexical coverage for several additional
treebanked corpora, including
(1) FraCaS - Ann Copestake's student Richard Bergmair has collected all of
the example sentences from FraCaS (cf. www.cogsci.ed.ac.uk/~fracas);
the treebank for all 325 of these items is in erg/gold/fracas.
(2) Senseval 2-4 - Tim Baldwin and his student David Martinez have provided
an ERG-compatible tokenization for the data used in the Senseval workshops
2, 3, and 4, and the treebanks for these three data sets are in
erg/gold/seval2, erg/gold/seval3, and erg/gold/seval4. Note that the ERG
currently produces good analyses for about 80% of this data, as follows:
Profile       Total      Good ERG
              # items    analyses
Senseval-2      242        198
Senseval-3      327        268
Senseval-4      135         99
total           704        567
(3) SciBorg - Ann Copestake and colleagues in the SciBorg project at
Cambridge have prepared some data sets from scientific texts in
chemistry to which the ERG is being applied, though this data cannot
be distributed, at least not yet.
(4) Acrolinx - Ulrich Callmeier and colleagues at the Acrolinx software
company in Berlin have data for controlled language checking to which
the ERG is being applied as part of an in-house R&D effort, but this
data also cannot be distributed.
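The coverage figure quoted for the Senseval data in item (2) can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the table above, and the names `profiles` and `coverage` are made up for illustration. Note that the three per-profile counts of good analyses sum to 565, while the table's total row gives 567; either value works out to roughly 80% of the 704 items.

```python
# Figures copied from the Senseval table in item (2):
# profile name -> (total items, items with a good ERG analysis).
profiles = {
    "Senseval-2": (242, 198),
    "Senseval-3": (327, 268),
    "Senseval-4": (135, 99),
}


def coverage(total, good):
    """Return the share of items with a good analysis, as a percentage."""
    return 100.0 * good / total


# Per-profile coverage.
for name, (total, good) in profiles.items():
    print(f"{name}: {good}/{total} = {coverage(total, good):.1f}%")

# Overall coverage, recomputed from the rows rather than from the
# table's own total line.
total_items = sum(total for total, _ in profiles.values())
total_good = sum(good for _, good in profiles.values())
print(f"all: {total_good}/{total_items} = {coverage(total_items, total_good):.1f}%")
```

Recomputing the totals from the rows, rather than trusting the summary line, makes any mismatch between the two visible.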
Note that the maximum entropy model released with this version has not yet
been updated to reflect the current grammar, but should work reasonably well
until the new model has been built and validated.
Note further that the treebanks in erg/gold will only behave properly with
[incr tsdb()] once you have an up-to-date version of the LKB, but don't rush -
that compatible version will not be in CVS for another few days yet. I'll
announce when it's ready. In the meantime, you should be able to use this
ERG for everything except grammar profiling and treebanking.
I look forward to hearing of your experiences with this new release.