[developers] New ERG with improved tokenization/preprocessing for PET
danf at stanford.edu
Sun Apr 12 05:59:43 CEST 2009
I am pleased to announce a new release of the ERG, tagged '0902' (year/month -- it's taken us a little while to package and test it :). This release benefits greatly from a new tokenization component of the ERG, mainly developed by Stephan Oepen, which makes use of the newly available chart-mapping facility that Peter Adolphs and Stephan have been developing.
One immediate benefit of this new facility is that when the grammar is run with PET, all of the preprocessing (tokenization adjustments, punctuation, dates, measure-NPs, numbers, etc.) is now done internally, so it should be much easier to use the grammar/parser as a module in applications. Unknown-word handling is also well supported now, covering both proper names and open-class words. It assumes the TnT tagset: the tags for verbs, nouns, adjectives, and adverbs trigger generic entries that are created on the fly. Characterization information is now preserved in the MRSs produced, and contentful predicate names are introduced for unknown words.
For emerging documentation on this new facility, see
In this release, you'll find some additional treebank profiles in the 'gold' subdirectory, namely the first four sections of the emerging WeScience corpus, which consists of 100 Wikipedia articles on computational linguistics. We expect to complete the treebanking of the full corpus (now 25% complete) by this summer.
To get this new version of the ERG, you have several choices:
If you use the 'install' script from the LinGO download site
then, instead of simply following the instructions at the top of the 'Automated installation' page, you should run
bash install --test --home ~/delphin
which will get you this most recent 'test' release (rather than the more stable 'latest' release from about a year ago).
If you download the grammar directly from the ERG page via the 'download' link, then you'll get this new 'test' version now.
And if you use SVN to get the grammar, then check out the tagged release with the following command:
svn co http://svn.delph-in.net/erg/tags/0902
since the SVN head revision always runs a little in front of the well-tested release.
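For anyone scripting the SVN route, here is a minimal sketch of a helper that builds the checkout command for a tagged release. The function name erg_tag_url and the target directory erg-0902 are illustrative and not part of the release. Only the '0902' tag is confirmed above; that other releases live under the same tags/ directory is an assumption.

```shell
#!/bin/sh
# Repository root taken from the checkout command in this announcement.
ERG_SVN_ROOT="http://svn.delph-in.net/erg"

# Hypothetical helper: print the SVN URL for a release tag.
# Assumes every tagged release sits under $ERG_SVN_ROOT/tags/<tag>;
# only '0902' is confirmed by the announcement.
erg_tag_url() {
    printf '%s/tags/%s\n' "$ERG_SVN_ROOT" "$1"
}

# Print the command instead of running it, so it can be checked first.
# 'erg-0902' is an arbitrary local directory name.
echo "svn co $(erg_tag_url 0902) erg-0902"
```

Printing the command keeps the sketch free of side effects; paste the output into a shell to perform the actual checkout.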
Your reactions and critique will be welcome, as always.