[developers] 2020 release of ERG
danf at stanford.edu
Mon Apr 5 15:59:05 UTC 2021
I am pleased to announce a new release of the ERG, available here:
svn co http://svn.delph-in.net/erg/tags/2020
The three most significant features of this release are as follows:
1) Punctuation marks are now treated as separate tokens, improving interoperability of ERG analyses with other NLP tools. Thanks to Stephan Oepen for inspiring this conversion and collaborating on it.
2) The treebanked profiles in erg/tsdb/gold are now fully updated based on the full-forest treebanking (fftb) tool, and the maxent parse selection model has been trained on the full 1.4 million word annotated corpus (minus those profiles held out to facilitate testing). This is roughly double the amount of training data used in training the model for the 2018 release. Thanks to Woodley Packard for substantial help in adapting the fftb tool to cope with the effects of retokenizing punctuation marks, and with training the model.
3) Documentation strings are now enabled across all types and instances in the grammar, and supported by most DELPH-IN-related platforms. Thanks to several developers for coordinating and accommodating the necessary changes in TDL specifications.
Compiling and/or using the 2020 release will be the same as in the past for the LKB (both classic and FOS), ACE, and PyDelphin. Note that since PET does not yet provide full support for documentation strings, this 2020 release of the grammar includes several duplicate files named "...-for-pet.tdl" from which the doc strings have been removed, so take appropriate precautions if you make local changes to any affected ERG source files. To compile the 2020 grammar for PET, do the following:
Then invoke the PET parser as follows:
cheap -cm -repp -default-les=all -packing -nsolutions=1 -mrs english
The 2020 release also includes a number of smaller improvements and bug fixes, many of which have been discussed on this list, in GitHub issues, and on the DELPH-IN Discourse channel.
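As a rough illustration of what feature 1) above means for anyone aligning ERG output with other tools, here is a minimal Python sketch contrasting punctuation attached to words with punctuation as separate tokens. This is not the ERG's actual REPP tokenizer. The function names and the regular expression are assumptions made up for this example, and the regex deliberately ignores harder cases such as contractions and hyphenated words.

```python
import re

def tokens_punct_attached(text):
    # Hypothetical helper: whitespace splitting only, so punctuation
    # stays glued to the neighbouring word ("arrived,").
    return text.split()

def tokens_punct_separate(text):
    # Hypothetical helper: each run of word characters is one token and
    # each punctuation character is its own token. A toy rule only,
    # not the grammar's real tokenization.
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "Kim arrived, and Sandy left."
print(tokens_punct_attached(sentence))
print(tokens_punct_separate(sentence))
```

With punctuation split off, the token sequence lines up one-to-one with what many taggers and dependency parsers expect as input, which is the interoperability gain mentioned above.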