[developers] [erg] jul-07 release of ERG
Stephan Oepen
oe at ifi.uio.no
Tue Jul 17 19:00:37 CEST 2007
hei!
here is a follow-up with some background to what dan emailed today:
> I am pleased to announce the release of the "jul-07" version of the
> ERG, a minor update to the first message-free version released in
> March, some bug fixes and expanded lexical coverage for several
> additional treebanked corpora, [...]
>
> Note further that the treebanks in erg/gold will only behave properly
> with [incr tsdb()] once you have an up-to-date version of the LKB,
> but don't rush - that compatible version will not be in CVS for
> another few days yet. I'll announce when it's ready. In the
> meantime, you should be able to use this ERG for everything except
> grammar profiling and treebanking.
regarding treebanks, the relevant changes are all in [incr tsdb()] (not
the LKB), but as both reside in the same CVS repository, i recommend
preparing to bring /everything/ up to date (also, dan has committed a few
changes to LKB code, which he plans to announce separately).
my treebank change is in the derivation format recorded in the profile
database (in the `result' relation). the new, extended format (dubbed
UDF 1.2) includes the start symbol of the grammar used to license each
derivation, e.g.
  (root_strict
   (49 subjh -1.03461 0 2
    (46 proper_np -1.15336 0 1
     (45 sing_noun_irule -0.862381 0 1 (3 kim 0 0 1 ("kim" 0 1))))
    (48 punct_period_orule 0.121598 1 2
     (47 third_sg_fin_verb_orule 0.061442 1 2
      (9 sleep_v1 0 1 2 ("sleeps." 1 2))))))
compared to the earlier format (UDF 1.1), the `root_strict' node at the
top is new, thus whoever reads (or writes) derivations in [incr tsdb()]
profiles needs to be aware of this change.
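
for anyone adapting their own derivation reader, here is a minimal sketch in python of one way to accept both formats. it is not the [incr tsdb()] or PET code; the names `parse' and `split_root' are made up for illustration, and it assumes that edge identifiers are always integers, so that a non-numeric symbol at the head of the outermost node can be taken to be the start symbol.

```python
import re

# Tokens: an opening or closing parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse(text):
    """Parse one parenthesised derivation into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = []
            while tokens[pos] != ")":
                node.append(read())
            pos += 1  # consume the closing parenthesis
            return node
        return token

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing material after derivation")
    return tree


def split_root(tree):
    """Return (start_symbol, derivation); start_symbol is None for UDF 1.1 input."""
    head = tree[0]
    # Assumption: edge ids are integers, so a non-numeric head marks the
    # extra start-symbol node that UDF 1.2 adds at the top.
    if isinstance(head, str) and not head.lstrip("-").isdigit():
        return head, tree[1]
    return None, tree


udf12 = """
(root_strict
 (49 subjh -1.03461 0 2
  (46 proper_np -1.15336 0 1
   (45 sing_noun_irule -0.862381 0 1 (3 kim 0 0 1 ("kim" 0 1))))
  (48 punct_period_orule 0.121598 1 2
   (47 third_sg_fin_verb_orule 0.061442 1 2
    (9 sleep_v1 0 1 2 ("sleeps." 1 2))))))
"""

root, derivation = split_root(parse(udf12))
print(root, derivation[0], derivation[1])
```

the same `split_root' call on a UDF 1.1 string (one that starts directly with the edge id) returns None as the start symbol and the derivation unchanged, so downstream code can treat both cases uniformly.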
as always, old code will not be able to read new profiles, but new code
is backwards compatible with old profiles (back to around 1997). hence,
to use [incr tsdb()] on the latest ERG profiles, you will need to move
to the latest code base, /once/ i release it later this week. however,
running this latest ERG, including creating your own new profiles, can
still be done with older versions of [incr tsdb()], PET, or the LKB.
to write out UDF 1.2 derivations, PET also needs to be changed. i plan
to check in both my [incr tsdb()] and PET updates later this week.
all best - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++