[developers] reforestation of gold profiles
oe at ifi.uio.no
Wed Sep 5 08:37:10 CEST 2012
‘lingo/erg/’ in LOGON is the most recent release version, currently 1111. ‘lingo/terg/’ is the trunk and tends to not provide well-defined treebanks (typically they are out-of-date, e.g. yet to be updated for the next release).
hence, my recommendation would be to use the 1111 grammar and treebanks. the original unthinned profiles are available as an optional SVN module, please see LogonExtras on the wiki.
our (re-)parse and update procedures are described (to some degree) on ErgReleases and in ‘uio/titan/README’; default [incr tsdb()] settings should be correct for this use case, although it is ages since i have done an update interactively (rather than through the ‘parse’ or ‘redwoods’ scripts).
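on the question quoted below of how to find out which grammar a profile was built with: a profile directory carries a ‘relations’ schema file next to its ‘run’ file, so one way to inspect the recorded grammar is to look up the position of the ‘grammar’ field in the schema rather than count columns by eye. a minimal python sketch, assuming the usual ‘@’-separated one-record-per-line layout and a ‘grammar’ field in the ‘run’ relation; the function names are made up for illustration and are not part of [incr tsdb()]:

```python
import gzip
import os


def read_schema(relations_path):
    """Map each relation name to its ordered list of field names.

    Assumes the 'relations' layout of a relation name followed by ':'
    on its own line, then one 'field-name :type ...' line per field.
    """
    schema = {}
    current = None
    with open(relations_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.endswith(":"):
                # Start of a new relation, e.g. "run:"
                current = line[:-1]
                schema[current] = []
            elif current is not None:
                # Field line: keep only the field name, drop the type info
                schema[current].append(line.split()[0])
    return schema


def profile_grammars(profile_dir):
    """Return the 'grammar' value of every record in the profile's run file."""
    schema = read_schema(os.path.join(profile_dir, "relations"))
    index = schema["run"].index("grammar")

    # Profiles are sometimes stored compressed; fall back to 'run.gz'.
    path = os.path.join(profile_dir, "run")
    opener = open
    if not os.path.exists(path):
        path, opener = path + ".gz", gzip.open

    grammars = []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("@")
            if len(fields) > index:
                grammars.append(fields[index])
    return grammars
```

calling profile_grammars() on a profile directory returns one string per run record; the sketch does not undo any escaping inside field values, so treat the result as a quick inspection aid only.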
On Sep 5, 2012, at 8:17 AM, Francis Bond <bond at ieee.org> wrote:
> I would like to reforest (the opposite of thin) some gold profiles, so
> that I can do some parse ranking experiments. My understanding is
> that I:
> (i) parse the test suite with the same grammar
> (ii) update the new profile with the gold
> I have two questions about details:
> (i) which grammar (cpu) was used for the cb and sc0 profiles in
> the up-to-date logon distribution (or how do you find out)? Just
> looking at the run file did not make things very clear. If exactly
> the same grammar is not available, what should I do?
> (I am trying using 'cheap' but am getting several unknown words, which
> I had expected the unknown word handling to handle (at least it did in
> the gold profile). This does not bode well).
> (ii) which flags should I use when updating? I assume automatic
> update, but I am not sure if I want explicit or implicit ranks, or
> result identity or equivalence?
> Finally, if someone (Dan) should have a stash of unthinned profiles,
> please let me know.
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University