[developers] Scoring ranking results in the Logon System

Stephan Oepen oe at ifi.uio.no
Sat Feb 7 23:42:39 CET 2009


hi bill (with the usual apologies for a late reply),

> 1. Can the grid search be run in parallel?  I understand that the
> feature caching and training steps have to be run in a single
> process, but it seems like the grid search (the step performed by
> grid.lisp) could be run in parallel, with one grid point per
> computer.  I have the parallel computing resources to do this, so if
> parallelization works in this framework it would be a huge time
> saver.

yes, certainly.  erik used to break up `grid.lisp' into `grid.0.lisp',
`grid.1.lisp', and so forth, according to grandparenting levels.  the
only thing to watch out for when running multiple such grids in
parallel is to avoid parallel compilation of the Lisp sources, i.e. to
make sure that all `.fasl' files are up-to-date before you start the
grids.  but i suspect you are using run-time binaries anyway (the
`--binary' option to the `load' script), in which case this should be
of no concern.
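
for concreteness, a partitioned grid file might look roughly as
follows.  this is purely a sketch: only the :prefix argument is shown,
and in practice you would simply copy the batch-experiment() calls
(with all their feature selection and hyper-parameter arguments)
verbatim from your `grid.lisp' into each partition.

  ;;; grid.0.lisp --- hypothetical partition, say for one grandparenting
  ;;; level; the call below stands in for the ones copied from `grid.lisp'.
  (batch-experiment :prefix "jhpstg")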

the context caches (the directories below the source profiles whose
names start with `cc.') are per-process, hence it hardly makes sense
to break up the iteration over the hyper-parameters (variance and
tolerance) across runs.  but any of the other dimensions should be
fine to distribute.

each experiment is characterized by its unique profile name (which
embeds the :prefix argument to batch-experiment(), plus all the values
for feature selection and the hyper-parameters).  the experimentation
code looks for each such directory before starting a new experiment,
and it should never erase or overwrite data.  thus, if the target
directory exists and contains a valid profile, the experiment will be
skipped.
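
a useful side effect is that restarting after a crash is cheap: simply
reload the same grid file, and experiments whose target profiles
already exist will be skipped.  as a minimal sketch (assuming you drive
the grid from a Lisp listener, and re-using the hypothetical partition
file from above):

  ;; re-running a partition resumes where it left off: experiments whose
  ;; target profile directory already exists (and holds a valid profile)
  ;; are detected and skipped, not redone.
  (load "grid.0.lisp")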

> 2. Are there scripts to perform scoring?  After grid.lisp has
> completed, it's not clear to me how to collect accuracy scores or
> other statistics.  Of course I could write code to do this myself by
> groveling the score and preference files, but it seems like someone
> would have already written this.

you may of course end up inspecting the `fold' relations yourself, but
to get started, there is the function summarize-folds().  once you have
a set of target profiles (e.g. from completing the grid) in the current
[incr tsdb()] database home, a call like

  (summarize-folds :output "/tmp/folds" :pattern "\\[jhpstg\\]")

will read the `fold' relation from all profiles that match the pattern
(in this case, corresponding to the default :prefix value), average all
folds within one profile, and report the average accuracy and standard
deviation (and a few more figures) into `/tmp/folds'.  have a look at
the function definition in `learner.lisp' for more options.

note that the default [incr tsdb()] database home is not the directory
with the full Redwoods treebanks, i.e. you need to point [incr tsdb()]
at the right path before calling summarize-folds().
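
as a minimal sketch (assuming `(tsdb :home ...)' is how you set the
database home in your set-up, and with a made-up path that you would
replace by wherever the grid wrote its target profiles):

  ;; point [incr tsdb()] at the directory holding the grid output
  ;; profiles, then summarize the `fold' relations across all matching
  ;; experiments.
  (tsdb :home "/path/to/grid/profiles")
  (summarize-folds :output "/tmp/folds" :pattern "\\[jhpstg\\]")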

> 3. What is the state of the art for ERG parse ranking?  What are the
> learning method, features, the data sets, and the accuracy?  Is this
> documented anywhere on the logon tree?  Is there a paper that
> outlines it?

i imagine you are aware of the various Redwoods papers, specifically:

  http://emmtee.net/bib/Tou:Man:Fli:05.bib

these early experiments were on VerbMobil data, where sentence length
and average ambiguity were well below what we see in the LOGON corpus
(which, in turn, has shorter utterances than the WSJ or WeScience).  i
did a semi-systematic parse selection run in 2007, for our joint paper
with zhang yi and john carroll:

  http://www.aclweb.org/anthology-new/W/W07/W07-2207.pdf

for all i know, the accuracy reported there (not quite 60 percent exact
match) is representative of the current set of features and the LOGON
corpus.  i suspect we should be able to do better, though, and i would
be happy to discuss half-baked ideas about additional features at some
point.  but that would presuppose that you were planning to dig in
yourself (i.e. extend the code in `features.lisp') and add new
features?

                                                   best wishes  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


