[developers] Re: Ranking parses

Stephan Oepen oe at csli.Stanford.EDU
Tue May 3 01:18:44 CEST 2005

hi petter,

> There are some students here at NTNU who use my grammar in a small
> question-answering application and they have a problem with picking the
> right parse. Is there an easy way to use an itsdb treebank (if we build
> it) to rank the parses?

apologies for the long turn-around time.  yes, it is possible to build
a (small-ish, maybe) treebank using [incr tsdb()] facilities, and then
train a Maximum Entropy parse selection model on it.  i would estimate
that 500 -- 1500 annotated sentences should be sufficient to make the
model perform reasonably well (on a coherent domain).  also, PET comes
with good support for parse ranking using ME models; the LKB has most
of the required functionality, but its interface is a little more
involved (and less tested).  finally, estimating the model once there
is training data requires that the `estimate' package by Rob Malouf is
installed locally, which in turn requires some third-party libraries.
--- all doable, but i would hesitate to call the procedure `easy' :-{.
needless to say, hardly any of this is documented, although i know at
least francis is eager to write up what is known about the process ...
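for readers new to the idea, here is a minimal sketch of what a conditional
Maximum Entropy (log-linear) parse selection model does: each candidate parse
of a sentence gets a score w . f(parse), the scores are normalised over all
parses of that sentence, and training maximises the probability of the parse
the annotator picked in the treebank.  this assumes each parse has already
been reduced to a bag of feature counts; the feature names, the toy treebank
and the plain gradient-ascent trainer are hypothetical illustrations only, not
the feature set PET uses or the interface of the `estimate' package.

```python
import math
from collections import defaultdict

# hypothetical toy treebank: per sentence, a list of candidate parses (each a
# dict of feature counts) plus the index of the parse the annotator chose.
treebank = [
    ([{"pp_verb": 1, "subj": 1}, {"pp_noun": 1, "subj": 1}], 1),
    ([{"pp_verb": 1}, {"pp_noun": 1}], 1),
    ([{"pp_verb": 2}, {"pp_noun": 1, "pp_verb": 1}], 1),
]


def scores(weights, parses):
    """linear score w . f(parse) for every candidate parse."""
    return [sum(weights[f] * v for f, v in p.items()) for p in parses]


def probabilities(weights, parses):
    """conditional ME distribution over the parses of one sentence."""
    s = scores(weights, parses)
    top = max(s)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in s]
    z = sum(exps)
    return [e / z for e in exps]


def train(treebank, epochs=200, rate=0.5, l2=0.01):
    """maximise the L2-regularised conditional log-likelihood by gradient ascent."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for parses, gold in treebank:
            probs = probabilities(weights, parses)
            # gradient = observed feature counts minus model-expected counts
            for f, v in parses[gold].items():
                grad[f] += v
            for p, prob in zip(parses, probs):
                for f, v in p.items():
                    grad[f] -= prob * v
        for f in set(grad) | set(weights):
            weights[f] += rate * (grad[f] - l2 * weights[f])
    return weights


def rank(weights, parses):
    """indices of the parses, best-scoring first."""
    s = scores(weights, parses)
    return sorted(range(len(parses)), key=lambda i: s[i], reverse=True)


weights = train(treebank)
print(rank(weights, [{"pp_verb": 1}, {"pp_noun": 1}]))  # → [1, 0]
```

since the toy annotator always preferred noun attachment, the learned weight
for the (made-up) `pp_noun' feature ends up above `pp_verb', and an unseen
sentence is ranked accordingly.  a real model works the same way in
principle, just with many thousands of features read off the derivations.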

should you not feel discouraged at this point, i would be happy to help
remotely, but realistically do not have that much time available.  erik
knows everything about the training aspects once you have the data, and
generally enhancing NorSource with an initial ME model would seem like
a nice thing.  would you have people to do the annotation?  as the main
developer of the grammar, you should get someone else to annotate :-).

                                                   all the best  -  oe

+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
