[developers] PET and edge counts

Stephan Oepen oe at csli.Stanford.EDU
Fri Apr 21 11:45:34 CEST 2006


hi again!

> I have yet another question for you about the lexical acquisition
> experiment Yi has been valiantly running for the past week and a
> bit: is there any way of distinguishing between parse
> failures (i.e. there genuinely not being a spanning parse for the
> input) and resource failures (i.e. PET having run out of edges) in
> the profiles generated by the fine system? Basically, Yi has run the
> ERG with different variants of the lexicon over updated Redwoods
> data, and we would like to be able to differentiate between parse and
> resource failures in the outputs he has at hand.

apologies, probably a little late now!  yes, assuming you are running
under [incr tsdb()] control, the distinction is reported and recorded:
whenever PET exhausts either its total chart size limit or the separate
limit on edges used in unpacking (these two are separate because edges
are more `costly' during forest creation), the `error' field from the
`parse' relation will be non-empty.  how exactly are you calling PET?
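
just to illustrate: assuming a decompressed profile, a minimal sketch
like the following would scan the raw `parse' file for records with a
non-empty `error' value.  the field position of `error' depends on the
schema, so please read it off the `relations' file of your profile; the
default position, the function names, and the helper below are purely
assumptions for the sake of the example.

  (defun report-parse-errors (profile &key (error-position 24))
    ;; scan the `parse' file of a (decompressed) [incr tsdb()] profile and
    ;; print the first field (typically the parse id) plus the `error'
    ;; string for every record whose `error' field is non-empty.  the
    ;; default position 24 is an assumption; check your `relations' file.
    (with-open-file (stream (merge-pathnames "parse" profile))
      (loop
          for line = (read-line stream nil nil)
          while line
          do (let* ((fields (split-fields line))
                    (err (nth error-position fields)))
               (unless (or (null err) (string= err ""))
                 (format t "~a: ~a~%" (first fields) err))))))

  (defun split-fields (line)
    ;; split one tsdb(1) record on the `@' field separator (ignoring any
    ;; escaping of `@' inside field values)
    (loop
        with start = 0
        for end = (position #\@ line :start start)
        collect (subseq line start end)
        while end
        do (setf start (+ end 1))))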

the chart size limit is controlled by the [incr tsdb()] parameter

  (defparameter *tsdb-maximal-number-of-edges* 100000)

which corresponds to `Process | Variables | Chart Size Limit'.  i take
it you have ambiguity packing turned on (the cheap `-packing' option)
and limit the number of results that get reported to [incr tsdb()] to
something rational, e.g.

  (defparameter *tsdb-maximal-number-of-results* 1000)

this corresponds to `Process | Variables | Result Storage Limit'.  the
third potentially relevant parameter is

  (defparameter *tsdb-maximal-number-of-analyses* 0)

or `Process | Variables | Analyses Limit', which could be used to ask
for n-best parsing.  i'm not entirely sure about the current state of
play (and should go back to my ACL|COLING reviews now rather than read
PET sources), but i believe there is no useful n-best mode in current
PETs.  it used to be agenda-driven best-first parsing using local ME
scores, which obviously neither guarantees finding the globally best
parse nor co-exists with ambiguity packing.  ideally, someone
would implement selective unpacking in PET, but in the meantime i guess
constructing the packed forest, unpacking exhaustively, and scoring all
results is the best you can do.  this, for the ERG at least, used to be
quite a bit faster than n-best search without packing.  but obviously,
this will vary with the grammar, test data, and ME model ...
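
for reference, here are the three settings side by side, roughly as one
might put them into a `.tsdbrc' or similar init file (the file name and
the :tsdb package prefix are assumptions about your setup); the values
are just the defaults quoted above, not a recommendation for any
particular grammar or data set:

  (in-package :tsdb)

  ;; overall chart size limit (`Process | Variables | Chart Size Limit')
  (defparameter *tsdb-maximal-number-of-edges* 100000)

  ;; cap on results recorded per item (`Process | Variables | Result
  ;; Storage Limit')
  (defparameter *tsdb-maximal-number-of-results* 1000)

  ;; 0, i.e. exhaustive parsing, no n-best restriction (`Process |
  ;; Variables | Analyses Limit')
  (defparameter *tsdb-maximal-number-of-analyses* 0)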

                                                         cheers  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


