[developers] tsdb result file

Sat Feb 7 23:15:55 CET 2009

hi bart (and francis),

> as an addition to a mail I sent earlier (see below), I also noticed
> that in my grammar, sometimes the root node shows up twice.

  [...]

> I was looking at the result file in tsdb profiles, and I saw that in
> some parses the root condition is included in the parse tree, and in
> some parses it isn't.

  [...]

not quite sure why, but i was thinking about this problem in the shower
this morning :-).  i suspect both observations could be the result of a
minor bug in the unpacking code coupled with a small bug in the printer
for derivations.  even though you noticed those missing and double root
nodes in [incr tsdb()] derivations, i would expect the same behavior in
interactive mode, when using `-verbose=2' to report derivations.

my suspicion is that these problems occur whenever an edge is a result
(i.e. licensed by a root node) and also the daughter of another edge,
i.e. of a larger result (which can only arise by virtue of unary rules,
as both edges span the full input).  at least the `gen-mod' case you
emailed seems to fit that description.

for these configurations, the derivation printer needs to test whether
it is looking at the top node of a derivation, and only report the root
node where that is the case; without that test, an embedded result edge
would show its own root too, giving double roots.  missing roots could
be the result of an edge first being unpacked as a daughter (no root);
when the same edge is later unpacked as an independent result, the
unpacker may then fail to record the root condition on it.
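
to make the intended printer behavior concrete, here is a minimal sketch
in python.  it is not the PET code (which i have not checked); the
`Edge' class, the `derivation' function, and all rule and root names are
hypothetical, made up for illustration only.  the sketch assumes that
the root condition is recorded on every edge that is a result, and that
the printer reports it only at the top of a derivation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edge:
    """Hypothetical stand-in for an unpacked edge (not PET's data structure)."""
    label: str
    daughters: List["Edge"] = field(default_factory=list)
    # Root condition licensing this edge as a result; None if it is not a result.
    root: Optional[str] = None


def derivation(edge: Edge, top: bool = True) -> str:
    """Print a bracketed derivation, reporting the root only at the top node."""
    body = edge.label
    if edge.daughters:
        # Daughters are never the top node, even when they are results themselves.
        body += " " + " ".join(derivation(d, top=False) for d in edge.daughters)
    body = "(" + body + ")"
    if top and edge.root is not None:
        return "(" + edge.root + " " + body + ")"
    return body


# An edge that is a result in its own right and also the daughter of a
# larger result built by a unary rule (all names are invented).
inner = Edge("np_rule", [Edge("noun")], root="root_a")
outer = Edge("unary_rule", [inner], root="root_b")

print(derivation(inner))
print(derivation(outer))
```

printed as an independent result, `inner' carries its root; printed as
the daughter of `outer', it does not, so each derivation shows exactly
one root.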

i have not checked the code, but this seems a plausible hypothesis.  it
would be helpful to have (preferably simple) test cases for this issue.
could you email specifics about the grammar you use (if need be, point
us to a copy somewhere), inputs that expose the problem, and the exact
incantation applied in calling the parser?

> Is this difference significant in some way?

i would say, yes, in that it shows a bug in PET.  [incr tsdb()] may be
sufficiently robust to work around both issues, i.e. i expect it should
be possible to re-construct these derivations and even train ME models.
but this is primarily because parse selection makes no use of the root
nodes currently, something we hope to improve later this year.  thus, i
would like to try and help to iron out these inconsistencies!

                                                      all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++