[developers] potentially important `bug fix' in LKB generator

Fri Feb 15 01:43:01 CET 2008

I am afraid I think there is a flaw here.  I don't think the property
you mention was accidental or for `technical' reasons - when I built
the generator this way, I thought it should be allowable for the
grammar to return a less specified result than that provided to the
generator, because the system constructing an input to the generator
is not supposed to know everything about the grammar.  I'm going to
give a bad example, because it is late and I can't think of the proper
one, but suppose, for instance, the generator input says that we want
to generate from some(x_pl,sheep(x),fall(e_past,x)).  And suppose
the perverse grammar writer has decided to have a lexical entry for
sheep that is underspecified for number.  We still want to generate
something!

I think the real examples would concern tense/aspect rather than
number.  

There's no inconsistency here, in principle, with expecting all EPs in
the input to be realised.  I can explain this in more detail if you
want, but for now would just point out that you can't simply talk
about subsumption of the entire semantics - it's a bag of elementary
predications plus qeqs, not a feature structure.  As far as I remember
offhand, the original definition was in terms of consistency of the
EPs which were in correspondence in input and realisation, not
subsumption between EPs.
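
To make the distinction concrete, here is a minimal sketch in Python.
It is not LKB code: the toy type hierarchy, the dictionary encoding of
variable properties, and all function names are assumptions made up for
illustration, and a property missing from the realisation is treated as
fully underspecified.  It contrasts a consistency check on the variable
properties of corresponding EPs with a strict check that the input value
subsumes the realisation's value.

```python
# Hypothetical toy type hierarchy: child -> parent (None marks a root).
PARENT = {
    "sf": None,
    "prop-or-ques": "sf",
    "prop": "prop-or-ques",
    "ques": "prop-or-ques",
    "num": None,
    "sg": "num",
    "pl": "num",
}


def ancestors(t):
    """Return t together with all of its supertypes, most specific first."""
    out = []
    while t is not None:
        out.append(t)
        t = PARENT[t]
    return out


def subsumes(general, specific):
    """True if `general` is equal to, or a supertype of, `specific`."""
    return general in ancestors(specific)


def consistent(a, b):
    """True if a and b are compatible.

    In this tree-shaped toy hierarchy two types are compatible exactly
    when one of them subsumes the other.
    """
    return subsumes(a, b) or subsumes(b, a)


def props_consistent(input_props, output_props):
    """Consistency check: no property shared by both sides may clash.

    A property absent from the realisation is taken to be unspecified
    (an assumption of this sketch) and therefore never clashes.
    """
    return all(
        consistent(value, output_props[name])
        for name, value in input_props.items()
        if name in output_props
    )


def input_subsumes_output(input_props, output_props):
    """Strict check: each input value must subsume the realisation's value."""
    return all(
        name in output_props and subsumes(value, output_props[name])
        for name, value in input_props.items()
    )


# Input asks for [ SF prop ], the grammar only yields [ SF prop-or-ques ].
underspecified = ({"SF": "prop"}, {"SF": "prop-or-ques"})
# Input asks for a plural, the lexical entry says nothing about number.
sheep = ({"NUM": "pl"}, {})
# A genuine clash between input and realisation.
clash = ({"SF": "prop"}, {"SF": "ques"})

for name, (inp, out) in [("underspecified", underspecified),
                         ("sheep", sheep), ("clash", clash)]:
    print(name, props_consistent(inp, out), input_subsumes_output(inp, out))
```

Under the consistency check the first two cases are accepted and only
the clash is rejected; under the strict check all three are rejected.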

I realise that the behaviour of the generator has changed over the
years, but before going any further, I would urge that you write down
formally what you believe the behaviour is / should be.  Then we can
discuss, and maybe parameterise the behaviour.

Ann

> 
> dear all,
> 
> i just made a change in the LKB generator that, in my view, is just a
> bug fix.  but it may cost some grammars generation coverage, hence let
> me elaborate.
> 
> abstractly, the goal of the generator is to enumerate all derivations
> (as licensed by the grammar) such that their semantics is subsumed by
> the input semantics to the generator: within certain limits, we allow
> the generator to return results with a more specific semantics.  this
> is desirable for example where the input is underspecified, e.g. using
> `temp_loc_rel' in the input, even though realizations use prepositions
> whose actual predicates are subsumed by `temp_loc_rel'.  mostly, we do
> not allow realizations whose semantics is less specific than the input,
> i.e. failing to verbalize part of the input semantics.
> 
> in the traditional LKB generator, there is one exception to this rule.
> less specific realizations are returned in case they only lack some of
> the input semantics expressed as variable properties.  for example, if the
> input requires an event to be [ SF prop ], but the grammar constructs
> a derivation whose semantics is [ SF prop-or-ques ] (and otherwise is
> subsumed by the input semantics), then that derivation is included as
> part of the generator results.
> 
> i suspect the above used to be the case for technical reasons: during
> lexical lookup, the generator specializes lexical entries as they are 
> activated, i.e. variable properties from input EPs are copied into the
> AVMs of lexical entries (and rules), as activated by those EPs.  this
> specialization prior to chart generation makes things more efficient; a
> related effect is that generator derivations look different from parse
> results: their trees do not show applications of lexical rules.  in a
> sense, these derivations have `hidden' daughters: a full recipe for
> rebuilding their tree obviously requires the lexical rules.
> 
> there used to be specialized code for generator edges in various parts
> of the LKB and [incr tsdb()], recovering those hidden daughters.  for
> example, to compute the MaxEnt score of a generator derivation, these
> daughters contribute to the total score, hence the code needs to treat
> generator derivations differently from parser derivations.
> 
> i have long felt irritated with this property of the code (which is all
> my fault in the first place); i know LKB users are often confused about
> the missing nodes in browsing generator trees (there is no special code
> in the tree browser to show the additional daughters).
> 
> now getting to the point: i changed the generator internals to include
> daughters corresponding to lexical rule applications in the usual way
> in the `edge' structure (i.e. the `children' slot is a list of edges).
> this makes the tree display look as expected, and the specialized code
> for generator edges in various places becomes obsolete.
> 
> however, there is an additional benefit to this: when chart packing is
> enabled, the final realization is constructed from re-unifying AVMs of
> lexical entries and rules, as prescribed by the derivation.  in my new
> setup, this includes lexical rules and the original lexical entry, and
> thus annuls the intermediate effects of specialization.  so, in our earlier
> example: the semantics on the realization is just [ SF prop-or-ques ],
> as that is (we assume) what the grammar makes it to be.  i know dan at
> least tends to consider such cases bugs in the grammar.  e.g. looking
> at `http://erg.emmtee.net/', try parsing `Kim ate also.'  the result is
> marked `prop-or-ques' (in contrast to, say, `Kim ate.') and generating
> from it yields a number of surprising paraphrases.  conversely, using a
> fully specified input semantics (parse `Kim also ate.' instead), there
> is no generator output `Kim ate also.'  this is the result of the code
> change discussed above: in the new setup, `Kim ate also.' lacks a piece
> of input semantics, viz. the more specific [ SF prop ].
> 
> in conclusion, i present this change as a bug fix because it brings the
> generator in compliance with the abstract definition offered above.  it
> also has the practical value of making the tree display more accessible
> and may help diagnose unwanted underspecification in grammars.  i would
> like to release this change to the LKB sources in the next few weeks.
> 
> the downside is that grammars (sloppily) built to take advantage of the
> traditional LKB behavior could lose some paraphrases in generation.  it
> would be great if those of you actively using the generator could give
> all this some thought, and hopefully arrive at the conclusion that you
> want this improved debugging tool.  currently at least, the full effect
> of the above only kicks in when chart packing is on, but in principle i
> would like to make the new behavior apply in non-packing mode too.
> 
>                                 phew, a long message!  all best  -  oe
> 
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
> +++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
> +++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++