[developers] generation bug in Agree - "are not permitted to look you."

Spencer Rarrick spencer.rarrick at gmail.com
Fri Nov 29 23:37:20 CET 2013

Glenn and I have identified a bug in generation in Agree that seems to
result from a combination of unusual circumstances. When generating from a
parse of "You are not permitted to look.", we generate a large number of
erroneous sentences similar to "are not permitted to look you.", "'re not
permitted to look you", etc.

Steps we have identified that allow these realizations include:

1. The input MRS contains an EP with "_look_v_1_rel" which has arguments
LBL, ARG0, ARG1.  ARG1 is coreferenced to HOOK.XARG. The "correct" LE to
look up would be "look_v1" which has an EP with that same PRED and argument
signature. However there is also an LE "look_v4" that has an EP with
"_look_v_1_rel", and an additional ARG2 argument position. Despite the
extra argument position, however, the EP successfully unifies with the EP
in the MRS so we add this LE to the chart. LBL, ARG0, and ARG1 end up
skolemized because there were skolems in those positions in the input MRS
EP, but ARG2 remains unskolemized because the input MRS contained no
information with which to specialize it.

2. The chart will have an edge for "you" added because of a "pron_rel" and
"pronoun_q_rel" with "ARG0 [ PNG [ PN 2 ] ]."  This ends up skolemized to
the same skolem constant as HOOK.XARG and ARG1 of the "_look_v_1_rel" EP.
Generally, this skolemization should make sure that it can only be used
where appropriate/intended, but because ARG2 on the EP in "look_v4" is not
skolemized, it can combine with an edge with any skolem. In fact, we see
the edge for "you" combine with "look_v4" to form "look you", which
ultimately appears in several root realizations.

3. If we accept fragment root symbols, we get the aforementioned VP-fragment
root realizations such as "are not permitted to look you."  Each EP is
accounted for, as the EPs that should have been used in the subject of the
sentence are instead used in an argument of "look."  Strangely, we end up
with two variables (sets of equivalence classes) that have the same
skolem.  ARG1 of "look", ARG2 of "permit", ARG2 of "parg_d_rel", and
HOOK.XARG share the same variable which is not ARG0 of any EP, while ARG0
of pron_rel and pronoun_q_rel have the same skolem but are not coreferenced
(except to each other).
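
As a rough illustration of steps 1 and 2, here is a toy Python model. It is
not Agree's actual code or data structures: an EP is reduced to a map from
role names to skolem constants, the skolem names ("h1", "e2", "x3") and the
helpers instantiate() and can_fill() are made up for the example, and real
unification is replaced by a simple equality test.

```python
# Toy model (an assumption for illustration, not Agree's implementation):
# an EP is a role -> value map, where a value is a skolem constant (str)
# or None for a variable that has not been skolemized.

def instantiate(lexical_roles, input_ep):
    """Copy skolems from the input EP onto a lexical entry's EP.

    Roles that the input EP does not mention stay None (unskolemized).
    """
    return {role: input_ep.get(role) for role in lexical_roles}


def can_fill(slot, filler_skolem):
    """An unskolemized slot accepts any skolem; a skolemized slot
    accepts only the identical skolem."""
    return slot is None or slot == filler_skolem


# Input EP for "_look_v_1_rel": LBL, ARG0 and ARG1 only.
input_look = {"LBL": "h1", "ARG0": "e2", "ARG1": "x3"}

look_v1 = instantiate(["LBL", "ARG0", "ARG1"], input_look)
look_v4 = instantiate(["LBL", "ARG0", "ARG1", "ARG2"], input_look)

# ARG0 of pron_rel / pronoun_q_rel carries the same constant as look's ARG1.
you_skolem = "x3"

print(look_v4["ARG2"])                        # None
print(can_fill(look_v4["ARG2"], you_skolem))  # True: "look you" is licensed
print(can_fill(look_v4["ARG1"], "x9"))        # False: skolemized slots are strict
```

In this simplified picture, the open ARG2 slot on "look_v4" is the only thing
that lets the edge for "you" attach as a complement.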

Clearly we are missing a constraint in one or more parts of our generation
pipeline. There are fixes we have thought of, but we are not sure if they
would have unintended consequences and possibly block some valid
realizations in other circumstances:

1. Final subsumption check.  We are not currently performing this, and it
should in principle rule out these realizations as there are coreferences
in the input MRS that are not present in the MRS of these generated trees.
However, ideally we would like to avoid generating all of these edges
rather than simply rejecting the derivation at such a late step.

2. During PRED lookup we could require that the number and names of
arguments in candidate LEs match exactly those in the input MRS EPs.
In this case, we would not even add "look_v4" to the chart in the first
place. However, could there be cases where this sort of underspecification
in the number of arguments is valid and meaningful?

3. We could generate and assign skolems for argument positions not
constrained by the input MRS (e.g. ARG2 on "look"). This would prevent that
edge from combining with "you" as their skolems would not unify. This seems
potentially safer than what is suggested in (2), as it does not completely
rule out the use of such LEs, but merely prevents them from using edges
that should be reserved for other parts of the tree. I can imagine some
control structure rule that might want to legitimately coreference that
argument to some other part of the tree, and this would be disallowed by
such skolemization, but perhaps this simply doesn't occur?
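
To make options (2) and (3) concrete, here is a minimal Python sketch under
the same kind of simplifying assumptions: an EP is just a map from role names
to skolem constants, and every name below (signature_matches(),
instantiate_with_fresh_skolems(), the "fresh" prefix, the skolems "h1", "e2",
"x3") is hypothetical rather than taken from Agree. Option (1), the final
subsumption check, is not shown.

```python
import itertools

# Counter used to mint fresh skolem constants (hypothetical naming scheme).
_fresh = itertools.count(1)


def signature_matches(lexical_roles, input_ep):
    """Option 2: keep a lexical entry only if its role names are exactly
    the role names of the input EP."""
    return set(lexical_roles) == set(input_ep)


def instantiate_with_fresh_skolems(lexical_roles, input_ep):
    """Option 3: copy skolems from the input EP, and give every role the
    input does not constrain a brand-new skolem of its own."""
    return {
        role: input_ep[role] if role in input_ep else f"fresh{next(_fresh)}"
        for role in lexical_roles
    }


def can_fill(slot, filler_skolem):
    """Unskolemized (None) slots accept anything; others need equality."""
    return slot is None or slot == filler_skolem


input_look = {"LBL": "h1", "ARG0": "e2", "ARG1": "x3"}
v1_roles = ["LBL", "ARG0", "ARG1"]
v4_roles = ["LBL", "ARG0", "ARG1", "ARG2"]

# Option 2: "look_v4" never reaches the chart.
print(signature_matches(v1_roles, input_look))  # True
print(signature_matches(v4_roles, input_look))  # False

# Option 3: "look_v4" is kept, but its ARG2 can no longer absorb "you".
look_v4 = instantiate_with_fresh_skolems(v4_roles, input_look)
print(look_v4["ARG2"])                  # fresh1
print(can_fill(look_v4["ARG2"], "x3"))  # False
```

The sketch also shows the trade-off raised above: with option (3) the fresh
skolem on ARG2 can never be identified with any other edge's skolem, so a
rule that legitimately needed to coreference that argument would be blocked.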

Anyway, thanks to anyone who has made it through this incredibly
long-winded email. If you have ideas about the correct way to fix this
bug, suggestions would be greatly appreciated.

