[developers] Dropped arguments in DMRS
aac10 at cl.cam.ac.uk
Mon Jan 11 22:21:49 CET 2016
On 11/01/2016 16:41, Stephan Oepen wrote:
> if we were to add code to synthesize nodes for variables that are not
> introduced as the distinguished (or characteristic) variable of any EP
> but occur at least twice in an MRS, it would seem natural to me to
> leave these nodes unlabeled (they will not have characterization or
> other surface links either). this would also indicate that they have a
> somewhat different status formally (at least in terms of
> correspondences to a full MRS).
We can't really leave them unlabelled in DMRS because they wouldn't show
up very well... From my perspective, the options are:
1. put zero pronouns back into JACY (without quantifiers);
2. add zero pronouns in some sort of post-processing step;
3. add them to DMRS (i.e., change the DMRS code to allow them).
Mike mentioned ICONS, but it doesn't help here because the nodes that
would have to be equated don't exist. Or, at least, since ICONS can do
absolutely anything, one could define a variant of ICONS which did
encode this properly, but that's really too much of a stretch.
> i share your sentiment that the disappearance of unexpressed arguments
> in our dependency graphs in general reduces clutter and is desirable.
> co-indexation of such (unexpressed) variables admittedly challenges
> that position. if we end up special-coding for these cases, it would
> be good to have the motivating examples and analyses readily available
> (and publicly vetted). i believe emily may have been the first to
> argue for such co-indexation, probably from her work in the
The example Mike gave was tabe-sugiru = eat-exceed = 'overeat'.
While it would be very good if someone could write this up or point us
to a proper write-up, I don't think there's much room for argument about
needing such co-indexation for Japanese, unless one uses zero pronouns.
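A minimal sketch of the node-synthesis test Stephan proposes: find variables that are not the characteristic variable of any EP but occur at least twice, since those are the ones a DMRS conversion would otherwise lose. It assumes a simplified EP representation (a predicate plus a role-to-variable map, with ARG0 taken as the characteristic variable), not any real library's data structures. The predicate names and argument structure in the tabe-sugiru example are hypothetical illustrations, not JACY's actual analysis.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EP:
    """Simplified elementary predication: a predicate plus role -> variable."""
    pred: str
    args: dict[str, str] = field(default_factory=dict)

    @property
    def intrinsic(self) -> str | None:
        # ARG0 is treated as the distinguished (characteristic) variable.
        return self.args.get("ARG0")


def synthesize_nodes(eps: list[EP]) -> list[str]:
    """Variables that would need an extra, unlabelled DMRS node: not the
    ARG0 of any EP, but occurring at least twice across all EPs."""
    introduced = {ep.intrinsic for ep in eps if ep.intrinsic is not None}
    counts = Counter(v for ep in eps for v in ep.args.values())
    return sorted(
        v for v, n in counts.items()
        # Handles (h...) are linked via labels/HCONS, not nodes, so skip them.
        if n >= 2 and v not in introduced and not v.startswith("h")
    )


# Hypothetical, simplified analysis of tabe-sugiru ('overeat') with a dropped
# subject and no zero pronoun: x4 is shared by both EPs but introduced by none.
# u5 (the unexpressed object of 'eat') occurs only once, so it stays dropped.
TABE_SUGIRU = [
    EP("_taberu_v", {"ARG0": "e2", "ARG1": "x4", "ARG2": "u5"}),
    EP("_sugiru_v", {"ARG0": "e6", "ARG1": "x4", "ARG2": "e2"}),
]

print(synthesize_nodes(TABE_SUGIRU))  # ['x4']
```

Note that e2 also occurs twice but is excluded because it is the ARG0 of the first EP, so it already has a node.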
> —i recall we have talked once or twice in the past about adding an
> explicit distinction of unexpressed variables. for the ERG at least,
> i believe dan (and others) often look at ‘u’ and ‘i’ (and maybe ‘p’)
> as variable sorts that indicate unexpressed arguments. but that is at
> best a convention and prevents stronger typing of argument slots as
> would be desirable. for example, the ARG2 of _eat_v_1 presumably must
> always be an ‘x’ when expressed, but dan abstains from putting that
> type into the lexical entry because the scoping machinery would
> complain about an ‘x’-typed variable without a quantifier.
> would it work (and be desirable) to introduce a variable property, say
> [ XP bool ], to distinguish expressed from unexpressed roles? i
> imagine it would not be hard to make all constructions that bind roles
> specialize XP to true; one could then use the VPM machinery upon MRS
> read-out to default remaining (unspecific) XP values to false.
> alternatively, i imagine one could obtain the same effect by making
> the hierarchy of variable types a little richer, i.e. put something
> above at least ‘x’ and ‘e’ to indicate unexpressed variants, say ‘w’
> and ‘d’ (the immediately preceding letters :-).
> any thoughts on actually introducing such an explicit marking of
> unexpressed arguments?
> all best, oe
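A minimal sketch of the read-out defaulting oe describes for the proposed [ XP bool ] property: constructions that bind a role set XP to true, and anything left unspecific is defaulted to false when the MRS is read out. It assumes variables are plain dicts of properties and that "bool" (the supertype) or a missing value means XP was never specialised; it only mimics the effect a VPM default would have and is not the VPM file format.

```python
from __future__ import annotations

# Assumed encoding: "true"/"false" once a construction has specialised XP,
# "bool" (the supertype) or absence when it never was.
UNSPECIFIED = {"bool", None}


def default_xp(variables: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Mimic a read-out default: any XP still unspecified becomes "false"."""
    result = {}
    for var, props in variables.items():
        props = dict(props)  # copy, so the input structure is untouched
        if props.get("XP") in UNSPECIFIED:
            props["XP"] = "false"
        result[var] = props
    return result


def unexpressed(variables: dict[str, dict[str, str]]) -> list[str]:
    """Names of variables marked as unexpressed after defaulting."""
    return sorted(v for v, p in default_xp(variables).items() if p["XP"] == "false")


# Example: e2 and x3 were bound by constructions; x7 and i9 never were.
VARS = {
    "e2": {"XP": "true", "TENSE": "past"},
    "x3": {"XP": "true", "NUM": "sg"},
    "x7": {},
    "i9": {"XP": "bool"},
}

print(unexpressed(VARS))  # ['i9', 'x7']
```

With such a marking, an argument slot could be typed ‘x’ in the lexicon while a scope resolver skips variables whose XP is false.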
The problem with handling unexpressed arguments `properly' is that there
are multiple different classes of unexpressed arguments, as I outlined.
In the ERG, some verbs with optional arguments have unexpressed
arguments in the semantics, while others don't. This also interacts
with the desire to economize on predicate names, which has caused many
predicates to appear with different arities (which, of course, isn't OK
if one translates directly to a conventional logical representation, so
it has to be interpreted as some sort of notational [...]). For example,
the ERG demo gives:
  Kim understood                          understand_v_by (e, x, p)
  Kim understood the sentence / Sandy     understand_v_by (e, x, x')
  Kim understood that Sandy was scared    understand_v_by (e, x, h, i)
  Kim ran                                 run_v_1 (e, x)
  Kim ran the race / the store            run_v_1 (e, x, x')
  Kim hoped                               hope_v_1 (e, x)
  Kim dreamed                             dream_v_1 (e, x, p)
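To make the arity point concrete, here is a small check over the demo analyses listed above. It reports every predicate that occurs with more than one number of arguments, which is what a direct translation into a conventional logic (one fixed arity per predicate symbol) would have to reconcile. The tuple encoding is an assumption made for illustration, not an ERG data format.

```python
from collections import defaultdict

# The ERG demo analyses listed above: (sentence, predicate, argument sorts).
DEMO = [
    ("Kim understood", "understand_v_by", ("e", "x", "p")),
    ("Kim understood the sentence / Sandy", "understand_v_by", ("e", "x", "x'")),
    ("Kim understood that Sandy was scared", "understand_v_by", ("e", "x", "h", "i")),
    ("Kim ran", "run_v_1", ("e", "x")),
    ("Kim ran the race / the store", "run_v_1", ("e", "x", "x'")),
    ("Kim hoped", "hope_v_1", ("e", "x")),
    ("Kim dreamed", "dream_v_1", ("e", "x", "p")),
]


def variable_arity(entries):
    """Map each predicate seen with more than one arity to its sorted arities."""
    arities = defaultdict(set)
    for _sentence, pred, args in entries:
        arities[pred].add(len(args))
    return {pred: sorted(ns) for pred, ns in arities.items() if len(ns) > 1}


print(variable_arity(DEMO))  # {'understand_v_by': [3, 4], 'run_v_1': [2, 3]}
```

hope_v_1 and dream_v_1 each appear only once in this sample, so the check says nothing about them.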
I don't find this intuitive, but we don't have a test set or criteria
for *MRS that would make it clear why one representation is to be
preferred over another, and I find it hard to imagine what such criteria
could be. That's why I talked about anaphora in the previous message,
since that could have been an example of a clear-cut difference, though
it seems (to me) it probably isn't. Failing such criteria, I don't want
to argue that there's a problem with the ERG representations, but the
lack of criteria also means that dropping unexpressed arguments gives
one less thing to worry about when we're actually using the output.
So, I don't think it'll be a big hassle to add them to the DMRS code,
but I don't propose to add them to the DMRS formal description, and I
don't think it's worthwhile expending energy on trying to clean this
up. There are ways to allow argument-slot typing without messing up the
scope machinery, if that's something that needs to be fixed.