[developers] Additions to the Simple MRS format

Mon Aug 18 07:42:02 CEST 2014

Hello all,

It was noted in Tomar that the Simple MRS format lacked some
attributes that are present in the XML format, and that these
attributes can be useful for users of MRS. The attributes are:

  * A Lnk value (e.g. <cfrom:cto>) for the whole MRS
  * "surface" on the top level of the MRS
  * "surface" on the EPs

Stephan, Ann, Glenn, Woodley, and I (developers of software that
produces Simple MRS; forgive me if I've left someone out) have
discussed how to add these to the format. We have come up with a
way to represent them that matches the aesthetics of the original
format and, more importantly, maintains backwards compatibility (by
making the additions optional and by not outputting them if the data
is not specified).

We also agreed to make a (long overdue) change so that "LTOP" becomes
"TOP", since in full utterances the thing currently called LTOP is in
fact TOP (i.e. a global top, rather than local; this is further
discussed at the bottom of this email).

Finally, we agreed to assign a version number to this updated format
(e.g. v1.1, where the currently used format is v1.0), so that
processors can, in theory, input and output MRSs compliant with either
format.
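
To make the optional, versioned output concrete, here is a minimal
sketch of how a serializer might write the opening of an MRS. The
function name, its parameters, and the version strings "1.0" and "1.1"
are assumptions for illustration and are not taken from any existing
implementation. It also assumes that whitespace between tokens is not
significant and does no escaping of quotes inside the surface string.

```python
from typing import Optional, Tuple


def format_header(top: str,
                  version: str = "1.1",
                  lnk: Optional[Tuple[int, int]] = None,
                  surface: Optional[str] = None) -> str:
    """Return the opening of a Simple MRS up to and including the top handle.

    Hypothetical helper: the names and the version strings are assumptions.
    """
    parts = ["["]
    if version == "1.1":
        # The additions are optional: emit them only when data is present.
        if lnk is not None:
            parts.append("<%d:%d>" % lnk)
        if surface is not None:
            parts.append('"%s"' % surface)
        parts.append("TOP: " + top)
    else:
        # The older format has neither addition and uses the name LTOP.
        parts.append("LTOP: " + top)
    return " ".join(parts)


print(format_header("h0", lnk=(0, 41), surface="I am sure."))
print(format_header("h0"))
print(format_header("h0", version="1.0"))
```

When neither the Lnk nor the surface is supplied, the v1.1 output
differs from the old format only in the name of the top handle.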

While the implementation details were discussed off-list, we want to
bring the discussion to developers at delph-in.net (as we agreed to do in
Tomar), so that others have a chance to see and comment on the
proposal.

Here is an example MRS in the new format:

[ <0:41> "I am sure I shall say nothing of the kind."
  TOP: h0
  INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ]
  RELS: < [ pron_rel<0:1> "I" LBL: h4 ARG0: x3 [ x PERS: 1 NUM: sg
            PRONTYPE: std_pron ] ]
          [ pronoun_q_rel<0:1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ]
          [ "_sure_a_of_rel"<5:9> "sure" LBL: h1 ARG0: e2 ARG1: x3 ARG2: h8 ]
          [ pron_rel<15:16> "I" LBL: h9 ARG0: x10 [ x PERS: 1 NUM: sg
            PRONTYPE: std_pron ] ]
          [ pronoun_q_rel<15:16> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]
          [ "_say_v_1_rel"<23:26> "say" LBL: h14 ARG0: e15 [ e SF:
            prop TENSE: fut MOOD: indicative PROG: - PERF: - ] ARG1: x10
            ARG2: x16 [ x PERS: 3 NUM: sg ] ]
          [ thing_rel<27:34> "nothing" LBL: h17 ARG0: x16 ]
          [ _no_q_rel<27:34> "nothing" LBL: h18 ARG0: x16 RSTR: h19 BODY: h20 ]
          [ _of_p_rel<35:37> "of" LBL: h17 ARG0: e21 [ e SF: prop ]
            ARG1: x16 ARG2: x22 [ x PERS: 3 NUM: sg IND: + ] ]
          [ _the_q_rel<38:41> "the" LBL: h23 ARG0: x22 RSTR: h24 BODY: h25 ]
          [ "_kind_n_of-n_rel"<42:47> "kind" LBL: h26 ARG0: x22 ARG1: i27 ] >
  HCONS: < h0 qeq h1 h6 qeq h4 h8 qeq h14 h12 qeq h9 h19 qeq h17 h24 qeq h26 > ]

(I made up the surface values for illustration, so in practice they
may differ, but the formatting will remain the same.)
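
On the reading side, a processor that wants to accept both formats
could treat the new top-level fields as optional and accept either
name for the top handle. The following is only a sketch: the regular
expression, the function name, and the returned dictionary are my own
assumptions, and it further assumes that handles look like "h" followed
by digits and that the surface string contains no double quotes.

```python
import re

# Hypothetical pattern for the opening of a Simple MRS:
# optional <cfrom:cto>, optional quoted surface, then TOP or LTOP.
HEADER = re.compile(
    r'\[\s*'
    r'(?:<(?P<cfrom>-?\d+):(?P<cto>-?\d+)>\s*)?'
    r'(?:"(?P<surface>[^"]*)"\s*)?'
    r'(?P<topkey>L?TOP):\s*(?P<top>h\d+)'
)


def read_header(text):
    """Return the Lnk, surface, and top handle found at the start of text."""
    m = HEADER.match(text)
    if m is None:
        raise ValueError("not a Simple MRS header")
    lnk = None
    if m.group("cfrom") is not None:
        lnk = (int(m.group("cfrom")), int(m.group("cto")))
    return {"lnk": lnk, "surface": m.group("surface"), "top": m.group("top")}


new_style = '[ <0:41> "I am sure I shall say nothing of the kind."\n  TOP: h0\n'
old_style = '[ LTOP: h1 INDEX: e2'
print(read_header(new_style))
print(read_header(old_style))
```

Input in the old format simply yields no Lnk and no surface, so the
same reader can serve both versions.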

We also want to hear how your grammars deal with the TOP variable. In
general, I think, the (actual) LTOP is equated with the top handle of
a local structure. When a full utterance is produced, the TOP (or
GTOP, in Matrix-derived grammars) is QEQ'd to that handle (i.e. GTOP
qeq LTOP). This might not be true for all grammars, though. In
particular, we'd like to know if it's ever necessary to have both TOP
*and* LTOP representable in an MRS.

Thanks!

-- 
-Michael Wayne Goodman