[developers] Valid MRS? Bug in ERG?

Stephan Oepen oe at ifi.uio.no
Thu Sep 10 08:45:15 CEST 2020


g'day:

> I think the LKB's EDS code will more aggressively search for a top for the EDS graph during conversion, perhaps looking to the INDEX. If anyone (Stephan?) cares to explain the procedure for selecting tops in less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin.

yes, robustness to unusual or ill-formed (as in this case) MRSs has
long been a key goal in the EDS conversion (in the LKB); MRS
infelicities (in ERG parses) were probably more common in 2002 than
today, but still i think that conversion should preferably never fail,
i.e. it should rather drop information from an ill-formed MRS than
yield no EDS at all.

regarding the top node, i do indeed fall back to the INDEX, if need be:

  (let* ((ltop (ed-find-representative eds (psoa-top-h psoa)))
         (index (ed-find-representative eds (psoa-index psoa))))
    (setf (eds-top eds)
      (or (and (ed-p ltop) (ed-id ltop))
          (and (ed-p index) (ed-id index))
          (and (var-p (psoa-index psoa))
               (var-string (psoa-index psoa))))))

the third clause in the or() appears intended to deal with an MRS
whose INDEX is not the intrinsic variable of any EP.  in that case,
the EDS will end up with a top that is not the identifier of any of
its nodes, so effectively no top.

thinking about such corner cases just now, i am tempted to drop that
third fall-back clause and leave the top empty (which would be
formally equivalent, seeing as the top property is interpreted as an
annotation on one of the actual graph nodes).  it appears the native
serialization already allows for an empty top, in which case nothing
will follow the opening brace on the first line:

  (format
   stream
   "{~@[~(~a~):~]~
    ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]"
   (eds-top object)
   (and *eds-show-status-p* (or cyclicp fragmentedp) )
   cyclicp (and cyclicp fragmentedp) fragmentedp
   (eds-relations object))

while i am sure we have never hit empty tops while working with MRSs
produced by the ERG, the above suggests that (a) identification of the
top node is optional in EDS and (b) native serialization was intended
as a line-oriented format.
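To make the two cases concrete, here is a minimal standalone sketch that reuses the format directive quoted above. The LKB accessors (eds-top, eds-relations, *eds-show-status-p*) are replaced by plain arguments, and the wrapper name print-eds-header is hypothetical, not an LKB function. With a top, the first line reads "{e2:"; without one, it is just "{".

```lisp
(defun print-eds-header (stream top relations
                         &key show-status-p cyclicp fragmentedp)
  "Print the first line of a native EDS serialization to STREAM, or return
it as a string when STREAM is NIL.  Hypothetical wrapper: TOP and RELATIONS
stand in for (eds-top object) and (eds-relations object)."
  (format
   stream
   ;; ~@[...~] prints the optional top (downcased by ~(...~)) plus a colon;
   ;; the tilde-newline skips the line break and the following indentation;
   ;; ~:[...~] prints the optional status flags or skips their three args;
   ;; the final ~@[~%~] ends the line only when relations follow.
   "{~@[~(~a~):~]~
    ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]"
   top
   (and show-status-p (or cyclicp fragmentedp))
   cyclicp (and cyclicp fragmentedp) fragmentedp
   relations))
```

For example, (print-eds-header nil "E2" '(r)) returns "{e2:" followed by a newline, whereas (print-eds-header nil nil '(r)) returns just "{" followed by a newline, i.e. the empty-top case.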

mike, may i suggest you add the fall-back, looking for the INDEX, and
otherwise allow EDSs whose top is empty.  regarding the exact
definition of the native EDS serialization, i shall return to that
question in the original thread we had on the topic (one might
disallow whitespace between the opening brace and the optional top, to
try and evade conclusion (b) above).
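A minimal sketch of the suggested policy, written as standalone Common Lisp rather than against the LKB. The node structure and find-representative below are hypothetical stand-ins: the latter simply matches a handle against node labels or a variable against intrinsic variables, and makes no attempt at whatever qeq resolution or disambiguation the LKB's ed-find-representative may perform. The point is only the order of fall-backs: the node representing the LTOP, else the node representing the INDEX, else NIL (an empty top), rather than a variable string that names no node.

```lisp
;; Hypothetical, simplified EDS node: an identifier, the handle (label)
;; of the EP it was built from, and that EP's intrinsic variable.
(defstruct node id label intrinsic)

(defun find-representative (nodes variable)
  "Hypothetical stand-in for ed-find-representative: return the first
node in NODES whose label or intrinsic variable equals VARIABLE."
  (find-if (lambda (node)
             (or (equal (node-label node) variable)
                 (equal (node-intrinsic node) variable)))
           nodes))

(defun select-top (nodes ltop index)
  "Return the id of the node representing LTOP, else the id of the node
representing INDEX, else NIL (no top; the third LKB clause is dropped)."
  (let ((top (find-representative nodes ltop))
        (representative (find-representative nodes index)))
    (or (and top (node-id top))
        (and representative (node-id representative)))))
```

With nodes e2 (label h1) and x4 (label h3), an LTOP of h1 yields "e2", an unmatched LTOP h0 with INDEX x4 falls back to "x4", and an INDEX e9 that is no node's intrinsic variable yields NIL.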

cheers, oe


