hi mike,

belatedly, thanks (once again) for pushing forward standardization!
and also my apologies for returning to this thread a little late!

regarding EDM, until recently i thought of the Common-Lisp
implementation (which it appears i produced in early 2012, i.e. more
recently than the Perl version by bec) as the reference.  last year,
comparing its scores to those of my re-implementation in Python (as
part of mtool) also turned up the two questions you raised, viz. the
treatment of the TOP property and how to score parameterized relations.

regarding the first, this appears to be one of the better-kept secrets
in meaning representation comparison: in my view, it is a semantically
highly relevant property (marking the contrast between e.g. 'all
fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the
original EDM paper nor its derivative in the AMR world (Cai & Knight,
2013) discusses it.  yet, both the Lisp implementation of EDM and SMATCH
seem to always have scored the TOP node as an additional tuple
(counted among the 'argument' tuples for EDM, while considered among
the 'attribute' tuples in SMATCH).  the Perl implementation of EDM, on
the other hand, worked off my 'ltriples' export format for EDS, which
appears to not include a separate TOP tuple.

i confirmed the nature of those triples by reminding myself of what
became of the 'export' script mentioned in the original EDM wiki notes
you had found.  it was folded into the LOGON 'redwoods' script, so
something like the following actually works today to prepare the input
for the Perl implementation of EDM:

  $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs

i attach the output for item #21 from the MRS test suite, for
reference.  so, i agree with the conclusion bec and you have already
reached: the original Perl implementation of EDM did not consider TOP
tuples.  the Lisp implementation, on the other hand, appears to have
had TOP tuples from its very beginning.

regarding the second design choice you raise, parameterized relations
(involving one or more constant arguments), it appears that both the
Lisp and Perl implementations of EDM do the same thing, viz. assume
that there can be at most one constant argument in a relation and
'inline' its value (if present) with the predicate itself, e.g.
internally using node label shorthands like 'named(Abrams)'.  in this
regard, i suspect bec and you actually may have arrived at the wrong
conclusion about historic behavior; thus, personally, i see no reason
for pyDelphin to provide a special-cased version of EDM that wholly
ignores constant arguments.

looking at this particular design choice today, however, it seems too
limiting an assumption, and it meshes together two things that arguably
should be considered separate.  even though ERG versions for the past
15 or more years have not used predicates with multiple (constant)
parameters, there would be nothing wrong with representing, say, the
fraction '2/3' as involving two constant arguments, e.g. something
like fraction [ CARG1 "2", CARG2 "3" ].  this is, for example, what
AMR does for complex proper names.
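
to make the contrast concrete, here is a minimal sketch in Python of
the two treatments.  the node encoding, the span-style identifiers, and
both function names are illustrative assumptions; this is not the code
of the Lisp, Perl, or mtool implementations.

```python
# hypothetical node encoding: (node id, predicate, {constant role: value})

def inlined_tuples(nodes):
    """historic treatment: fold the single constant into the node label."""
    tuples = set()
    for nid, pred, consts in nodes:
        if len(consts) > 1:
            # the inlining scheme assumes at most one constant argument
            raise ValueError("inlining assumes at most one constant argument")
        label = pred
        for value in consts.values():
            label = f"{pred}({value})"
        tuples.add((nid, "label", label))
    return tuples


def separate_tuples(nodes):
    """alternative: one tuple for the predicate, one per constant argument."""
    tuples = set()
    for nid, pred, consts in nodes:
        tuples.add((nid, "label", pred))
        for role, value in consts.items():
            tuples.add((nid, role, value))
    return tuples


abrams = [("<0:6>", "named", {"CARG": "Abrams"})]
# the hypothetical two-parameter fraction from the paragraph above
fraction = [("<0:3>", "fraction", {"CARG1": "2", "CARG2": "3"})]
```

with these definitions, the 'named' node gives one tuple when inlined
and two when kept separate, whereas the two-parameter fraction can only
be represented by the second function.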

thus, even though our two historic EDM implementations appear to agree
on the 'inlining' treatment of constant arguments, i would be prepared
to argue that CARG et al. values should rather be treated as separate
node properties, i.e. for the above example the 'named' predicate and
the 'CARG' == 'Abrams' value should be treated as two distinct tuples.
in part for cross-framework compatibility, this is what we ended up
doing in mtool, including in its re-implementation of EDM, see:


in summary, it sounds as if your EDM re-implementation, mike, had
arrived at the same conclusions: TOP tuples should be scored, and
constant arguments considered as separate properties.  i would expect
your implementation and mtool to then arrive at the exact same
results (on EDSs stripped of MRS variable properties, which the
current mtool EDS reader deliberately discards; see below)?  seeing as
we have identified two ways in which this way of computing EDM differs
from the original publication and the two earlier implementations (in
Perl and Lisp), i would like to suggest we formally coin this
refinement of the metric EDM 2.0.
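
as a rough sketch of what this refinement amounts to (assuming a toy
dictionary encoding of graphs and hypothetical function names, not the
actual mtool or pyDelphin code), tuple extraction and scoring could
look as follows in Python:

```python
def edm_tuples(graph):
    """collect TOP, label, property, and edge tuples for one graph."""
    tuples = {("top", graph["top"])}
    for nid, node in graph["nodes"].items():
        tuples.add(("label", nid, node["label"]))
        # constant arguments are separate properties, not part of the label
        for role, value in node.get("properties", {}).items():
            tuples.add(("property", nid, role, value))
    for source, role, target in graph["edges"]:
        tuples.add(("edge", source, role, target))
    return tuples


def prf(gold, system):
    """precision, recall, and F1 over two sets of tuples."""
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = len(gold) + len(system)
    f1 = 2 * correct / total if total else 0.0
    return precision, recall, f1


# toy graph for 'Abrams barks.' (simplified, e.g. without the quantifier)
gold = {
    "top": "<7:12>",
    "nodes": {
        "<0:6>": {"label": "named", "properties": {"CARG": "Abrams"}},
        "<7:12>": {"label": "_bark_v_1"},
    },
    "edges": [("<7:12>", "ARG1", "<0:6>")],
}
# same graph, but with TOP wrongly placed on the 'named' node
system = dict(gold, top="<0:6>")
```

here, four of the five tuples on either side match, so a wrong TOP
alone brings precision, recall, and F1 down to 0.8.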

regarding how to deal with missing graphs on either the gold or system
side of the comparison: it appears the Lisp implementation of EDM
provides a toggle *redwoods-score-all-p*, which selects between two
modes of computing EDM over two sets of corresponding items, either on
the intersection of items only; or on their union, treating gaps on
either side of the comparison as empty graphs (thus, incurring recall
or precision penalties).  in practice, i believe we used to
near-exclusively compute EDM over sets of items for which there was
both a gold and a system graph.  but that can of course only give
comparable results when fixing that very set of items.  thus, the
setup of scoring 'all' items seems more general, robust to attempts at
gaming, and in my view should be considered the default.
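
the two modes can be sketched in Python as follows.  the function and
its score_all parameter are hypothetical names that merely echo the
Lisp toggle, and accumulating counts across items (micro-averaging) is
an assumption of this sketch:

```python
def edm_counts(gold, system, score_all=True):
    """accumulate tuple counts over items.

    gold and system map item identifiers to sets of tuples; with
    score_all, an item missing on one side counts as an empty graph.
    """
    if score_all:
        items = gold.keys() | system.keys()   # union of items
    else:
        items = gold.keys() & system.keys()   # intersection only
    correct = n_gold = n_system = 0
    for item in items:
        g = gold.get(item, set())
        s = system.get(item, set())
        correct += len(g & s)
        n_gold += len(g)
        n_system += len(s)
    return correct, n_gold, n_system


gold_items = {1: {"a", "b"}, 2: {"c", "d"}}
system_items = {1: {"a", "b"}}  # no system graph for item 2
```

on this toy input, scoring the intersection gives perfect recall (2 of
2 gold tuples), while scoring all items counts the missing graph
against the system (2 of 4 gold tuples).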

finally, regarding variable properties in mtool: for the 2019 CoNLL
shared task on meaning representation parsing (MRP 2019), we had
agreed with other framework developers to keep morpho-semantic
decorations out of the comparison.  hence, the MRP 2019 graphs did not
include tense, aspect, or number information from the full ERSs.  but
technically, i would consider that a property of the EDS used in MRP
2019, not a design decision in mtool.  for the re-run of the MRP task
at CoNLL 2020, we are currently preparing to throw these properties
back into the mix (also in other frameworks, where annotations are
available), which means the EDS reader in mtool in the near future
will no longer discard (underlying) variable properties by default.

best wishes, oe

