[developers] EDM implementations

Tue Jan 28 02:57:27 CET 2020

Thanks for the reply, Stephan,

> [...] it appears that both the
> Lisp and Perl implementations of EDM do the same thing, viz. assume
> that there can be at most one constant argument in a relation and
> 'inline' its value (if present) with the predicate itself, e.g.
> internally using node label shorthands like 'named(Abrams)'.  in this
> regard, i suspect bec and you actually may have arrived at the wrong
> conclusion about historic behavior;

Thanks for confirming how the Lisp implementation works. I took your 21.gz
file and created a version that replaced "Abrams" with "Brown", then used
edm_eval.pl to compare; it reports a full match (1.0), so based on this
limited test I think Bec was correct about the Perl version.

> thus, personally, i see no reason
> for pyDelphin to provide a special-cased version of EDM that wholly
> ignores constant arguments.

I agree, and that's not what it does. I separated CARGs into their own category
and callers of the script can give the category a weight of zero to ignore
them, which allows them to recreate the results of the Perl implementation.
Otherwise, the default weight for all categories (arguments (-A),
names/predicates (-N), morphosemantic properties (-P), constants (-C), and
tops (-T)) is 1.0.
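
To illustrate the effect of the category weights, here is a minimal Python
sketch. It is not the delphin.edm code: the function name `edm_f1`, the triple
layout, and the toy graph are all invented for illustration, and only the
category letters (A, N, P, C, T) come from the options above. It scores two
toy graphs that differ only in the CARG value ("Abrams" vs. "Brown").

```python
from collections import Counter


def edm_f1(gold, test, weights):
    """Weighted F1 over multisets of (category, triple) pairs.

    Hypothetical helper for illustration only; categories missing from
    `weights` default to a weight of 1.0.
    """
    def weighted(counter):
        return sum(weights.get(cat, 1.0) * n for (cat, _), n in counter.items())

    gold_c, test_c = Counter(gold), Counter(test)
    matched = weighted(gold_c & test_c)  # multiset intersection
    g, t = weighted(gold_c), weighted(test_c)
    p = matched / t if t else 0.0
    r = matched / g if g else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def toy_graph(carg):
    # Invented triples; character spans stand in for node identities.
    return [
        ("N", ("<0:6>", "named")),           # name/predicate
        ("C", ("<0:6>", carg)),              # constant (CARG)
        ("N", ("<7:13>", "_bark_v_1")),      # name/predicate
        ("A", ("<7:13>", "ARG1", "<0:6>")),  # argument
        ("T", ("<7:13>",)),                  # top
    ]


gold, test = toy_graph("Abrams"), toy_graph("Brown")
print(round(edm_f1(gold, test, {}), 3))          # all weights 1.0
print(round(edm_f1(gold, test, {"C": 0.0}), 3))  # like -C0
```

With all weights at 1.0 the mismatched constant costs one of five triples on
each side (F1 of 0.8); with the constants category weighted zero the two
graphs match fully, which mirrors the Abrams/Brown test described above.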

> i would expect
> your implementation and mtool should then come to the exact same
> results (on EDSs stripped of MRS variable properties, [...])

Yes, but there's no need to strip the properties; just give the category a
weight of zero. I've confirmed on a few test items that my implementation
gets the exact same scores as mtool with -P0.

Furthermore, I think the following option configurations for my
re-implementation cover all current and historical use cases except for the
inlined constants of the Lisp version, which interact with node names in a
way that isn't reproducible with weights alone.

* Perl: `delphin edm -C0 -T0 --ignore-missing=gold`
* Perl with -i option: `delphin edm -C0 -T0 --ignore-missing=both`
* Lisp where *redwooods-score-all-p* is true: `delphin edm`
* Lisp where *redwooods-score-all-p* is false: `delphin edm --ignore-missing=both`
* mtool (MRP 2019): `delphin edm -P0`
* mtool (MRP 2020? or EDM 2.0): `delphin edm`
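
As a rough sketch of what the missing-item handling in these configurations
amounts to (again in Python, and again not the actual delphin.edm code), the
hypothetical function below pairs gold and test items by identifier. Only the
values "gold" and "both" appear in the commands above; "test" and "none" are
assumed here for symmetry, with "none" standing for the default of scoring
every item.

```python
def pair_items(gold, test, ignore_missing="none"):
    """Return (item_id, gold_triples, test_triples) for the items to score.

    `gold` and `test` map item identifiers to lists of triples. An item
    absent from one side is treated as an empty graph unless
    `ignore_missing` says to skip it. Hypothetical helper for illustration.
    """
    pairs = []
    for item_id in sorted(set(gold) | set(test)):
        g, t = gold.get(item_id), test.get(item_id)
        if g is None and ignore_missing in ("gold", "both"):
            continue  # no gold graph: skip instead of penalizing precision
        if t is None and ignore_missing in ("test", "both"):
            continue  # no test graph: skip instead of penalizing recall
        pairs.append((item_id, g or [], t or []))
    return pairs


gold = {"1": ["a"], "2": ["b"]}
test = {"2": ["b"], "3": ["c"]}
for mode in ("none", "gold", "both"):
    print(mode, [item_id for item_id, _, _ in pair_items(gold, test, mode)])
```

Under these assumptions "none" scores the union of items, "both" scores only
the intersection, and "gold" uses the gold set as the definition of the item
set, which is how Bec describes the Perl behavior.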

On Tue, Jan 28, 2020 at 1:42 AM Stephan Oepen <oe at ifi.uio.no> wrote:

> hi mike,
>
> belatedly, thanks (once again) for pushing forward standardization!
> and also my apologies for returning to this thread a little late!
>
> regarding EDM, i used to think of the Common-Lisp implementation
> (which it appears i produced in early 2012, i.e. more recently than
> the Perl version by bec) as the reference until recently.  last year,
> when comparing its scores to my re-implementation in Python as part of
> mtool, that comparison also turned up the two questions you raised,
> viz. the treatment of the TOP property and how to score parameterized
> predicates.
>
> regarding the first, this appears to be one of the better-kept secrets
> in meaning representation comparison: in my view, it is a semantically
> highly relevant property (marking the contrast between e.g. 'all
> fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the
> original EDM paper nor its derivative in the AMR world (Cai & Knight,
> 2013) discuss it.  yet, both the Lisp implementation of EDM and SMATCH
> seem to always have scored the TOP node as an additional tuple
> (counted among the 'argument' tuples for EDM, while considered among
> the 'attribute' tuples in SMATCH).  the Perl implementation of EDM, on
> the other hand, worked off my 'ltriples' export format for EDS, which
> appears to not include a separate TOP tuple.
>
> i confirmed the nature of those triples by reminding myself of what
> became of the 'export' script mentioned in the original EDM wiki notes
> you had found.  it was folded into the LOGON 'redwoods' script, so
> something like the following actually works today to prepare the input
> for the Perl implementation of EDM:
>
>   $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs
>
> i attach the output for item #21 from the MRS test suite, for
> reference.  so, i agree with the conclusion bec and you have already
> reached: the original Perl implementation of EDM did not consider TOP
> tuples.  the Lisp implementation, on the other hand, appears to have
> had TOP tuples from its very beginning.
>
> regarding the second design choice you raise, parameterized relations
> (involving one or more constant arguments), it appears that both the
> Lisp and Perl implementations of EDM do the same thing, viz. assume
> that there can be at most one constant argument in a relation and
> 'inline' its value (if present) with the predicate itself, e.g.
> internally using node label shorthands like 'named(Abrams)'.  in this
> regard, i suspect bec and you actually may have arrived at the wrong
> conclusion about historic behavior; thus, personally, i see no reason
> for pyDelphin to provide a special-cased version of EDM that wholly
> ignores constant arguments.
>
> looking at this particular design choice today, however, it seems too
> limiting an assumption and meshing together two things that arguably
> should be considered separate.  even though ERG versions for the past
> 15 or more years have not used predicates with multiple (constant)
> parameters, there would be nothing wrong with representing, say, the
> fraction '2/3' as involving two constant arguments, e.g. something
> like fraction [ CARG1 "2", CARG2 "3" ].  this is, for example, what
> AMR does for complex proper names.
>
> thus, even though our two historic EDM implementations appear to agree
> on the 'inlining' treatment of constant arguments, i would be prepared
> to argue that CARG et al. values should rather be treated as separate
> node properties, i.e. for the above example the 'named' predicate and
> the 'CARG' == 'Abrams' value should be treated as two distinct tuples.
> in part for cross-framework compatibility, this is what we ended up
> doing in mtool, including in its re-implementation of EDM, see:
>
>   http://mrp.nlpl.eu/index.php?page=5
>
> in summary, it sounds as if your EDM re-implementation, mike, had
> arrived at the same conclusions: TOP tuples should be scored, and
> constant arguments considered as separate properties.  i would expect
> your implementation and mtool should then come to the exact same
> results (on EDSs stripped of MRS variable properties, which the
> current mtool EDS reader deliberately discards; see below)?  seeing as
> we have identified two ways in which this way of computing EDM differs
> from the original publication and the two earlier implementations (in
> Perl and Lisp), i would like to suggest we formally coin this
> refinement of the metric EDM 2.0.
>
> regarding how to deal with missing graphs on either the gold or system
> side of the comparison: it appears the Lisp implementation of EDM
> provides a toggle *redwooods-score-all-p*, which selects between two
> modes of computing EDM over two sets of corresponding items, either on
> the intersection of items only; or on their union, treating gaps on
> either side of the comparison as empty graphs (thus, incurring recall
> or precision penalties).  in practice, i believe we used to
> near-exclusively compute EDM over sets of items for which there was
> both a gold and a system graph.  but that can of course only give
> comparable results when fixing that very set of items.  thus, the
> setup of scoring 'all' items seems more general, robust to attempts at
> gaming, and in my view should be considered the default.
>
> finally, regarding variable properties in mtool: for the 2019 CoNLL
> shared task on meaning representation parsing (MRP 2019), we had
> agreed with other framework developers to keep morpho-semantic
> decorations out of the comparison.  hence, the MRP 2019 graphs did not
> include tense, aspect, or number information from the full ERSs.  but
> technically, i would consider that a property of the EDS used in MRP
> 2019, not a design decision in mtool.  for the re-run of the MRP task
> at CoNLL 2020, we are currently preparing to throw these properties
> back into the mix (also in other frameworks, where annotations are
> available), which means the EDS reader in mtool in the near future
> will no longer discard (underlying) variable properties by default.
>
> best wishes, oe
>
> On Mon, Jan 20, 2020 at 2:15 AM goodman.m.w at gmail.com
> <goodman.m.w at gmail.com> wrote:
> >
> > Thanks again, Bec.
> >
> > I just want to make sure my implementation gets the same scores for the
> same inputs under the same assumptions as the original implementation. For
> this to work, its behavior concerning the points I've sought clarification
> for should be intentional. In light of your responses, I've separated the
> CARG triples from other properties and have given them their own weight. Thus I
> should be able to get the same scores as your code by setting the weights
> of CARGs (but not properties) and graph-tops to zero. Similarly, I'll add
> an option to ignore missing test items and otherwise treat them as
> mismatches.
> >
> > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan <bec.dridan at gmail.com> wrote:
> >>
> >>
> >>
> >> On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com <
> goodman.m.w at gmail.com> wrote:
> >>>
> >>>
> >>> One more detail is what to do when the two sides (gold and test) have
> different numbers of items. Currently my code stops as soon as either a
> gold or test item is missing, which is what smatch (the similar metric made
> for AMR) does, but I think that may be wrong because parsing profiles are
> likely to have missing or extra (overgeneration) items in the middle. So
> the question is whether we ignore it or count it as a full mismatch.
> >>
> >>
> >> If you are asking what is 'correct', I guess that depends on why you
> are evaluating. The perl implementation wouldn't have noticed missing gold
> parses, because it used the gold set as the definition of the set. A
> missing test item, on the other hand, by default counts as a full mismatch,
> but there is a command line option to ignore any gold parse with no
> corresponding test parse. The ignore option is useful when the purpose of
> the evaluation is assessing the system you are working on (and you consider
> coverage separately). For comparing across systems, I imagine you probably
> want to count parse failure as a full mismatch. It was useful for me to
> have both options.
> >>
> >> Bec
> >>
> >>>
> >>>
> >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan <bec.dridan at gmail.com>
> wrote:
> >>>>
> >>>> Wow, that is some old code... From memory, export was a wrapper
> around `parse --export`, where I could add :ltriples to the
> tsdb::*redwoods-export-values* set.
> >>>>
> >>>> I don't know the mtool code at all, but re-reading the paper and
> looking at the perl code, I don't think the original implementation
> evaluated CARG at all. We only checked that the correct character span had
> a pred name of `named`.
> >>>>
> >>>> I think you are right that the triple export at the time did not
> produce a triple for TOP and it hence would not have been counted.
> >>>>
> >>>> That match your memory Stephan?
> >>>>
> >>>> Bec
> >>>>
> >>>>
> >>>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com <
> goodman.m.w at gmail.com> wrote:
> >>>>>
> >>>>> Hello developers,
> >>>>>
> >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I
> did not find an easy way to do it. I saw lisp code in the LKB's repository
> and Bec's Perl code, but I'm not sure how to call the former from the
> command line and the latter seems outdated (I don't see the "export"
> command required by its instructions).
> >>>>>
> >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I thought I'd
> implement it on top of PyDelphin. The result is here:
> https://github.com/delph-in/delphin.edm. It requires the latest version
> of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text
> files or [incr tsdb()] profiles.
> >>>>>
> >>>>> When I nearly had my version working I found that Stephan et al.'s
> mtool (https://github.com/cfmrp/mtool) also had an implementation of EDM, so I used that to compare
> with my outputs (as I couldn't get the previous implementations to work).
> In this process I think I found some differences from Dridan & Oepen,
> 2011's description, and this email is to confirm those findings. Namely,
> that mtool's (and now my) implementation do the following:
> >>>>>
> >>>>> * CARGs are treated as property triples ("class 3 information").
> Previously they were combined with the predicate name. This change means
> that predicates like 'named' will match even if their CARGs don't and the
> CARGs are a separate thing that needs to be matched.
> >>>>>
> >>>>> * The identification of the graph's TOP counts as a triple.
> >>>>>
> >>>>> One difference between mtool and delphin.edm is that mtool does not
> count "variable" properties from EDS, but that's just because its EDS
> parser does not yet handle them while PyDelphin's does.
> >>>>>
> >>>>> Can anyone familiar with EDM confirm the above? Or can anyone
> explain how to call the Perl or LKB code so I can compare?
> >>>>>
> >>>>> --
> >>>>> -Michael Wayne Goodman
> >>>
> >>>
> >>>
> >>> --
> >>> -Michael Wayne Goodman
> >
> >
> >
> > --
> > -Michael Wayne Goodman
>

-- 
-Michael Wayne Goodman