<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Thanks for the reply, Stephan,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">> [...] it appears that both the<br>> Lisp and Perl implementations of EDM do the same thing, viz. assume<br>> that there can be at most one constant argument in a relation and<br>> 'inline' its value (if present) with the predicate itself, e.g.<br>> internally using node label shorthands like 'named(Abrams)'. in this<br>> regard, i suspect bec and you actually may have arrived at the wrong<br>> conclusion about historic behavior;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Thanks for confirming how the Lisp implementation works. I took your 21.gz file and created a copy with "Abrams" replaced by "Brown", then used edm_eval.pl to compare the two; it reports a full match (1.0), so based on this limited test I think Bec was correct about the Perl version.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">> thus, personally, i see no reason<br>> for pyDelphin to provide a special-cased version of EDM that wholly<br>> ignores constant arguments.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Neither do I, and that's not what my implementation does. I separated CARGs into their own category, and callers of the script can give that category a weight of zero to ignore it, which reproduces the results of the Perl implementation. 
Otherwise, the default weight for all categories (arguments (-A), names/predicates (-N), morphosemantic properties (-P), constants (-C), and tops (-T)) is 1.0.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">> i would expect<br>> your implementation and mtool should then come to the exact same<br>> results (on EDSs stripped of MRS variable properties, [...])</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Yes, but there's no need to strip the properties; just give the category a weight of zero. I've confirmed on a few test items that my implementation gets the exact same scores as mtool with -P0.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Furthermore, I think the following option configurations for my re-implementation cover all current and historical use cases except for the inlined constants of the Lisp version, which interact with node names in a way that isn't reproducible with weights alone.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Perl: `delphin edm -C0 -T0 --ignore-missing=gold`</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Perl with -i option: `delphin edm -C0 -T0 --ignore-missing=both`<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Lisp where *redwooods-score-all-p* is true: `delphin edm`</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Lisp where *redwooods-score-all-p* is false: `delphin edm 
--ignore-missing=both`</div></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* mtool (MRP 2019): `delphin edm -P0`</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* mtool (MRP 2020? or EDM 2.0): `delphin edm`<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 28, 2020 at 1:42 AM Stephan Oepen <<a href="mailto:oe@ifi.uio.no">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">hi mike,<br> <br> belatedly, thanks (once again) for pushing forward standardization!<br> and also my apologies for returning to this thread a little late!<br> <br> regarding EDM, i used to think of the Common-Lisp implementation<br> (which it appears i produced in early 2012, i.e. more recently than<br> the Perl version by bec) as the reference until recently. last year,<br> when comparing its scores to my re-implementation in Python as part of<br> mtool, that comparison also turned up the two questions you raised,<br> viz. the treatment of the TOP property and how to score parameterized<br> predicates.<br> <br> regarding the first, this appears to be one of the better-kept secrets<br> in meaning representation comparison: in my view, it is a semantically<br> highly relevant property (marking the contrast between e.g. 'all<br> fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the<br> original EDM paper nor its derivative in the AMR world (Cai & Knight,<br> 2013) discuss it. yet, both the Lisp implementation of EDM and SMATCH<br> seem to always have scored the TOP node as an additional tuple<br> (counted among the 'argument' tuples for EDM, while considered among<br> the 'attribute' tuples in SMATCH). 
the Perl implementation of EDM, on<br> the other hand, worked off my 'ltriples' export format for EDS, which<br> appears to not include a separate TOP tuple.<br> <br> i confirmed the nature of those triples by reminding myself of what<br> became of the 'export' script mentioned in the original EDM wiki notes<br> you had found. it was folded into the LOGON 'redwoods' script, so<br> something like the following actually works today to prepare the input<br> for the Perl implementation of EDM:<br> <br> $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs<br> <br> i attach the output for item #21 from the MRS test suite, for<br> reference. so, i agree with the conclusion bec and you have already<br> reached: the original Perl implementation of EDM did not consider TOP<br> tuples. the Lisp implementation, on the other hand, appears to have<br> had TOP tuples from its very beginning.<br> <br> regarding the second design choice you raise, parameterized relations<br> (involving one or more constant arguments), it appears that both the<br> Lisp and Perl implementations of EDM do the same thing, viz. assume<br> that there can be at most one constant argument in a relation and<br> 'inline' its value (if present) with the predicate itself, e.g.<br> internally using node label shorthands like 'named(Abrams)'. in this<br> regard, i suspect bec and you actually may have arrived at the wrong<br> conclusion about historic behavior; thus, personally, i see no reason<br> for pyDelphin to provide a special-cased version of EDM that wholly<br> ignores constant arguments.<br> <br> looking at this particular design choice today, however, it seems too<br> limiting an assumption and meshing together two things that arguably<br> should be considered separate. 
even though ERG versions for the past<br> 15 or more years have not used predicates with multiple (constant)<br> parameters, there would be nothing wrong with representing, say, the<br> fraction '2/3' as involving two constant arguments, e.g. something<br> like fraction [ CARG1 "2", CARG2 "3" ]. this is, for example, what<br> AMR does for complex proper names.<br> <br> thus, even though our two historic EDM implementations appear to agree<br> on the 'inlining' treatment of constant arguments, i would be prepared<br> to argue that CARG et al. values should rather be treated as separate<br> node properties, i.e. for the above example the 'named' predicate and<br> the 'CARG' == 'Abrams' value should be treated as two distinct tuples.<br> in part for cross-framework compatibility, this is what we ended up<br> doing in mtool, including in its re-implementation of EDM, see:<br> <br> <a href="http://mrp.nlpl.eu/index.php?page=5" rel="noreferrer" target="_blank">http://mrp.nlpl.eu/index.php?page=5</a><br> <br> in summary, it sounds as if your EDM re-implementation, mike, had<br> arrived at the same conclusions: TOP tuples should be scored, and<br> constant arguments considered as separate properties. i would expect<br> your implementation and mtool should then come to the exact same<br> results (on EDSs stripped of MRS variable properties, which the<br> current mtool EDS reader deliberately discards; see below)? 
seeing as<br> we have identified two ways in which this way of computing EDM differs<br> from the original publication and the two earlier implementations (in<br> Perl and Lisp), i would like to suggest we formally coin this<br> refinement of the metric EDM 2.0.<br> <br> regarding how to deal with missing graphs on either the gold or system<br> side of the comparison: it appears the Lisp implementation of EDM<br> provides a toggle *redwooods-score-all-p*, which selects between two<br> modes of computing EDM over two sets of corresponding items, either on<br> the intersection of items only; or on their union, treating gaps on<br> either side of the comparison as empty graphs (thus, incurring recall<br> or precision penalties). in practice, i believe we used to<br> near-exclusively compute EDM over sets of items for which there was<br> both a gold and a system graph. but that can of course only give<br> comparable results when fixing that very set of items. thus, the<br> setup of scoring 'all' items seems more general, robust to attempts at<br> gaming, and in my view should be considered the default.<br> <br> finally, regarding variable properties in mtool: for the 2019 CoNLL<br> shared task on meaning representation parsing (MRP 2019), we had<br> agreed with other framework developers to keep morpho-semantic<br> decorations out of the comparison. hence, the MRP 2019 graphs did not<br> include tense, aspect, or number information from the full ERSs. but<br> technically, i would consider that a property of the EDS used in MRP<br> 2019, not a design decision in mtool. 
for the re-run of the MRP task<br> at CoNLL 2020, we are currently preparing to throw these properties<br> back into the mix (also in other frameworks, where annotations are<br> available), which means the EDS reader in mtool in the near future<br> will no longer discard (underlying) variable properties by default.<br> <br> best wishes, oe<br> <br> <br> <br> <br> <br> On Mon, Jan 20, 2020 at 2:15 AM <a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><br> <<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>> wrote:<br> ><br> > Thanks again, Bec.<br> ><br> > I just want to make sure my implementation gets the same scores for the same inputs under the same assumptions as the original implementation. For this to work, the original's behavior on the points I asked about needs to be intentional. In light of your responses, I've separated the CARG triples from other properties and have given them their own weight. Thus I should be able to get the same scores as your code by setting the weights of CARGs (but not properties) and graph-tops to zero. Similarly, I'll add an option to ignore missing test items and otherwise treat them as mismatches.<br> ><br> > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan <<a href="mailto:bec.dridan@gmail.com" target="_blank">bec.dridan@gmail.com</a>> wrote:<br> >><br> >><br> >><br> >> On Fri, Jan 17, 2020 at 5:39 PM <a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a> <<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>> wrote:<br> >>><br> >>><br> >>> One more detail is what to do when the two sides (gold and test) have different numbers of items. Currently my code stops as soon as either a gold or test item is missing, which is what smatch (the similar metric made for AMR) does, but I think that may be wrong because parsing profiles are likely to have missing or extra (overgeneration) items in the middle. 
So the question is whether we ignore such an item or count it as a full mismatch.<br> >><br> >><br> >> If you are asking what is 'correct', I guess that depends on why you are evaluating. The Perl implementation wouldn't have noticed missing gold parses, because it used the gold set as the definition of the item set. A missing test item, on the other hand, by default counts as a full mismatch, but there is a command line option to ignore any gold parse with no corresponding test parse. The ignore option is useful when the purpose of the evaluation is assessing the system you are working on (and you consider coverage separately). For comparing across systems, I imagine you probably want to count parse failure as a full mismatch. It was useful for me to have both options.<br> >><br> >> Bec<br> >><br> >>><br> >>><br> >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan <<a href="mailto:bec.dridan@gmail.com" target="_blank">bec.dridan@gmail.com</a>> wrote:<br> >>>><br> >>>> Wow, that is some old code... From memory, export was a wrapper around `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* set.<br> >>>><br> >>>> I don't know the mtool code at all, but re-reading the paper and looking at the Perl code, I don't think the original implementation evaluated CARG at all. We only checked that the correct character span had a pred name of `named`.<br> >>>><br> >>>> I think you are right that the triple export at the time did not produce a triple for TOP, and hence it would not have been counted.<br> >>>><br> >>>> Does that match your memory, Stephan?<br> >>>><br> >>>> Bec<br> >>>><br> >>>><br> >>>> On Thu, Jan 16, 2020 at 8:34 PM <a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a> <<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>> wrote:<br> >>>>><br> >>>>> Hello developers,<br> >>>>><br> >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I did not find an easy way to do it. 
I saw Lisp code in the LKB's repository and Bec's Perl code, but I'm not sure how to call the former from the command line and the latter seems outdated (I don't see the "export" command required by its instructions).<br> >>>>><br> >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I thought I'd implement it on top of PyDelphin. The result is here: <a href="https://github.com/delph-in/delphin.edm" rel="noreferrer" target="_blank">https://github.com/delph-in/delphin.edm</a>. It requires the latest version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text files or [incr tsdb()] profiles.<br> >>>>><br> >>>>> When I nearly had my version working I found that Stephan et al.'s mtool (<a href="https://github.com/cfmrp/mtool" rel="noreferrer" target="_blank">https://github.com/cfmrp/mtool</a>) also had an implementation of EDM, so I used that to compare with my outputs (as I couldn't get the previous implementations to work). In this process I think I found some differences from Dridan & Oepen, 2011's description, and this email is to confirm those findings. Namely, mtool's implementation (and now mine) does the following:<br> >>>>><br> >>>>> * CARGs are treated as property triples ("class 3 information"). Previously they were combined with the predicate name. This change means that predicates like 'named' will match even if their CARGs don't, and the CARGs are a separate thing that needs to be matched.<br> >>>>><br> >>>>> * The identification of the graph's TOP counts as a triple.<br> >>>>><br> >>>>> One difference between mtool and delphin.edm is that mtool does not count "variable" properties from EDS, but that's just because its EDS parser does not yet handle them while PyDelphin's does.<br> >>>>><br> >>>>> Can anyone familiar with EDM confirm the above? 
Or can anyone explain how to call the Perl or LKB code so I can compare?<br> >>>>><br> >>>>> --<br> >>>>> -Michael Wayne Goodman<br> >>><br> >>><br> >>><br> >>> --<br> >>> -Michael Wayne Goodman<br> ><br> ><br> ><br> > --<br> > -Michael Wayne Goodman<br> </blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature">-Michael Wayne Goodman</div>