[developers] [sdp-organizers] From EDS/MRS to DM
oe at ifi.uio.no
Sat Feb 9 20:11:40 CET 2019
thanks for your interest in the DM bi-lexical dependencies. i am
copying the DELPH-IN ‘developers’ list, just in case others might be
interested in this topic too (or even just to archive this overdue
summary of the current state of affairs :-).
> We recently read the 'Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies' paper, and we were wondering how to get DM output from the EDS/MRS or other semantic structures produced by the ERG. I can get MRS and EDS output using the redwoods script in the LOGON tree like so:
> ./redwoods --binary --erg --default --composite --target /tmp export mrs,eds --active all erg/1214/input/16-05-23/pet
> but redwoods doesn’t seem to support the DM format, nor do I see it in the [incr tsdb()] documentation.
yes, conversion (from MRS, via EDS) to DM is not yet tightly
integrated with the [incr tsdb()] export functionality. the converter
was developed by angelina ivanova in a slightly weird lisp dialect
(where whitespace carries meaning), and it operates as a
post-processor on [incr tsdb()] export files.
assuming you have a functional LOGON tree (as it seems you do :-), the
following should work:
$LOGONROOT/redwoods --erg --target /tmp --export input,derivation,mrs,eds mrs
$LOGONROOT/bin/dtm --tok ptb --data /tmp/mrs --grammar $LOGONROOT/lingo/erg --dtm /tmp
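for repeated use, the two steps can be scripted. a minimal python sketch, assuming LOGONROOT is set in the environment and that both tools exit with a non-zero status on failure; it only reproduces the two command lines above verbatim, and the helper names are hypothetical:

```python
import os
import subprocess

# where the two commands below leave their result, as described in the text
DTM_OUTPUT = "/tmp/mrs.ptb.dtm"


def dm_pipeline_commands(logonroot):
    """return the two argument vectors shown above (hypothetical helper)."""
    export = [os.path.join(logonroot, "redwoods"), "--erg", "--target", "/tmp",
              "--export", "input,derivation,mrs,eds", "mrs"]
    convert = [os.path.join(logonroot, "bin", "dtm"), "--tok", "ptb",
               "--data", "/tmp/mrs",
               "--grammar", os.path.join(logonroot, "lingo", "erg"),
               "--dtm", "/tmp"]
    return export, convert


def run_dm_pipeline(logonroot=None):
    """run export, then DTM conversion; assumes non-zero exit on failure."""
    logonroot = logonroot or os.environ["LOGONROOT"]
    for argv in dm_pipeline_commands(logonroot):
        # check=True raises CalledProcessError if a step exits non-zero
        subprocess.run(argv, check=True)
    return DTM_OUTPUT
```

passing argument vectors rather than a single shell string sidesteps quoting problems; on a machine with a working LOGON tree, run_dm_pipeline() should return the path of the DTM file.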
the output will be a file ‘/tmp/mrs.ptb.dtm’ in what angelina called
the DTM format; it provides both the DT (syntactic) and DM (semantic)
dependencies. a description of the various fields is here, though i
vaguely recall there may have been minor revisions after that report.
the SDP file format (which generalizes over a range of different
bi-lexical semantic dependencies) was later defined for the SemEval
2014 and 2015 tasks on semantic dependency parsing; see
‘http://sdp.delph-in.net/’. conversion from DTM to SDP involves (a)
dropping the DT columns; adding (b) lemmas (from the ERG lexicon), (c)
parts of speech (from TnT), and (d) frame identifiers (from the ERG
SEM-I); (e) simplifying a few dependency relations (e.g.
‘compound_name’ to ‘compound’, ‘unspec_manner’ to ‘manner’, and such);
and (f) patching up the internal structure of contracted negations
when outputting PTB tokenization.
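step (e) amounts to a small renaming table over the dependency labels. a minimal python sketch, assuming rows already in the SDP column layout (ID, FORM, LEMMA, POS, TOP, PRED, FRAME, then one argument column per predicate, with ‘_’ for no edge); the table holds only the two renamings named above, so it is deliberately incomplete, and the function names are hypothetical, not taken from ‘dm.lisp’:

```python
# only the two renamings mentioned in the text; the real converter has more
DM_LABEL_SIMPLIFICATIONS = {
    "compound_name": "compound",
    "unspec_manner": "manner",
}


def simplify_label(label):
    """map a DTM relation name to its SDP form; unknown labels (and '_') pass through."""
    return DM_LABEL_SIMPLIFICATIONS.get(label, label)


def simplify_arg_columns(row):
    """rename labels in the argument columns (index 7 onwards) of one SDP-style row."""
    return row[:7] + [simplify_label(label) for label in row[7:]]
```

keeping the table as plain data makes it easy to extend with further renamings without touching the code.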
conversion from DTM to SDP is, again, part of [incr tsdb()]. the
current implementation of that last step is somewhat ERG-specific (and
has only been systematically validated for the 2014 release). at
present, i am afraid, there is no general user interface to this code.
it resides in ‘dm.lisp’ (in [incr tsdb()]), where i just now committed
the function dtm-to-dm(), which i had used for SemEval 2014 and 2015.
if you are interested in a more robust way of generating DM from your
own parses or treebanks, should i look into adding ‘dm’ as an
‘--export’ option to the above ‘redwoods’ script?
all of the above is executed behind the scenes of the on-line ERG
interface at ‘http://erg.delph-in.net/’ (currently still running the
2014 release of the ERG). it is possible to programmatically obtain parses in many
different output formats; see ‘ErgApi’ on the DELPH-IN wiki. i attach
a simple script to exemplify this functionality:
$ python3 ./rest.py "Kim wanted to be heard."
1  Kim     Kim   NNP  -  -  named:x-c  ARG1  ARG2
2  wanted  want  VBD  +  +  v:e-i-h    _     _
3  to      to    TO   -  -  _          _     _
4  be      be    VB   -  -  _          _     _
5  heard   hear  VBN  -  +  v:e-i-p    ARG2  _
6  .       _     .    -  -  _          _     _
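to consume such output programmatically, each argument column can be read as the outgoing edges of one predicate: the k-th argument column belongs to the k-th token marked ‘+’ in the PRED column. a minimal python sketch (a hypothetical helper, not part of the attached script) that turns rows like the above into (head, label, dependent) triples, assuming whitespace-separated columns with no whitespace inside a token:

```python
def sdp_to_triples(lines):
    """read SDP 2015-style rows into (tops, triples).

    columns: ID FORM LEMMA POS TOP PRED FRAME ARG1 .. ARGn; the k-th argument
    column holds the edges from the k-th token whose PRED is '+'.
    """
    # skip blank lines and '#' comment lines (e.g. sentence identifiers)
    rows = [line.split() for line in lines
            if line.strip() and not line.startswith("#")]
    tops = [int(row[0]) for row in rows if row[4] == "+"]
    preds = [int(row[0]) for row in rows if row[5] == "+"]
    triples = []
    for row in rows:
        dependent = int(row[0])
        for head, label in zip(preds, row[7:]):
            if label != "_":  # '_' marks the absence of an edge
                triples.append((head, label, dependent))
    return tops, triples


EXAMPLE = """\
1 Kim Kim NNP - - named:x-c ARG1 ARG2
2 wanted want VBD + + v:e-i-h _ _
3 to to TO - - _ _ _
4 be be VB - - _ _ _
5 heard hear VBN - + v:e-i-p ARG2 _
6 . _ . - - _ _ _
""".splitlines()

if __name__ == "__main__":
    print(sdp_to_triples(EXAMPLE))
```

for the example sentence this yields the top [2] and the triples (2, ARG1, 1), (5, ARG2, 1), and (2, ARG2, 5): ‘Kim’ is the ARG1 of ‘wanted’ and the ARG2 of ‘heard’, and ‘heard’ is the ARG2 of ‘wanted’.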
please do not hesitate to ask for further clarification or suggest
improvements to the interface (or the DM dependencies proper)! i care
about this stuff :-).
best wishes, oe