[developers] [sdp-organizers] From EDS/RMS to DM

Stephan Oepen oe at ifi.uio.no
Sat Feb 9 20:11:40 CET 2019

hi alexandre,

thanks for your interest in the DM bi-lexical dependencies.  i am
copying the DELPH-IN ‘developers’ list, just in case others might be
interested in this topic too (or even just to archive this overdue
summary of the current state of affairs :-).

> We recently read the 'Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies' paper, and we were wondering how to get DM output from the EDS/MRS or other semantic structure produced by ERG. I can get MRS and EDS output using the redwoods script in the LOGON tree like so:
> ./redwoods --binary --erg --default --composite --target /tmp export mrs,eds --active all erg/1214/input/16-05-23/pet
> but redwoods doesn’t seem to support the DM format, nor do I see it in the [incr tsdb()] documentation.

yes, conversion (from MRS, via EDS) to DM is not yet tightly
integrated with the [incr tsdb()] export functionality.  the converter
was developed by angelina ivanova in a slightly weird lisp dialect
(where whitespace carries meaning), and it operates as a
post-processor on [incr tsdb()] export files.

assuming you have a functional LOGON tree (as it seems you do :-), the
following should work:

$LOGONROOT/redwoods --erg --target /tmp --export input,derivation,mrs,eds mrs
$LOGONROOT/bin/dtm --tok ptb --data /tmp/mrs \
  --grammar $LOGONROOT/lingo/erg --dtm /tmp

the output will be a file ‘/tmp/mrs.ptb.dtm’ in what angelina called
the DTM format; it provides both the DT (syntactic) and DM (semantic)
dependencies.  a description of the various fields is here, though i
vaguely recall there may have been minor revisions after that report
was written:


the SDP file format (which generalizes over a range of different
bi-lexical semantic dependencies) was later defined for the SemEval
2014 and 2015 tasks on semantic dependency parsing; see
‘http://sdp.delph-in.net/’.  conversion from DTM to SDP involves (a)
dropping the DT columns; adding (b) lemmas (from the ERG lexicon), (c)
parts of speech (from TnT), and (d) frame identifiers (from the ERG
SEM-I); (e) simplifying a few dependency relations (e.g.
‘compound_name’ to ‘compound’, ‘unspec_manner’ to ‘manner’, and such);
and (f) patching up the internal structure of contracted negations
when outputting PTB tokenization.
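
to make step (e) concrete, here is a minimal python sketch of the
relation simplification.  it is an illustration only, not the code in
[incr tsdb()]: the table holds just the two pairs named above (the
full list is not reproduced here), and the names SIMPLIFIED_RELATIONS
and simplify_relation() are hypothetical.

```python
# hypothetical table: only the two pairs mentioned in this message;
# the actual converter simplifies further relations not listed here.
SIMPLIFIED_RELATIONS = {
    "compound_name": "compound",
    "unspec_manner": "manner",
}


def simplify_relation(label):
    """Return the simplified label, or the label itself if no rule applies."""
    return SIMPLIFIED_RELATIONS.get(label, label)


if __name__ == "__main__":
    for label in ("compound_name", "unspec_manner", "ARG1"):
        print(label, "->", simplify_relation(label))
```

labels without an entry in the table (e.g. ‘ARG1’) pass through
unchanged, so the function can be applied to every dependency label.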

conversion from DTM to SDP is, again, part of [incr tsdb()].  the
current implementation of that last step is somewhat ERG-specific (and
has only been systematically validated for the 2014 release).  at
present, i am afraid, there is no general user interface to this code.
it resides in ‘dm.lisp’ (in [incr tsdb()]), where i just now committed
the function dtm-to-dm(), which i had used for SemEval 2014 and 2015.
if you were interested in a more robust way of generating DM from your
own parses or treebanks, i could look into adding ‘dm’ as an
‘--export’ option to the above ‘redwoods’ script?

all of the above is executed behind the scenes of the on-line ERG
interface at ‘http://erg.delph-in.net/’ (currently still running the
2014 release).  it is possible to programmatically obtain parses in many
different output formats; see ‘ErgApi’ on the DELPH-IN wiki.  i attach
a simple script to exemplify this functionality:

$ python3 ./rest.py "Kim wanted to be heard."
1    Kim    Kim    NNP    -    -    named:x-c    ARG1    ARG2
2    wanted    want    VBD    +    +    v:e-i-h    _    _
3    to    to    TO    -    -    _    _    _
4    be    be    VB    -    -    _    _    _
5    heard    hear    VBN    -    +    v:e-i-p    ARG2    _
6    .    _    .    -    -    _    _    _
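
to read dependencies off output like the above, the following python
sketch may help.  it assumes the column layout as i understand it from
the SDP 2015 format (id, form, lemma, pos, top, pred, frame, then one
argument column per predicate, in the order in which the predicates
appear), and it splits on whitespace, which presupposes that no field
contains a space.  the names read_edges() and top_nodes() are
hypothetical; this is not the attached rest.py.

```python
SAMPLE = """\
1 Kim Kim NNP - - named:x-c ARG1 ARG2
2 wanted want VBD + + v:e-i-h _ _
3 to to TO - - _ _ _
4 be be VB - - _ _ _
5 heard hear VBN - + v:e-i-p ARG2 _
6 . _ . - - _ _ _
"""


def parse_rows(text):
    """Split non-empty, non-comment lines into whitespace-separated fields."""
    return [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


def top_nodes(rows):
    """Forms of the tokens marked '+' in the fifth (top) column."""
    return [row[1] for row in rows if row[4] == "+"]


def read_edges(rows):
    """Return (head form, label, dependent form) triples.

    The n-th argument column (from the eighth field on) is assumed to
    belong to the n-th token marked '+' in the sixth (pred) column.
    """
    predicates = [row for row in rows if row[5] == "+"]
    edges = []
    for row in rows:
        for predicate, label in zip(predicates, row[7:]):
            if label != "_":
                edges.append((predicate[1], label, row[1]))
    return edges


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("top:", top_nodes(rows))
    for head, label, dependent in read_edges(rows):
        print(head, label, dependent)
```

for the example sentence this yields three dependencies: ‘Kim’ as ARG1
of ‘wanted’ and as ARG2 of ‘heard’, and ‘heard’ as ARG2 of ‘wanted’,
with ‘wanted’ as the only top node.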

please do not hesitate to ask for further clarification or suggest
improvements to the interface (or the DM dependencies proper)!  i care
about this stuff :-).

best wishes, oe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rest.py
Type: text/x-python
Size: 371 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20190209/51b0316f/attachment.py>
