[developers] [sdp-organizers] From EDS/RMS to DM

Alexandre Rademaker arademaker at gmail.com
Sat Feb 16 02:42:42 CET 2019


Hi Stephan,

> On 9 Feb 2019, at 17:11, Stephan Oepen <oe at ifi.uio.no> wrote:
> 
> hi alexandre,
> 
> thanks for your interest in the DM bi-lexical dependencies.  i am
> copying the DELPH-IN ‘developers’ list, just in case others might be
> interested in this topic too (or even just to archive this overdue
> summary of the current state of affairs :-).

Good idea. If you prefer, I can also post my questions on the http://delphinqa.ling.washington.edu forum; are you using it?

>> We recently read the 'Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies' paper, and we were wondering how to get DM  output from the EDS/MRS or other semantic structure produced by ERG. I can get MRS and EDS output using the redwoods script in the LOGON tree like so:
>> 
>> ./redwoods --binary --erg --default --composite --target /tmp export mrs,eds --active all erg/1214/input/16-05-23/pet
>> 
>> but redwoods doesn’t seem to support the DM format, nor do I see it in the [incr tsdb()] documentation.
> 
> yes, conversion (from MRS, via EDS) to DM is not yet tightly
> integrated with the [incr tsdb()] export functionality.  the converter
> was developed by angelina ivanova in a slightly weird lisp dialect
> (where whitespace carries meaning), and it operates as a
> post-processor on [incr tsdb()] export files.

Haha! Python is definitely weird, I completely agree with you! I am also a Lisp programmer, and I would love to rewrite the converter in Lisp once I understand the whole LOGON/DELPH-IN ecosystem better. I don’t yet understand all the Lisp code in the LOGON tree: the packages, the dependencies, etc.

> assuming you have a functional LOGON tree (as it seems you do :-), the
> following should work:
> 
> $LOGONROOT/redwoods --erg --target /tmp --export input,derivation,mrs,eds mrs
> $LOGONROOT/bin/dtm --tok ptb --data /tmp/mrs --grammar
> $LOGONROOT/lingo/erg --dtm /tmp

I was able to produce the dtm file from a small sample of 50 sentences using:

$ ./parse --binary --erg+tnt --best 1 --text ~/tmp/sample.txt
$ ./redwoods --binary --erg --default --target ~/tmp --export input,derivation,mrs,eds --active all erg/1214/sample/19-02-15/pet
$ bin/dtm --tok ptb --data ~/tmp/erg.1214.sample.19-02-15.pet --grammar ~/logon/lingo/erg --dtm ~/tmp

Unfortunately, it looks like the dtm script does not work with profiles produced with ACE (http://moin.delph-in.net/LogonAnswer):

$ ./parse --binary --erg+tnt/ace --best 1 --text ~/tmp/sample.txt
$ ./redwoods --binary --erg --default --target ~/tmp --export input,derivation,mrs,eds --active all erg/1214/sample/19-02-15/ace
$ bin/dtm --tok ptb --data ~/tmp/erg.1214.sample.19-02-15.ace --grammar ~/logon/lingo/erg --dtm ~/tmp
1
Missing or incorrect derivation tree!
Traceback (most recent call last):
  File "/home/user/logon/uio/dtm/converter.py", line 3671, in <module>
    Converter().run()
  File "/home/user/logon/uio/dtm/converter.py", line 236, in run
    if fhdl_dict['log'] is not None:
KeyError: 'log'


I have also tried to convert data exported with redwoods from a profile created with ACE+art, using a different ‘home’ directory for TSDB:

$ ./redwoods --binary --erg --home ~/tmp/profiles --target ~/tmp --export input,derivation,mrs,eds --active all sample

But all files exported by the command above are empty. For example:

$ gzcat 35.gz
;;;
;;; Redwoods export of `sample';
;;; (nobody at 60eb05b6330f; 15-feb-2019 (23:53 h)).
;;;

[35] (1 of 1) {-1} `The well contained a show of gas and oil residues.'


The parameters, and how they change the behaviour of the parse and redwoods scripts, are not clear to me. It seems that I could run the code directly from the Lisp REPL inside Emacs, right? And these scripts call many different tools (PET, ACE, [incr tsdb()], etc.), right?

Moreover, I would like to use the trunk version of the ERG (2018), not the 1214 release available in the LOGON tree. The ‘terg’ argument didn’t work for me.

Besides all the questions above and any feedback you may give: I need to process 5,600 sentences. If I submit one file with 5,600 lines, the parse script stops without any error message, and the profile ends up with only 8 lines in the result file. I tried splitting the 5,600 lines into smaller files. If I parse a file with 2,000 lines/sentences, the script seems to be parsing during the first 1-5 sentences, but after some time it starts to produce output very fast, as if it were skipping the sentences, and it finishes with only 4 lines in the result file of the profile.
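For now I am working around this by splitting the input myself before calling the parse script, roughly like this (the chunk size of 500 is an arbitrary choice of mine, just for illustration):

```python
# Split a large list of input sentences into smaller batches, so the
# parse script receives files of a manageable size.  The size of 500
# is arbitrary; I have not yet found the threshold where parsing
# starts to misbehave.

def chunk(lines, size=500):
    """Yield successive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]
```

Each slice then goes into its own text file, and I run the parse script once per file.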

Since I am a Mac user, I run all the code in a Lisp environment inside Docker (http://moin.delph-in.net/LkbMacintosh); could the problem be related to that? Maybe something related to the PVM code?


> the output will be a file ‘/tmp/mrs.ptb.dtm’ in what angelina called
> the DTM format; it provides both the DT (syntactic) and DM (semantic)
> dependencies.  a description of the various fields is here, though i
> vaguely recall there may have been minor revisions after that report
> was written:
> 
> http://svn.emmtee.net/trunk/uio/dtm/converter.pdf
> 
> the SDP file format (which generalizes over a range of different
> bi-lexical semantic dependencies) was later defined for the SemEval
> 2014 and 2015 tasks on semantic dependency parsing; see
> ‘http://sdp.delph-in.net/’.  conversion from DTM to SDP involves (a)
> dropping the DT columns; adding (b) lemmas (from the ERG lexicon), (c)
> parts of speech (from TnT), and (d) frame identifiers (from the ERG
> SEM-I); (e) simplifying a few dependency relations (e.g.
> ‘compound_name’ to ‘compound’, ‘unspec_manner’ to ‘manner’, and such);
> and (f) patching up the internal structure of contracted negations
> when outputting PTB tokenization.

Wow! Not simple! 

But the interface at http://wesearch.delph-in.net/sdp/search.jsp is precisely what I want to have; see http://wnpt.sl.res.ibm.com/wsi/. So far I have only been able to process ~800 small sentences, and with no nice graphical representation (DMs)! Sniff... :-(

> conversion from DTM to SDP is, again, part of [incr tsdb()].  the
> current implementation of that last step is somewhat ERG-specific (and
> has only been systematically validated for the 2014 release).  at
> present, i am afraid, there is no general user interface to this code.
> it resides in ‘dm.lisp’ (in [incr tsdb()]), where i just now committed
> the function dtm-to-dm(), which i had used for SemEval 2014 and 2015.
> if you were interested in a more robust way of generating DM from your
> own parses or treebanks, i could look into adding ‘dm’ as an
> ‘--export’ option to the above ‘redwoods’ script?

I didn’t find a dm.lisp file, but I found lingo/lkb/src/tsdb/lisp/sdp.lisp with the definition of the function dtm-to-dm. Besides the input and output files, this function also expects an argument called ‘align’ (a list); how do I obtain/pass this argument?

Adding this export option to the redwoods script would be very nice. Please!

But once we have the gz files and the sdp files, how is the create-index step from the WeSearch code able to link all the representations of each sentence? I suppose the final RDF links all representations of the same sentence using the sentence id, right?
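If the linking really does go through the sentence id, I imagine something along these lines; this is pure speculation about create-index on my part, and the record layout below is made up:

```python
from collections import defaultdict

# My guess at how the different representations of one sentence could
# be grouped before emitting RDF: key each (format, payload) record by
# its sentence id.  The record triples are a made-up illustration.

def group_by_id(records):
    """records: iterable of (sentence_id, fmt, payload) triples."""
    index = defaultdict(dict)
    for sid, fmt, payload in records:
        index[sid][fmt] = payload
    return dict(index)
```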

> all of the above is executed behind the scenes of the on-line ERG
> interface at ‘http://erg.delph-in.net/’ (currently still running
> 2014).  it is possible to programmatically obtain parses in many
> different output formats; see ‘ErgApi’ on the DELPH-IN wiki.  i attach
> a simple script to exemplify this functionality:
> 
> $ python3 ./rest.py "Kim wanted to be heard."
> #12594
> 1    Kim    Kim    NNP    -    -    named:x-c    ARG1    ARG2
> 2    wanted    want    VBD    +    +    v:e-i-h    _    _
> 3    to    to    TO    -    -    _    _    _
> 4    be    be    VB    -    -    _    _    _
> 5    heard    hear    VBN    -    +    v:e-i-p    ARG2    _
> 6    .    _    .    -    -    _    _    _

Nice, the rest.py script works fine. I hadn’t read the page http://moin.delph-in.net/ErgApi before; it is definitely useful for some tests, but I need to process a large amount of non-public data. It would be really nice to be able to instantiate my own ERG endpoint. Is there any documentation about that?
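For reference, here is the core of what I assume rest.py does, following my reading of the ErgApi wiki page; the endpoint URL and parameter names are taken from that page, so please correct me if I got them wrong:

```python
import json
import urllib.parse
import urllib.request

# Minimal sketch of querying the public ERG endpoint, as I understand
# it from http://moin.delph-in.net/ErgApi.  The URL and the 'input'
# and 'results' parameter names are my reading of that page.

def build_parse_url(text, results=1):
    """Build the request URL for one input sentence."""
    query = urllib.parse.urlencode({"input": text, "results": results})
    return "http://erg.delph-in.net/rest/0.9/parse?" + query

def parse_sentence(text, results=1):
    """Fetch and decode the JSON response for one sentence."""
    with urllib.request.urlopen(build_parse_url(text, results)) as response:
        return json.loads(response.read().decode("utf-8"))
```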

> please do not hesitate to ask for further clarification or suggest
> improvements to the interface (or the DM dependencies proper)!  i care
> about this stuff :-).

Thank you very much for your message, and I am sorry for so many questions! I hope you can find some time to help me.

Best,
Alexandre



