[developers] [sdp-organizers] From EDS/RMS to DM

Alexandre Rademaker arademaker at gmail.com
Tue Jul 28 16:37:20 CEST 2020


Hi Stephan,

While processing a sample of the WordNet glosses, the redwoods script produced two invalid .gz files. One of them is for the sentence: "a historical region in central and northern Yugoslavia; Serbs settled the region in the 6th and 7th centuries"

See the derivation node:

(527 #<Printer Error, obj=#x10000000fc9: Null lexical entry type NIL>)

In the result file of the profile, however, the derivation node looks fine; the 334.gz file is attached.

(527 a_det_rbst 0.000000 0 1 ("a" 347 "token [ +FORM \\"a\\" +FROM \\"0\\" +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET c-or-v-onset ]"))
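
To find out which exported files are affected, one option is to scan the export directory for the printer-error marker shown above. This is only a minimal sketch in Python, assuming the exports are gzip-compressed text files sitting directly in one directory; the helper name `find_broken_exports` and that layout are my own illustration, not part of the redwoods script.

```python
import gzip
from pathlib import Path

# Marker that the Lisp printer leaves behind in a broken derivation node.
ERROR_MARKER = "#<Printer Error"


def find_broken_exports(export_dir):
    """Return (file name, line number, line) for every line containing the marker.

    Hypothetical helper: assumes each export is a gzip-compressed text file
    located directly under export_dir.
    """
    hits = []
    for path in sorted(Path(export_dir).glob("*.gz")):
        # Decode leniently so that one odd byte does not abort the scan.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if ERROR_MARKER in line:
                    hits.append((path.name, number, line.strip()))
    return hits
```

Calling `find_broken_exports` on the export directory returns one tuple per offending line, so the broken items can be listed before the dtm script is run on them.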

I am trying to understand the Lisp code in redwoods.lisp, but since I cannot load it in my SLIME environment, navigating the source code and debugging is a nightmare. I know that export-tree does more than just copy the derivation tree from the profile, but I have not understood what it does with the derivations; it is hard to get the `big picture`. BTW, you really like the `loop` macro! ;-)

These errors cause the dtm script to fail, although I should not expect it to work with the current trunk version of the ERG anyway, since dm.cfg has not been changed since 2012:

% svn info etc/dm.cfg
Path: etc/dm.cfg
Name: dm.cfg
Working Copy Root Path: /Users/ar/hpsg/terg
URL: http://svn.delph-in.net/erg/trunk/etc/dm.cfg
Relative URL: ^/erg/trunk/etc/dm.cfg
Repository Root: http://svn.delph-in.net
Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5
Revision: 28882
Node Kind: file
Schedule: normal
Last Changed Author: oe
Last Changed Rev: 12172
Last Changed Date: 2012-12-01 18:54:20 -0200 (Sat, 01 Dec 2012)
Text Last Updated: 2019-02-07 20:21:10 -0200 (Thu, 07 Feb 2019)
Checksum: b8097dfbd5cc9b9d654233314006f8c8b0fcecaa

Since my goal is to have at least one bi-lexical format in the WSI interface, I am still trying to understand what the dtm (converter) does. The converter.pdf explains how to use the code and what its input and output are, but it does not disclose its logic, that is, a high-level description of the system. Eventually, we could reimplement the dtm using pydelphin (see https://github.com/delph-in/pydelphin/issues/122). The error that I reported in my previous message, when I call redwoods with dm in `--export input,derivation,mrs,eds,dm`, is probably related to what I am showing here, since the `dm-construct` function ends up calling the Python dtm.py code. Finally, the handling of the `:dm` keyword was not copied to the lkb-fos/src/tsdb/lisp/ source code, but I am sure you and John are both aware of that.

As always, comments and possible references are welcome! ;-)

Best,
Alexandre

PS: I know that all these errors are expected since, as you said, `I am venturing into unexplored territory` by mixing the ‘classic’ DELPH-IN toolchain with the ‘modern tools from the Pacific Northwest’. Yes, I am processing the profiles with ACE/pydelphin and ‘exporting’ data (derivation, input, MRS and EDS) from them with the redwoods Lisp code. But I assume we aim to have interoperability between the tools, right? That is my motivation for continuing to report the errors. Please correct me if I am wrong.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 334.gz
Type: application/x-gzip
Size: 3753 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20200728/2fe7849a/attachment.gz>