[developers] pyDelphin and JACY

Michael Wayne Goodman goodmami at uw.edu
Wed Dec 20 03:21:23 CET 2017


Hi Tuan Anh,

First, this line is really odd:

    x = dmrx.etree_tostring(dmrx._encode_dmrs(obj)).decode('utf-8')

The etree_tostring() function (defined in delphin.mrs.util) is an attempt
to smooth over differences between Python2 and Python3 (since PyDelphin
supports both), although it's not meant to be called directly. I should, at
least, rename it to _etree_tostring to indicate that it's not part of the
public API. On a similar note, you are calling dmrx._encode_dmrs(), which
also isn't part of the public API (again, noted by the _ prefix on the
function name, which is a Python convention). I don't recommend using such
calls, as they may disappear in the future without warning.
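
For context, here is a minimal sketch of the kind of Python 2/3 difference that etree_tostring() papers over. It uses only the standard library on Python 3 and is not PyDelphin's own code: by default ElementTree's tostring() returns bytes, which is why the original line needed .decode('utf-8'), while encoding='unicode' returns a str directly.

```python
import xml.etree.ElementTree as ET

# Build an element shaped like the one discussed in this thread.
elem = ET.Element('realpred', lemma='te', sense='adjunct')

# Default call: on Python 3 this returns bytes, so the caller must decode.
as_bytes = ET.tostring(elem)
print(type(as_bytes).__name__)

# encoding='unicode' asks ElementTree for text (str) instead of bytes.
as_text = ET.tostring(elem, encoding='unicode')
print(as_text)

# Both results carry the same content once the bytes are decoded.
print(as_bytes.decode('utf-8') == as_text)
```

The public dumps/loads functions hide this detail, which is one more reason to prefer them over the private helpers.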

(Also, that line may work for you, but versions of PyDelphin after August
12, 2017 require a properties dictionary as a second argument to
_encode_dmrs().)

Can't you just do this?

    x = dmrx.dumps_one(obj)

Similarly, instead of:

    dmrses = list(dmrx.deserialize(io.StringIO(x)))

just use:

    dmrses = list(dmrx.loads(x))

Now getting to the problem causing the error, it looks like PyDelphin is
tripping on a malformed predicate: _te_adjunct_rel. The initial _ indicates
it's a surface predicate, so the DMRX encoder creates a <realpred> element
for it. However, there is no coarse-grained sense (i.e., POS), which must be
a single letter, so you get the following element, which has no pos
attribute:

    <realpred lemma="te" sense="adjunct" />

Then PyDelphin's delphin.mrs.components.Pred class has trouble with this
because it assumes the pos field is not None. I've created a bug report
here: https://github.com/delph-in/pydelphin/issues/129
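
To make the failure concrete, here is a rough standalone sketch using only the standard library. It is not PyDelphin's implementation: split_surface_pred(), naive_join(), and tolerant_join() are hypothetical helpers written for illustration, and the splitter assumes the lemma itself contains no underscores. naive_join() imitates the join shown in the traceback, and tolerant_join() is just one possible way to cope with a missing POS, not necessarily how the bug will be fixed.

```python
import xml.etree.ElementTree as ET


def split_surface_pred(predstr):
    """Hypothetical helper: split '_lemma_pos_sense_rel' into its fields.

    The token after the lemma is treated as the POS only if it is a
    single character; anything else is taken to be the sense.
    """
    body = predstr.strip('"')
    if not body.startswith('_'):
        raise ValueError('not a surface predicate: ' + predstr)
    body = body[1:]
    if body.endswith('_rel'):
        body = body[:-len('_rel')]
    tokens = body.split('_')
    lemma, rest = tokens[0], tokens[1:]
    pos = rest.pop(0) if rest and len(rest[0]) == 1 else None
    sense = '_'.join(rest) if rest else None
    return lemma, pos, sense


def naive_join(lemma, pos, sense):
    # Imitates the join in the traceback; fails if any field is None.
    return '_'.join([''] + [lemma, pos, sense] + ['rel'])


def tolerant_join(lemma, pos, sense):
    # Assumption: simply skip missing fields when rebuilding the string.
    tokens = [t for t in (lemma, pos, sense) if t is not None]
    return '_'.join([''] + tokens + ['rel'])


print(split_surface_pred('_miru_v_1_rel'))
print(split_surface_pred('_te_adjunct_rel'))

# Decoding side: the element has no pos attribute, so get() returns None.
elem = ET.fromstring('<realpred lemma="te" sense="adjunct" />')
fields = (elem.get('lemma'), elem.get('pos'), elem.get('sense'))

try:
    naive_join(*fields)
except TypeError as exc:
    print('naive join failed:', exc)

print(tolerant_join(*fields))
```

The well-formed predicates from the same MRS (such as _miru_v_1_rel) split into a lemma, a one-letter POS, and a sense, so only _te_adjunct_rel ends up with a None in the middle of the join.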

The problem with Jacy was already fixed a few months ago:
https://github.com/delph-in/jacy/issues/2

So you could also get around the PyDelphin bug by getting the latest Jacy.

On Tue, Dec 19, 2017 at 5:31 PM, Tuấn Anh Lê <tuananh.ke at gmail.com> wrote:

> Hi everyone,
>
> I found this bug while using pyDelphin to process MRS output from JACY.
> Can someone shed some light on this for me?
>
> The original sentence was "猫を見ていた。"
>
> Source code
> ---------------------------------------
> import io
> from delphin.mrs import simplemrs
> from delphin.mrs import dmrx
>
> m = '''[ TOP: h0
>   INDEX: e2 [ e TENSE: past MOOD: indicative PROG: - PERF: - ASPECT:
> default_aspect PASS: - SF: prop ]
>   RELS: < [ udef_q_rel<0:1> LBL: h4 ARG0: x5 [ x PERS: 3 ] RSTR: h6 BODY:
> h7 ]
>           [ "_neko_n_rel"<0:1> LBL: h8 ARG0: x5 ]
>           [ "_miru_v_1_rel"<4:5> LBL: h9 ARG0: e10 [ e TENSE: tense MOOD:
> indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG1:
> i11 ARG2: x5 ]
>           [ "_te_adjunct_rel"<-1:-1> LBL: h1 ARG0: e12 [ e TENSE: tense
> MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ]
> L-HNDL: h13 R-HNDL: h14 ]
>           [ "_iru_v_be_rel"<8:9> LBL: h15 ARG0: e2 ARG1: i3 ] >
>   HCONS: < h0 qeq h1 h6 qeq h8 h13 qeq h9 h14 qeq h15 > ]'''
>
> # MRS string to pyDelphin
> obj = simplemrs.loads_one(m)
> print(obj)
>
> # pyDelphin to XML
> x = dmrx.etree_tostring(dmrx._encode_dmrs(obj)).decode('utf-8')
> print(x)
>
> # XML back to pyDelphin
> dmrses = list(dmrx.deserialize(io.StringIO(x)))
> print(dmrses)
>
>
> Error messages
> ---------------------------------------
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/tuananh/tmp/jacy.py", line 23, in <module>
>     dmrses = list(dmrx.deserialize(io.StringIO(x)))
>   File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py", line 74, in deserialize
>     yield _deserialize_dmrs(elem)
>   File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py", line 85, in _deserialize_dmrs
>     return Dmrs(nodes=list(map(_decode_node, elem.iter('node'))),
>   File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py", line 101, in _decode_node
>     return Node(pred=_decode_pred(elem.find('*[1]')),
>   File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py", line 122, in _decode_pred
>     elem.get('sense'))
>   File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/components.py", line 453, in realpred
>     predstr = '_'.join([''] + string_tokens + ['rel'])
> TypeError: sequence item 2: expected str instance, NoneType found
>
> Yours,
> --
> Tuan Anh
>



-- 
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics