<div dir="ltr"><div><div><div><div><div><div><div>Hi Tuan Anh,<br><br></div>First, this line is really odd:<br><br> x = dmrx.etree_tostring(dmrx._<wbr>encode_dmrs(obj)).decode('utf-<wbr>8')<br><br></div>The etree_tostring() function (defined in delphin.mrs.util) is an attempt to smooth over differences between Python 2 and Python 3 (since PyDelphin supports both), but it's not meant to be called directly. I should, at least, rename it to _etree_tostring to indicate that it's not part of the public API. On a similar note, you are calling dmrx._encode_dmrs(), which also isn't part of the public API (again, as indicated by the _ prefix on the function name, which is a Python convention). I don't recommend using such calls, as they may disappear in the future without warning.</div><div><br></div><div>(Also, that line may work for you, but versions of PyDelphin after August 12, 2017 require a properties dictionary as a second argument to _encode_dmrs().)<br></div><div><br></div>Can't you just do this?<br><br></div> x = dmrx.dumps_one(obj)</div><div><br></div><div>Similarly, instead of:<br><br> dmrses = list(dmrx.deserialize(io.StringIO(x)))<br><br>just use:<br><br> dmrses = list(dmrx.loads(x))</div></div><div><br></div><div>Now, getting to the problem causing the error: it looks like PyDelphin is tripping on a malformed predicate, _te_adjunct_rel. The initial _ indicates it's a surface predicate, so the DMRX encoder creates a &lt;realpred&gt; element for it. However, the predicate has no coarse-grained sense (i.e., POS), which must be a single letter, so you get the following element:</div><div><br></div><div> &lt;realpred lemma="te" sense="adjunct" /&gt;<br></div><div><br></div>Then PyDelphin's delphin.mrs.components.Pred class has trouble with this because it assumes the pos field is not None. 
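<div><br></div><div>As a rough illustration of that failure (a standalone sketch, not PyDelphin's actual code; build_predstr is a made-up name and its argument order is an assumption), the join shown in the last frame of the traceback breaks as soon as the POS slot is None:</div>
<pre>
```python
# Illustration only: mimics the join from the traceback, not PyDelphin itself.
def build_predstr(lemma, pos, sense=None):
    """Join predicate parts into a string such as _miru_v_1_rel."""
    string_tokens = [lemma, pos]
    if sense is not None:
        string_tokens.append(sense)
    # Same expression as the failing line: '_'.join([''] + tokens + ['rel'])
    return '_'.join([''] + string_tokens + ['rel'])


# Well-formed surface predicate: lemma, single-letter POS, optional sense.
print(build_predstr('miru', 'v', '1'))

# Malformed case: no POS, so None lands at index 2 of the joined list.
try:
    build_predstr('te', None, 'adjunct')
except TypeError as err:
    print('TypeError:', err)
```
</pre>
<div>The second call raises a TypeError about sequence item 2, the same position reported in the traceback, because the missing POS sits at index 2 of the list being joined.</div><div><br></div>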
I've created a bug report here: <a href="https://github.com/delph-in/pydelphin/issues/129">https://github.com/delph-in/pydelphin/issues/129</a></div><div><br></div><div>The problem with Jacy was already fixed a few months ago: <a href="https://github.com/delph-in/jacy/issues/2">https://github.com/delph-in/jacy/issues/2</a></div><div><br></div><div>So you could also get around the PyDelphin bug by getting the latest Jacy.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Dec 19, 2017 at 5:31 PM, Tuấn Anh Lê <span dir="ltr">&lt;<a href="mailto:tuananh.ke@gmail.com" target="_blank">tuananh.ke@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>Hi everyone,<br><br></div>I found this bug while using pyDelphin to process MRS output from JACY. Can someone shed some light on this for me?</div><div><br></div><div>The original sentence was "猫を見ていた。"<br></div><div><br></div><span style="font-family:monospace,monospace">Source code<br>------------------------------<wbr>---------<br>import io<br>from delphin.mrs import simplemrs<br>from delphin.mrs import dmrx<br><br>m = '''[ TOP: h0<br> INDEX: e2 [ e TENSE: past MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ]<br> RELS: &lt; [ udef_q_rel&lt;0:1&gt; LBL: h4 ARG0: x5 [ x PERS: 3 ] RSTR: h6 BODY: h7 ]<br> [ "_neko_n_rel"&lt;0:1&gt; LBL: h8 ARG0: x5 ]<br> [ "_miru_v_1_rel"&lt;4:5&gt; LBL: h9 ARG0: e10 [ e TENSE: tense MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG1: i11 ARG2: x5 ]<br> [ "_te_adjunct_rel"&lt;-1:-1&gt; LBL: h1 ARG0: e12 [ e TENSE: tense MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] L-HNDL: h13 R-HNDL: h14 ]<br> [ "_iru_v_be_rel"&lt;8:9&gt; LBL: h15 ARG0: e2 ARG1: i3 ] &gt;<br> HCONS: &lt; h0 qeq h1 h6 qeq h8 h13 qeq h9 h14 qeq h15 &gt; ]'''<br><br># MRS string to pyDelphin<br>obj = simplemrs.loads_one(m)<br>print(obj)<br><br># pyDelphin 
to XML<br>x = dmrx.etree_tostring(dmrx._<wbr>encode_dmrs(obj)).decode('utf-<wbr>8')<br>print(x)<br><br># XML back to pyDelphin<br>dmrses = list(dmrx.deserialize(io.<wbr>StringIO(x)))<br>print(dmrses)<br><br><br>Error messages<br>------------------------------<wbr>---------<br>Traceback (most recent call last):<br> File "&lt;stdin&gt;", line 1, in &lt;module&gt;<br> File "/home/tuananh/tmp/jacy.py", line 23, in &lt;module&gt;<br> dmrses = list(dmrx.deserialize(io.<wbr>StringIO(x)))<br> File "/home/tuananh/ep3/lib/<wbr>python3.6/site-packages/<wbr>delphin/mrs/dmrx.py", line 74, in deserialize<br> yield _deserialize_dmrs(elem)<br> File "/home/tuananh/ep3/lib/<wbr>python3.6/site-packages/<wbr>delphin/mrs/dmrx.py", line 85, in _deserialize_dmrs<br> return Dmrs(nodes=list(map(_decode_<wbr>node, elem.iter('node'))),<br> File "/home/tuananh/ep3/lib/<wbr>python3.6/site-packages/<wbr>delphin/mrs/dmrx.py", line 101, in _decode_node<br> return Node(pred=_decode_pred(elem.<wbr>find('*[1]')),<br> File "/home/tuananh/ep3/lib/<wbr>python3.6/site-packages/<wbr>delphin/mrs/dmrx.py", line 122, in _decode_pred<br> elem.get('sense'))<br> File "/home/tuananh/ep3/lib/<wbr>python3.6/site-packages/<wbr>delphin/mrs/components.py", line 453, in realpred<br> predstr = '_'.join([''] + string_tokens + ['rel'])<br>TypeError: sequence item 2: expected str instance, NoneType found</span><br><span style="font-family:monospace,monospace"></span></div> <br><span style="font-family:monospace,monospace"></span><div><div><div><div><div><div class="m_-908909329702844591gmail_signature"><div dir="ltr"><div><div dir="ltr"><span style="font-size:small">Yours,</span><span class="HOEnZb"><font color="#888888"><br style="font-size:small"><span style="font-size:small">-- </span><br style="font-size:small"><div style="font-size:small"><div dir="ltr">Tuan Anh<br></div></div></font></span></div></div></div></div>
</div></div></div></div></div></div>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Michael Wayne Goodman<div>Ph.D. Candidate, UW Linguistics</div></div></div>
</div>