[developers] pyDelphin and JACY
Michael Wayne Goodman
goodmami at uw.edu
Wed Dec 20 03:21:23 CET 2017
Hi Tuan Anh,
First, this line is really odd:
x = dmrx.etree_tostring(dmrx._encode_dmrs(obj)).decode('utf-8')
The etree_tostring() function (defined in delphin.mrs.util) is an attempt
to smooth over differences between Python 2 and Python 3 (since PyDelphin
supports both), but it's not meant to be called directly. I should, at
least, rename it to _etree_tostring to indicate that it's not part of the
public API. On a similar note, you are calling dmrx._encode_dmrs(), which
also isn't part of the public API (again, noted by the _ prefix on the
function name, which is a Python convention). I don't recommend using such
calls, as they may disappear in the future without warning.
(Also, that line may work for you, but versions of PyDelphin after August
12, 2017 require a properties dictionary as a second argument to
_encode_dmrs().)
Can't you just do this?
x = dmrx.dumps_one(obj)
Similarly, instead of:
dmrses = list(dmrx.deserialize(io.StringIO(x)))
just use:
dmrses = list(dmrx.loads(x))
Now, getting to the problem causing the error: it looks like PyDelphin is
tripping on a malformed predicate, _te_adjunct_rel. The initial _ indicates
it's a surface predicate, so the DMRX encoder creates a <realpred> element
for it. However, there is no coarse-grained sense (i.e. POS), which must be
a single letter, so you get the following element:
<realpred lemma="te" sense="adjunct" />
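As a rough illustration of how an element like that comes about, here is a
standard-library-only sketch. The helpers split_surface_pred() and
realpred_element() are made up for this example and are not PyDelphin's
actual encoder; the sketch also assumes that lemmas contain no underscores
and that a POS is a single alphabetic character.

```python
import xml.etree.ElementTree as ET


def split_surface_pred(predstr):
    """Split '_lemma_pos_sense_rel' into (lemma, pos, sense).

    Hypothetical helper, not part of PyDelphin. Assumes the lemma has no
    underscores and that a POS is exactly one alphabetic character.
    """
    if not predstr.startswith('_'):
        raise ValueError('not a surface predicate: ' + predstr)
    body = predstr[1:]
    if body.endswith('_rel'):
        body = body[:-len('_rel')]
    parts = body.split('_')
    lemma, rest = parts[0], parts[1:]
    pos = None
    # Only a single letter counts as a POS; 'adjunct' does not qualify.
    if rest and len(rest[0]) == 1 and rest[0].isalpha():
        pos = rest.pop(0)
    sense = '_'.join(rest) if rest else None
    return lemma, pos, sense


def realpred_element(predstr):
    """Build a <realpred> element, omitting attributes that are None."""
    lemma, pos, sense = split_surface_pred(predstr)
    elem = ET.Element('realpred')
    elem.set('lemma', lemma)
    if pos is not None:
        elem.set('pos', pos)
    if sense is not None:
        elem.set('sense', sense)
    return elem


if __name__ == '__main__':
    for pred in ('_neko_n_rel', '_miru_v_1_rel', '_te_adjunct_rel'):
        print(ET.tostring(realpred_element(pred), encoding='unicode'))
```

For _te_adjunct_rel the resulting element has lemma and sense attributes
but no pos attribute, which is the situation described above.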
Then PyDelphin's delphin.mrs.components.Pred class has trouble with this
because it assumes the pos field is not None. I've created a bug report
here: https://github.com/delph-in/pydelphin/issues/129
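The failure itself can be reproduced without PyDelphin. The sketch below
mirrors the join shown in the traceback (components.py, line 453), but
realpred_string() and tolerant_realpred_string() are hypothetical stand-ins
written for this example, and the tolerant variant is only one possible way
to handle a missing POS, not necessarily how the issue will be resolved.

```python
def realpred_string(lemma, pos, sense=None):
    """Hypothetical stand-in that assumes pos is always a string."""
    string_tokens = [lemma, pos]
    if sense is not None:
        string_tokens.append(sense)
    # Same shape as the join in the traceback; fails if any token is None.
    return '_'.join([''] + string_tokens + ['rel'])


def tolerant_realpred_string(lemma, pos=None, sense=None):
    """Hypothetical variant that skips fields that are None."""
    tokens = [t for t in (lemma, pos, sense) if t is not None]
    return '_'.join([''] + tokens + ['rel'])


if __name__ == '__main__':
    print(realpred_string('neko', 'n'))
    try:
        # What the decoder effectively does for <realpred lemma="te"
        # sense="adjunct" />: the missing pos attribute comes back as None.
        realpred_string('te', None, 'adjunct')
    except TypeError as exc:
        print('TypeError:', exc)
    print(tolerant_realpred_string('te', None, 'adjunct'))
```

Because str.join() only accepts strings, a None in the pos position
(index 2 of the joined list) raises the TypeError seen in the traceback.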
The problem with Jacy was already fixed a few months ago:
https://github.com/delph-in/jacy/issues/2
So you could also get around the PyDelphin bug by getting the latest Jacy.
On Tue, Dec 19, 2017 at 5:31 PM, Tuấn Anh Lê <tuananh.ke at gmail.com> wrote:
> Hi everyone,
>
> I found this bug while using pyDelphin to process MRS output from JACY.
> Can someone shed some light on this for me?
>
> The original sentence was "猫を見ていた。"
>
> Source code
> ---------------------------------------
> import io
> from delphin.mrs import simplemrs
> from delphin.mrs import dmrx
>
> m = '''[ TOP: h0
> INDEX: e2 [ e TENSE: past MOOD: indicative PROG: - PERF: - ASPECT:
> default_aspect PASS: - SF: prop ]
> RELS: < [ udef_q_rel<0:1> LBL: h4 ARG0: x5 [ x PERS: 3 ] RSTR: h6 BODY:
> h7 ]
> [ "_neko_n_rel"<0:1> LBL: h8 ARG0: x5 ]
> [ "_miru_v_1_rel"<4:5> LBL: h9 ARG0: e10 [ e TENSE: tense MOOD:
> indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ] ARG1:
> i11 ARG2: x5 ]
> [ "_te_adjunct_rel"<-1:-1> LBL: h1 ARG0: e12 [ e TENSE: tense
> MOOD: indicative PROG: - PERF: - ASPECT: default_aspect PASS: - SF: prop ]
> L-HNDL: h13 R-HNDL: h14 ]
> [ "_iru_v_be_rel"<8:9> LBL: h15 ARG0: e2 ARG1: i3 ] >
> HCONS: < h0 qeq h1 h6 qeq h8 h13 qeq h9 h14 qeq h15 > ]'''
>
> # MRS string to pyDelphin
> obj = simplemrs.loads_one(m)
> print(obj)
>
> # pyDelphin to XML
> x = dmrx.etree_tostring(dmrx._encode_dmrs(obj)).decode('utf-8')
> print(x)
>
> # XML back to pyDelphin
> dmrses = list(dmrx.deserialize(io.StringIO(x)))
> print(dmrses)
>
>
> Error messages
> ---------------------------------------
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/home/tuananh/tmp/jacy.py", line 23, in <module>
> dmrses = list(dmrx.deserialize(io.StringIO(x)))
> File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py",
> line 74, in deserialize
> yield _deserialize_dmrs(elem)
> File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py",
> line 85, in _deserialize_dmrs
> return Dmrs(nodes=list(map(_decode_node, elem.iter('node'))),
> File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py",
> line 101, in _decode_node
> return Node(pred=_decode_pred(elem.find('*[1]')),
> File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/dmrx.py",
> line 122, in _decode_pred
> elem.get('sense'))
> File "/home/tuananh/ep3/lib/python3.6/site-packages/delphin/mrs/components.py",
> line 453, in realpred
> predstr = '_'.join([''] + string_tokens + ['rel'])
> TypeError: sequence item 2: expected str instance, NoneType found
>
> Yours,
> --
> Tuan Anh
>
--
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics