[developers] Bug report for ERG

Alexandre Rademaker arademaker at gmail.com
Wed Nov 11 00:32:17 CET 2020

BTW, regardless of the tokenisation issue, an invalid MRS should not be produced, right?


> On 10 Nov 2020, at 18:39, Alexandre Rademaker <arademaker at gmail.com> wrote:
> Hi,
> I am trying to parse the sentences from the EWT corpus (https://github.com/universaldependencies/UD_English-EWT), but in the DEV set there is a nonsense sentence consisting only of a URL in brackets:
> [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
> ACE reports an invalid MRS. The error is at character 2666, so the culprit is probably the predicate:
> _search_x.htm?csp=34/NN_u_unknown
> But the regex for predicates seems to allow a dot in the predicate name:
> http://moin.delph-in.net/MrsRfc#SerializationFormats
> In any case, the pre-processing of the sentence looks wrong to me in the ERG trunk version: the tokenisation broke the URL into many tokens and consumed the `http://` protocol prefix:
> % ace -g ~/hpsg/wn/terg-mac.dat -E
> [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
> www.usatoday. com / tech/ science / space/ 2005 – 03 – 09 - nasa - search_x.htm?csp=34
> ERG (2018) produced what I was expecting:
> % ace -g erg-mac.dat -E
> [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
> www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34
> ERG (1214) produced what I was expecting:
> % ace -g erg-lingo-mac.dat -E
> [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
> [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ]
> >>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]')
> NOTE: hit RAM limit while unpacking
> NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s
> >>> response.result(0).mrs()
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs
>    mrs = simplemrs.decode(mrs)
>  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode
>    return _decode_mrs(lexer)
>  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs
>    rels.append(_decode_rel(lexer, variables))
>  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel
>    _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None))
>  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect
>    raise self._errcls('expected: ' + err,
> delphin.mrs._exceptions.MRSSyntaxError:
>  line 1, character 2666
>    [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ]  [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ]  [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ]  [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]  [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ]  [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ]  [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ]  [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ]  [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ]  [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ]  [ _science_n_1<30:37> LBL: h28 ARG0: x25 ]  [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ]  [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ]  [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ]  [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ]  [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ]  [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ]  [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ]  [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: 
prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e SF: prop-or-ques ] ]  [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ]  [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ]  [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ]  [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ]  [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ]  [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ]  [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ]  [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ]
>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               ^
> MRSSyntaxError: expected: a feature
> Best,
> Alexandre
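
To map a position such as "character 2666" from the traceback above back to a token in the MRS string, something like the following minimal sketch may help. It uses only the standard library; `token_at` and `context_at` are hypothetical helper names, not part of PyDelphin or ACE, and the snippet is a short excerpt of the MRS above rather than the full string. Whether the reported position is zero- or one-based has not been checked here, so the result should be treated as approximate.

```python
def context_at(text: str, offset: int, width: int = 30) -> str:
    """Return the part of `text` within `width` characters of `offset`."""
    start = max(0, offset - width)
    end = min(len(text), offset + width)
    return text[start:end]


def token_at(text: str, offset: int) -> str:
    """Return the whitespace-delimited token covering `offset`.

    Returns an empty string when `offset` points at whitespace.
    """
    if not 0 <= offset < len(text):
        raise IndexError("offset outside the string")
    if text[offset].isspace():
        return ""
    start = offset
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = offset
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]


# Short excerpt of the MRS string quoted above (not the full 2666+ characters).
snippet = (
    '[ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ]  '
    "[ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ]"
)

# For illustration, point at the "?" inside the suspicious predicate.
offset = snippet.index("?csp")
print(token_at(snippet, offset))
print(context_at(snippet, offset, width=20))
```

With the full string from `response.result(0)`, the same call with the offset from the error message would show which token the decoder stopped at, without having to count along the caret line.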
