[developers] Bug report for ERG

Alexandre Rademaker arademaker at gmail.com
Tue Nov 10 22:39:32 CET 2020


Hi,

I am trying to parse the sentences from the EWT corpus (https://github.com/universaldependencies/UD_English-EWT), but the DEV set contains a nonsense sentence consisting only of a URL in brackets:

 [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]

ACE reports an invalid MRS. The error is at character 2666, so the culprit is probably the predicate:

 _search_x.htm?csp=34/NN_u_unknown

However, the regex for predicates seems to allow a dot in the predicate name:

http://moin.delph-in.net/MrsRfc#SerializationFormats
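
To check which predicate actually sits at a reported offset, something like the following sketch may help. It uses only the standard library; the helper name `show_offset` is hypothetical, the sample string is a shortened fragment of the MRS further down, and the offset chosen here is purely illustrative. It also assumes the reported character position is 0-based, which I have not verified.

```python
def show_offset(text: str, offset: int, width: int = 40) -> str:
    """Return a window of `text` around `offset`, with a caret line below it.

    Assumes `offset` is a 0-based index into `text` (an unverified assumption
    about how the error position is counted).
    """
    start = max(0, offset - width)
    end = min(len(text), offset + width)
    snippet = text[start:end]
    # The caret is placed under the character at `offset`.
    pointer = " " * (offset - start) + "^"
    return snippet + "\n" + pointer


# Shortened fragment of the failing MRS, for illustration only.
sample = "[ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ]"
# Illustrative offset: the position of the '?' inside the predicate name.
print(show_offset(sample, sample.index("?"), width=10))
```

With the full MRS string and the real offset, this avoids having to count columns in the wrapped traceback.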

Anyway, the pre-processing of this sentence seems wrong to me in the ERG trunk version: the tokenisation breaks the URL into many tokens and consumes the `http://` protocol prefix:

% ace -g ~/hpsg/wn/terg-mac.dat -E
[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
www.usatoday. com / tech/ science / space/ 2005 – 03 – 09 - nasa - search_x.htm?csp=34

ERG (2018) produced what I was expecting:

% ace -g erg-mac.dat -E
[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34

ERG (1214) also produced what I was expecting:

% ace -g erg-lingo-mac.dat -E
[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]
[ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ]
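
The three outputs above can be compared mechanically. The sketch below is only an illustration: the function name `url_survives` is hypothetical, the strings are copied from the `ace -E` runs above, and it assumes that tokens in that output are separated by whitespace.

```python
URL = "http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34"

# Tokenised output copied from the three `ace -E` runs above.
ACE_OUTPUT = {
    "trunk": "www.usatoday. com / tech/ science / space/ 2005 – 03 – 09 - nasa - search_x.htm?csp=34",
    "2018": "www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34",
    "1214": "[ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ]",
}


def url_survives(tokenised: str, url: str = URL) -> bool:
    """True if the URL, with or without its protocol prefix, is a single token.

    Assumes tokens are whitespace-separated in the `ace -E` output.
    """
    bare = url.split("://", 1)[1]  # the URL without "http://"
    return any(tok in (url, bare) for tok in tokenised.split())


for version, output in ACE_OUTPUT.items():
    print(version, url_survives(output))
```

Only the trunk output fails this check; the 2018 and 1214 outputs each keep the URL as one token.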


>>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]')
NOTE: hit RAM limit while unpacking
NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s

>>> response.result(0).mrs()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs
    mrs = simplemrs.decode(mrs)
  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode
    return _decode_mrs(lexer)
  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs
    rels.append(_decode_rel(lexer, variables))
  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel
    _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None))
  File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect
    raise self._errcls('expected: ' + err,
delphin.mrs._exceptions.MRSSyntaxError:
  line 1, character 2666
    [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ]  [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ]  [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ]  [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ]  [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ]  [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ]  [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ]  [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ]  [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ]  [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ]  [ _science_n_1<30:37> LBL: h28 ARG0: x25 ]  [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ]  [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ]  [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ]  [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ]  [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ]  [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ]  [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ]  [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ]  [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: 
prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e SF: prop-or-ques ] ]  [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ]  [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ]  [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ]  [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ]  [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ]  [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ]  [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ]  [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ]  [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ]
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ^
MRSSyntaxError: expected: a feature


Best,
Alexandre



