[developers] STRINGPRED in MRS
Michael Wayne Goodman
goodmami at uw.edu
Fri Jul 28 23:32:51 CEST 2017
Hi Tuan Anh,
I'm glad you found the source of the problem. But can I ask why you are
using PyDelphin's SimpleDMRS in this way? The documentation (
https://github.com/delph-in/pydelphin/wiki/delphin.mrs.simpledmrs) states
that it is only meant as an export format for human consumption as it's
easier to read than XML or JSON, but it is not intended to be a supported
DMRS codec.
I ask because I found that Jan Buys had also used this format in his work
with DMRS. I don't want to promote yet another format variant for our
software to support, but if people really want to use it in processing I
should provide a way to read it back into PyDelphin.
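
To give a sense of what reading the format back would involve, here is a minimal sketch, using only the standard library, that parses a single node line of the kind quoted below. The NODE_RE pattern and the parse_node helper are hypothetical names for illustration and are not part of PyDelphin. They assume the layout seen in the example: nodeid, predicate, <cfrom:cto>, an optional ("CARG"), then the variable sort and key=value properties.

```python
import re

# Hypothetical pattern for one SimpleDMRS node line, e.g.
#   10007 [named<11:20>("Abraham") x pers=3 num=pl ind=+];
NODE_RE = re.compile(
    r'^\s*(?P<nodeid>\d+)\s*\['
    r'(?P<pred>[^<\s\]]+)'                      # predicate symbol
    r'<(?P<cfrom>-?\d+):(?P<cto>-?\d+)>'        # character span
    r'(?:\("(?P<carg>[^"]*)"\))?'               # optional ("CARG")
    r'(?:\s+(?P<cvarsort>[A-Za-z])\b(?P<props>[^\]]*))?'
    r'\s*\];?\s*$'
)


def parse_node(line):
    """Return a dict describing one node line (illustrative only)."""
    m = NODE_RE.match(line)
    if m is None:
        raise ValueError('not a SimpleDMRS node line: %r' % line)
    # Properties are whitespace-separated key=value pairs.
    props = dict(p.split('=', 1) for p in (m.group('props') or '').split())
    return {
        'nodeid': int(m.group('nodeid')),
        'pred': m.group('pred'),
        'cfrom': int(m.group('cfrom')),
        'cto': int(m.group('cto')),
        'carg': m.group('carg'),        # None when the node has no CARG
        'cvarsort': m.group('cvarsort'),
        'properties': props,
    }


node = parse_node('10007 [named<11:20>("Abraham") x pers=3 num=pl ind=+];')
print(node['pred'], node['carg'])
```

The point of the sketch is only that the CARG sits between the span and the sort, so a reader that looks for it among the key=value properties will never find it. Link lines and error recovery are left out.
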
On Thu, Jul 27, 2017 at 11:33 PM, Tuấn Anh Lê <tuananh.ke at gmail.com> wrote:
> I found the problem. It was not the transformations, but the manual string
> editing that stole my CARGs. When I edit DMRS manually, I convert DMRSes to
> strings first, and then convert them back to DMRS objects later. This is
> what I meant by "strings":
>
> dmrs {
> 10000 [def_explicit_q<0:2> x pers=3 num=sg ind=+];
> 10001 [poss<0:2> e tense=untensed prog=- perf=- mood=indicative sf=prop];
> 10002 [pronoun_q<0:2> x pers=1 num=sg pt=std];
> 10003 [pron<0:2> x pers=1 num=sg pt=std];
> 10004 [_name_n_of_rel<3:7> x pers=3 num=sg ind=+];
> 10005 [_be_v_id_rel<8:10> e tense=pres prog=- perf=- mood=indicative sf=prop];
> 10006 [udef_q<11:20> x pers=3 num=pl ind=+];
> 10007 [named<11:20>("Abraham") x pers=3 num=pl ind=+];
> 0:/H -> 10005;
> 10000:RSTR/H -> 10004;
> 10001:ARG2/NEQ -> 10003;
> 10001:ARG1/EQ -> 10004;
> 10002:RSTR/H -> 10003;
> 10005:ARG1/NEQ -> 10004;
> 10005:ARG2/NEQ -> 10007;
> 10006:RSTR/H -> 10007;
> }
>
> When I converted them back, I forgot that CARGs go in the parentheses after
> <cfrom:cto>, not in the attribute list (pers=3 num=pl ind=+).
>
> On 28 July 2017 at 02:10, Michael Wayne Goodman <goodmami at uw.edu> wrote:
>
>> Forgive me, Tuan Anh, for bringing the discussion back to the list...
>>
>> On Thu, Jul 27, 2017 at 9:25 AM, Tuấn Anh Lê <tuananh.ke at gmail.com>
>> wrote:
>>
>>> Hi Mike, thanks so much. I'm still debugging to see why the CARGs
>>> disappeared. I'm supporting different formats now, and somewhere in the
>>> middle of converting back and forth something weird happened. I suspect
>>> it's my code that caused the problem. My parsing flow right now is
>>> pyDelphin/ACE => XMRS => XML/JSON/string (for manual editing).
>>>
>>> If I did lose the CARGs, is it possible to add them back in
>>> automatically, given that I have this MRS and the sentence text, or do I
>>> have to write code to do that?
>>>
>>
>> Unless I see your transformations I can't really pinpoint where the CARGs
>> are being lost. So instead I'll confirm that PyDelphin doesn't drop them
>> during the conversions you mentioned:
>>
>> >>> from delphin.interfaces import ace
>> >>> from delphin.mrs import xmrs, dmrx
>> >>> r = ace.parse(
>> ... '/home/goodmami/grammars/erg-1214/erg-1214.dat',
>> ... 'My name is Sherlock Holmes.')
>> NOTE: parsed 1 / 1 sentences, avg 3668k, time 0.12449s
>> >>> x = r.result(0).mrs() # ACE => XMRS
>> >>> j = xmrs.Dmrs.to_dict(x) # XMRS => JSON
>> >>> x2 = xmrs.Dmrs.from_dict(j) # JSON => XMRS
>> >>> x2.ep(10009).carg
>> 'Sherlock'
>> >>> d = dmrx.dumps_one(x) # XMRS => XML
>> >>> x3 = dmrx.loads_one(d) # XML => XMRS
>> >>> x3.ep('10009').carg # nodeid became string from conversion
>> 'Sherlock'
>>
>> I'm not sure what you meant by converting to "string". But I think it's
>> your transformations that are causing the lost CARGs. Are you able to say
>> what those transformations do?
>>
>>> "My name is Sherlock Holmes."
>>>
>>> [ TOP: h0
>>> RELS: < [ def_explicit_q_rel<0:3> LBL: h1 ARG0: x12 [ x NUM: sg IND: + PERS: 3 ] RSTR: h17 ]
>>> [ poss_rel<0:3> LBL: h2 ARG0: e10 [ e TENSE: untensed MOOD: indicative PERF: - SF: prop PROG: - ] ARG1: x12 ARG2: x11 [ x NUM: sg PERS: 1 ] ]
>>> [ pronoun_q_rel<0:3> LBL: h3 ARG0: x11 RSTR: h18 ]
>>> [ pron_rel<0:3> LBL: h4 ARG0: x11 ]
>>> [ _name_n_of_rel<4:8> LBL: h2 ARG0: x12 ]
>>> [ _be_v_id_rel<9:11> LBL: h5 ARG0: e13 [ e TENSE: pres MOOD: indicative PERF: - SF: prop PROG: - ] ARG1: x12 ARG2: x16 [ x NUM: sg IND: + PERS: 3 ] ]
>>> [ proper_q_rel<12:28> LBL: h6 ARG0: x16 RSTR: h19 ]
>>> [ compound_rel<12:28> LBL: h7 ARG0: e14 [ e TENSE: untensed MOOD: indicative PERF: - SF: prop PROG: - ] ARG1: x16 ARG2: x15 [ x NUM: sg IND: + PERS: 3 ] ]
>>> [ proper_q_rel<12:20> LBL: h8 ARG0: x15 RSTR: h20 ]
>>> [ named_rel<12:20> LBL: h9 ARG0: x15 ]
>>> [ named_rel<21:28> LBL: h7 ARG0: x16 ] >
>>> HCONS: < h0 qeq h5 h17 qeq h2 h18 qeq h4 h19 qeq h7 h20 qeq h9 > ]
>>>
>>>
>>> On 27 July 2017 at 12:39, Michael Wayne Goodman <goodmami at uw.edu> wrote:
>>>
>>>> On Wed, Jul 26, 2017 at 6:42 PM, Tuấn Anh Lê <tuananh.ke at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Mike,
>>>>>
>>>>> Thanks for answering :D. I think I misunderstood the STRINGPRED flag in
>>>>> pyDelphin. I always thought that there were two types of predicates,
>>>>> namely GRAMMARPRED and REALPRED. However, I have encountered two different
>>>>> kinds of GRAMMARPRED: those without a CARG (such as pronoun_q_rel) and
>>>>> those with a CARG (such as named_rel). I thought STRINGPRED was used to
>>>>> represent a GRAMMARPRED with a CARG.
>>>>>
>>>>
>>>> Ah I see. Unfortunately there is no predicate subcategorization that
>>>> indicates the presence of a constant argument. But, fortunately, there is
>>>> only one argument role for constant arguments (CARG), and nobody that I'm
>>>> aware of has chosen to rename it or add additional ones. The LKB and PET
>>>> allowed it to be customized via a *value-feats* parameter, but ACE just
>>>> assumes it will be "CARG", and nobody's complained about that yet. So I
>>>> think it's safe to assume it will be called "CARG". This assumption makes
>>>> the following easier...
>>>>
>>>>> Right now I'm transforming DMRS using pyDelphin and I'm looking for a
>>>>> way to find GRAMMARPREDs with a CARG, because after each transformation
>>>>> pyDelphin takes my CARGs out and hides them somewhere. I need the CARGs
>>>>> for other mappings, though. Is there a way to accomplish this?
>>>>>
>>>>
>>>> When working with Xmrs structures in PyDelphin, just look for "CARG" on
>>>> an EP's arguments. E.g. for some Xmrs object x that represents the sentence
>>>> "Abrams slept." where the named_rel EP has nodeid 10001:
>>>>
>>>> >>> x.ep(10001).args.get('CARG')
>>>> 'Abrams'
>>>>
>>>> For convenience, I also define a "carg" property on EPs:
>>>>
>>>> >>> x.ep(10001).carg
>>>> 'Abrams'
>>>>
>>>> These are essentially equivalent. If you just want to test for the
>>>> presence of the CARG, you can do this:
>>>>
>>>> >>> 'CARG' in x.ep(10001).args
>>>> True
>>>> >>> 'CARG' in x.ep(10000).args # 10000 is for proper_q
>>>> False
>>>>
>>>> In DMRS, nodes do not contain information about the arguments the node
>>>> participates in; this information is stored in the links. However, CARGs
>>>> are special in that they become node attributes instead of links:
>>>>
>>>> >>> from delphin.mrs.components import nodes
>>>> >>> ns = {n.nodeid: n for n in nodes(x)}
>>>> >>> ns[10001].carg
>>>> 'Abrams'
>>>>
>>>> If you try to recreate an Xmrs from Dmrs nodes and don't give it any
>>>> links, you will only get intrinsic arguments (ARG0s) and CARGs, and any
>>>> other arguments will be lost:
>>>>
>>>> >>> from delphin.mrs.xmrs import Dmrs
>>>> >>> d = Dmrs(nodes=ns.values())
>>>> >>> d.ep(10001).args
>>>> {'CARG': 'Abrams', 'ARG0': 'x5'}
>>>> >>> d.ep(10002).args # 10002 is for "slept"
>>>> {'ARG0': 'e6'}
>>>>
>>>> If you also give it the links, it should be able to recreate the
>>>> original (x)mrs.
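
The node/link split described above can be illustrated without PyDelphin at all. The following is a toy sketch under stated assumptions: the dict and tuple layouts and the node_args helper are made up for illustration and are not PyDelphin's data model, and the intrinsic ARG0 is left out because the toy has no variables. It only shows that a CARG travels with the node while every other argument depends on the links.

```python
# Toy stand-ins for the DMRS nodes and links of "Abrams slept."
# (hypothetical layout, not PyDelphin's classes)
nodes = [
    {'nodeid': 10000, 'pred': 'proper_q_rel'},
    {'nodeid': 10001, 'pred': 'named_rel', 'carg': 'Abrams'},
    {'nodeid': 10002, 'pred': '_sleep_v_1_rel'},
]
links = [
    # (start nodeid, end nodeid, role, post)
    (10000, 10001, 'RSTR', 'H'),
    (10002, 10001, 'ARG1', 'NEQ'),
]


def node_args(node, links=()):
    """Collect the arguments recoverable for one node.

    The CARG comes from the node itself; all other roles come from
    links that start at this node.
    """
    args = {}
    if node.get('carg') is not None:
        args['CARG'] = node['carg']
    for start, end, role, _post in links:
        if start == node['nodeid']:
            args[role] = end
    return args


print(node_args(nodes[1]))          # nodes only: the CARG is still there
print(node_args(nodes[2]))          # nodes only: ARG1 of "slept" is gone
print(node_args(nodes[2], links))   # with links: ARG1 is recovered
```
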
>>>>
>>>> So I'm not sure what kind of transformation you are doing that would cause
>>>> the CARG to be lost. Can you elaborate or provide an MWE (minimal working
>>>> example) that shows when the CARG is lost/hidden?
>>>>
>>>>
>>>>>
>>>>> On 27 July 2017 at 03:45, Michael Wayne Goodman <goodmami at uw.edu>
>>>>> wrote:
>>>>>
>>>>>> Hi Tuan Anh,
>>>>>>
>>>>>> The thinking on subtypes of predicates has changed over the years,
>>>>>> and as a result there are numerous overlapping terms for various things. I
>>>>>> believe that the current consensus (documented on
>>>>>> http://moin.delph-in.net/PredicateRfc) is that there are two main
>>>>>> axes of subcategorization for predicate symbols: abstract vs surface and
>>>>>> string vs type (with "real" predicates being a decomposed form of surface
>>>>>> preds). But I suspect what you mean by STRINGPRED is not the same thing as
>>>>>> described on the wiki?
>>>>>>
>>>>>> I'm afraid PyDelphin is a bit behind the times WRT these definitions
>>>>>> of predicates, and instead follows mostly what was described in the MRS DTD
>>>>>> (as I understood it at the time). Therefore PyDelphin currently calls any
>>>>>> predicate beginning with an underscore a "stringpred", regardless of
>>>>>> whether the symbol was quoted or not. I.e., it should be called
>>>>>> "surfacepred" or something. Abstract predicates are called "grammarpred" in
>>>>>> PyDelphin. I have created a ticket for this bug:
>>>>>> https://github.com/delph-in/pydelphin/issues/117.
>>>>>>
>>>>>> Now back to your question, "named_rel" is called a GRAMMARPRED in
>>>>>> PyDelphin because the predicate does not start with an underscore.
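
The rule stated above amounts to a check on the first character of the predicate symbol. As a minimal sketch of that rule only (pred_type is a hypothetical helper, not a PyDelphin function, and it ignores the separate "realpred" form):

```python
def pred_type(pred):
    """Classify a predicate symbol by the leading-underscore rule.

    Surrounding quote characters are stripped first, since the rule
    applies regardless of whether the symbol was quoted.
    """
    symbol = pred.strip('"\'')
    return 'stringpred' if symbol.startswith('_') else 'grammarpred'


for p in ['named_rel', 'pronoun_q_rel', '_name_n_of_rel', '"_be_v_id_rel"']:
    print(p, pred_type(p))
```

Under this rule the presence of a CARG plays no role, which is why named_rel and pronoun_q_rel end up in the same class.
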
>>>>>>
>>>>>> On Wed, Jul 26, 2017 at 8:25 AM, Tuấn Anh Lê <tuananh.ke at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi delphinians,
>>>>>>>
>>>>>>> I'm working on predicate mapping and found that named_rel is not a
>>>>>>> STRINGPRED when processed by pyDelphin. Is this expected behaviour? Can
>>>>>>> someone please shed some light on this matter for me? :D Thank you.
>>>>>>>
>>>>>>> REALPRED 1
>>>>>>> STRINGPRED 2
>>>>>>> <ElementaryPredication object (pron_rel (x24)) at 140241381023560> 0
>>>>>>> <ElementaryPredication object (pronoun_q_rel (x24)) at 140241380610120> 0
>>>>>>> <ElementaryPredication object (_have_v_1_rel (e25)) at 140241380610240> 1
>>>>>>> <ElementaryPredication object (_the_q_rel (x26)) at 140241380610360> 1
>>>>>>> <ElementaryPredication object (_pleasure_n_of_rel (x26)) at 140241380610480> 1
>>>>>>> <ElementaryPredication object (udef_q_rel (x27)) at 140241380610600> 0
>>>>>>> <ElementaryPredication object (nominalization_rel (x27)) at 140241380610720> 0
>>>>>>> <ElementaryPredication object (_make_v_1_rel (e28)) at 140241380610840> 1
>>>>>>> <ElementaryPredication object (_the_q_rel (x29)) at 140241380610960> 1
>>>>>>> <ElementaryPredication object (_doctor_n_1_rel (x29)) at 140241380611080> 1
>>>>>>> <ElementaryPredication object (def_explicit_q_rel (x31)) at 140241380611200> 0
>>>>>>> <ElementaryPredication object (poss_rel (e30)) at 140241380611320> 0
>>>>>>> <ElementaryPredication object (_acquaintance_n_1_rel (x31)) at 140241380611440> 1
>>>>>>> <ElementaryPredication object (_say_v_to_rel (e32)) at 140241380611560> 1
>>>>>>> <ElementaryPredication object (proper_q_rel (x33)) at 140241380611680> 0
>>>>>>> *<ElementaryPredication object (named_rel (x33)) at 140241380611800> 0*
>>>>>>> <ElementaryPredication object (_and_c_rel (e34)) at 140241380611920> 1
>>>>>>> <ElementaryPredication object (_in_p_rel (e35)) at 140241380612040> 1
>>>>>>> <ElementaryPredication object (udef_q_rel (x37)) at 140241380612160> 0
>>>>>>> <ElementaryPredication object (_a+few_a_1_rel (e36)) at 140241380612280> 1
>>>>>>> <ElementaryPredication object (_word_n_of_rel (x37)) at 140241380612400> 1
>>>>>>> <ElementaryPredication object (pron_rel (x38)) at 140241380612520> 0
>>>>>>> <ElementaryPredication object (pronoun_q_rel (x38)) at 140241380612640> 0
>>>>>>> <ElementaryPredication object (_sketch_v_1_rel (e39)) at 140241380612760> 1
>>>>>>> <ElementaryPredication object (_out_p_rel (e40)) at 140241380612880> 1
>>>>>>> <ElementaryPredication object (free_relative_q_rel (x41)) at 140241380613000> 0
>>>>>>> <ElementaryPredication object (thing_rel (x41)) at 140241380613120> 0
>>>>>>> <ElementaryPredication object (_occur_v_to_rel (e42)) at 140241380613240> 1
>>>>>>>
>>>>>>>
>>>>>>> Yours,
>>>>>>> --
>>>>>>> Tuan Anh Le
>>>>>>>
--
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics