[developers] Regarding to German sentence analysis.

Michael Wayne Goodman goodmami at uw.edu
Tue Feb 21 21:21:47 CET 2017


Ann:
 Does the LKB do anything special regarding properties like --PSV when
reading/writing DMRX? They aren't ill-formed in the SimpleMRS format, so
I'm wondering if PyDelphin should attempt to do anything special for these.

Megha:
  I just thought of another alternative: you can serialize to the DMRS-JSON
format instead of DMRX, since JSON does not have the same attribute-name
constraints as XML. The process is slightly different. Here is the
MRS -> DMRS-JSON conversion:

    import json
    from delphin.mrs import simplemrs
    from delphin.mrs.xmrs import Dmrs
    print(
        json.dumps(
            Dmrs.from_xmrs(simplemrs.load_one(source)).to_dict()
        )
    )

(The Dmrs.from_xmrs(...) step is only needed for Python 2 compatibility. In
Python 3, you can just do: Dmrs.to_dict(simplemrs.load_one(source)).)

DMRS-JSON -> MRS conversion is similar:

    ...
    print(
        simplemrs.dumps_one(
            Dmrs.from_dict(json.load(source))
        )
    )

These methods require PyDelphin v0.6.0 (the latest release).
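
To make the naming constraint concrete, here is a minimal sketch that uses
only the Python standard library (not PyDelphin). The element name "sortinfo"
and the value "-" are illustrative stand-ins, and xml_accepts() and
json_accepts() are hypothetical helper names. It should show that an XML
parser rejects an attribute name that starts with a hyphen, while the same
name is an acceptable JSON key:

```python
import json
import xml.etree.ElementTree as ET


def xml_accepts(attr_name):
    """Return True if attr_name parses as an XML attribute name."""
    try:
        # "sortinfo" is only an illustrative element name.
        ET.fromstring('<sortinfo {}="-"/>'.format(attr_name))
        return True
    except ET.ParseError:
        return False


def json_accepts(key):
    """Return True if key survives a JSON round trip unchanged."""
    return key in json.loads(json.dumps({key: "-"}))


for name in ("--PSV", "__PSV"):
    print(name, "XML:", xml_accepts(name), "JSON:", json_accepts(name))
```

The failure comes from the XML parser itself, which is why switching the
serialization to JSON sidesteps it without touching the grammar.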

On Tue, Feb 21, 2017 at 11:35 AM, Michael Wayne Goodman <goodmami at uw.edu>
wrote:

> Hi Megha,
>
> (I've re-CC'd the developers list so they can benefit or contribute;
> please include them in follow-up replies)
>
> Thanks for clarifying.
>
> When you do MRS -> DMRS conversion in your script, it is essentially this:
>
>     print(dmrx.dumps(simplemrs.load(source)))
>
> This loads the simplemrs-encoded source (e.g. a file or sys.stdin; or use
> simplemrs.loads() for a string argument) into the internal *MRS
> representation, then the dmrx codec serializes the internal representation
> to DMRX. Doing DMRS -> MRS conversion is the same, but reversed:
>
>     print(simplemrs.dumps(dmrx.load(source)))
>
> (More technically, the dmrx and simplemrs codecs decode the text
> streams/strings and instantiate the Dmrs() and Mrs() classes, respectively,
> in the delphin.mrs.xmrs module. It is these classes (and not the codecs
> themselves) that do the actual conversion into the internal format.)
>
> However, there is a problem with the GG grammar and DMRS. The variable
> properties prefixed by "--" (e.g. "--PSV") cause errors when loading a
> DMRS. This is because the hyphen is not a valid initial character in an XML
> attribute name (https://www.w3.org/TR/REC-xml/#NT-NameStartChar). It is
> Python's XML parser, and not PyDelphin, that is failing to load the DMRX
> instance. I suggest doing one of the following:
>
>  1. Change the attribute names in the GG grammar
>
>  2. In your conversion script, find and replace these attributes on the
> MRS before converting to DMRS, and change them back in DMRS->MRS
> conversion. You may use underscores (e.g. "__PSV") as the initial
> character, according to the XML spec.
>
> Does this help?
>
> On Tue, Feb 21, 2017 at 1:13 AM, megha jain <jain11megha at gmail.com> wrote:
>
>> Hello Michael.
>>
>> I know how to use PyDelphin, so I can implement it that way.
>>
>> I want to know which Python code you use to convert the German
>> DMRS back into a German MRS,
>>
>> so that this MRS can be given to ACE to generate the corresponding German
>> sentence.
>>
>> I am able to process : German sentence => MRS
>>                                  MRS => DMRS
>>                                  DMRS => MRS (that is my concern)
>>
>> EXAMPLE: (A) INPUT: Abrams bellte sehr leise.
>>
>> (B) When I gave the above sentence to ACE, it generated the following MRS
>> (command used: ./ace -g ggp.dat -1Tf input_file.txt).
>>
>> The file is attached below.
>>
>> (C) I gave this MRS as input to the mrs_to_dmrs-pp.py Python script, and
>> the corresponding DMRS was generated.
>> The file is attached below.
>>
>> (D) After this I want to convert the corresponding DMRS into an MRS. Which
>> Python code should be used for this?
>>
>>
>> Hopefully this makes my concern clear.
>>
>> Thank You.
>>
>>
>
>
> --
> Michael Wayne Goodman
> Ph.D. Candidate, UW Linguistics
>
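
Option 2 in the quoted message (renaming the "--" properties before the
MRS -> DMRS step and restoring them after DMRS -> MRS) could be sketched
roughly as below. This is plain string processing on SimpleMRS text, not a
PyDelphin API. The function names escape_props() and unescape_props() and
the sample fragment are hypothetical, and the sketch assumes that no other
property name in the grammar legitimately starts with "__":

```python
import re

# A property name that starts with "--" and is followed by ":" (e.g. "--PSV:").
_ESCAPE = re.compile(r'(?<!\S)--(?=[A-Za-z][\w-]*:)')
# The reverse: a property name that starts with "__" (e.g. "__PSV:").
_UNESCAPE = re.compile(r'(?<!\S)__(?=[A-Za-z][\w-]*:)')


def escape_props(simplemrs_text):
    """Rename "--PROP:" to "__PROP:" so the name is a valid XML attribute."""
    return _ESCAPE.sub('__', simplemrs_text)


def unescape_props(simplemrs_text):
    """Rename "__PROP:" back to "--PROP:" after converting DMRS -> MRS."""
    return _UNESCAPE.sub('--', simplemrs_text)


# Hypothetical variable-property fragment, not actual GG output.
sample = '[ e2 SF: prop TENSE: past --PSV: - ]'
escaped = escape_props(sample)
print(escaped)
print(unescape_props(escaped) == sample)
```

Apply escape_props() to the SimpleMRS text before loading it for conversion
to DMRX, and unescape_props() to the SimpleMRS text produced by the
DMRS -> MRS step, before giving it to ACE.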



-- 
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics