[developers] Regarding to German sentence analysis.

Michael Wayne Goodman goodmami at uw.edu
Wed Feb 22 10:39:38 CET 2017


Hi Megha,

On Tue, Feb 21, 2017 at 10:18 PM, megha jain <jain11megha at gmail.com> wrote:

> Hello.
>
> Thanks for suggestions.
> But may I ask: since we have the mrs_to_dmrs-gg.py code for converting MRS
> into DMRS, does any code exist for the reverse conversion, i.e. DMRS into MRS?
>

I've suggested that you use PyDelphin's built-in conversion utility, and
when you requested the code to do so, I offered snippets of Python code you
could use for writing your own utility. Did neither of these options work
for you?

As mentioned in my last email, the GG grammar uses property names that are
not compatible with XML, so simply converting from the DMRX format will not
work. The following works for me using your provided dmrs.txt file:

    sed 's/--/__/g' dmrs.txt |
      delphin convert --from dmrx --to simplemrs |
      sed 's/__/--/g'

The first sed command replaces -- (e.g., --psv) with __ (e.g., __psv) so
that the XML is well-formed; the delphin convert command then works as
expected. The final sed command restores the original property names.
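
If the substitution needs to happen inside a Python script rather than with
sed, a minimal sketch might look like the following. It uses only the
standard library; the XML fragment is a hypothetical, simplified stand-in for
a DMRX sortinfo element, and the helper names escape_props and unescape_props
are made up for illustration. Note that a blind text replacement assumes
"--" and "__" do not occur anywhere else in the data.

```python
import xml.etree.ElementTree as ET


def escape_props(text):
    """Replace '--' with '__' so names like --psv become valid XML names."""
    return text.replace("--", "__")


def unescape_props(text):
    """Undo escape_props (assumes '__' only appears where it was inserted)."""
    return text.replace("__", "--")


# Hypothetical, simplified fragment; real DMRX has many more attributes.
raw = '<sortinfo cvarsort="e" --psv="bool"/>'

# An attribute name may not start with a hyphen, so the parser rejects this.
try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# After escaping, the same fragment parses, and the name can be restored
# once the data has been serialized again.
elem = ET.fromstring(escape_props(raw))
restored = unescape_props(ET.tostring(elem, encoding="unicode"))
```

Here parsed_raw ends up False, while the escaped fragment parses and
restored again contains the original --psv attribute.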

Also, as mentioned earlier, if you use the DMRS-JSON format, this issue
becomes moot. You can replace much of your mrs_to_dmrs-gg.py script with a
pipeline like this:

    ace -g ggp.dat -1T input_file.txt |
      grep -Pv '^(SENT:|SKIP:|$)' |
      delphin convert --to dmrs-json > dmrs.json

(the grep command is to remove non-MRS data from the stdout stream)
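
For reference, the same filtering can be sketched in Python, which may be
handy if the pipeline is driven from a script. This is only an illustration:
the function name keep_mrs_lines and the sample lines are hypothetical, and
the MRS shown is abbreviated.

```python
import re

# Same pattern as the grep command: lines that begin with "SENT:" or
# "SKIP:", and empty lines, are treated as non-MRS data.
NON_MRS = re.compile(r'^(SENT:|SKIP:|$)')


def keep_mrs_lines(lines):
    """Return only the lines that do not match NON_MRS (like grep -v)."""
    return [line for line in lines if not NON_MRS.search(line.rstrip("\n"))]


# Hypothetical, abbreviated sample of ACE output; a real MRS is much longer.
sample = [
    "SENT: Abrams bellte sehr leise.\n",
    "[ LTOP: h0 INDEX: e2 RELS: < ... > HCONS: < ... > ]\n",
    "\n",
    "SKIP: xyz\n",
]
kept = keep_mrs_lines(sample)
```

Only the line holding the MRS remains in kept.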

DMRS to MRS conversion is then simply:

    delphin convert --from dmrs-json dmrs.json

This short tutorial may be informative:
https://github.com/delph-in/pydelphin/wiki/Command-line-Tutorial#convert

I hope I was able to answer your questions.

> Thank You.
>
> On Wed, Feb 22, 2017 at 1:51 AM, Michael Wayne Goodman <goodmami at uw.edu>
> wrote:
>
>> Ann:
>>  Does the LKB do anything special regarding properties like --PSV when
>> reading/writing DMRX? They aren't ill-formed in the SimpleMRS format, so
>> I'm wondering if PyDelphin should attempt to do anything special for these.
>>
>> Megha:
>>   I just thought of another alternative. You can serialize to the
>> DMRS-JSON format instead of DMRX. JSON doesn't have the same attribute name
>> constraints as XML. The process is slightly different. Here is
>> MRS->DMRS-JSON conversion:
>>
>>     import json
>>     from delphin.mrs import simplemrs
>>     from delphin.mrs.xmrs import Dmrs
>>     print(
>>         json.dumps(
>>             Dmrs.from_xmrs(simplemrs.load_one(source)).to_dict()
>>         )
>>     )
>>
>> (the Dmrs.from_xmrs(...) bit is just for Python2 compatibility. In
>> Python3, you can just do: Dmrs.to_dict(simplemrs.load_one(source)))
>>
>> DMRS-JSON -> MRS conversion is similar:
>>
>>     ...
>>     print(
>>         simplemrs.dumps_one(
>>             Dmrs.from_dict(json.load(source))
>>         )
>>     )
>>
>> These methods require PyDelphin v0.6.0 (the latest release).
>>
>> On Tue, Feb 21, 2017 at 11:35 AM, Michael Wayne Goodman <goodmami at uw.edu>
>> wrote:
>>
>>> Hi Megha,
>>>
>>> (I've re-CC'd the developers list so they can benefit or contribute;
>>> please include them in follow-up replies)
>>>
>>> Thanks for clarifying.
>>>
>>> When you do MRS -> DMRS conversion in your script, it is essentially
>>> this:
>>>
>>>     print(dmrx.dumps(simplemrs.load(source)))
>>>
>>> This loads the simplemrs-encoded source (e.g. a file or sys.stdin; or
>>> use simplemrs.loads() for a string argument) into the internal *MRS
>>> representation, then the dmrx codec serializes the internal representation
>>> to DMRX. Doing DMRS -> MRS conversion is the same, but reversed:
>>>
>>>     print(simplemrs.dumps(dmrx.load(source)))
>>>
>>> (More technically, the dmrx and simplemrs codecs decode the text
>>> streams/strings and instantiate the Dmrs() and Mrs() classes, respectively,
>>> in the delphin.mrs.xmrs module. It is these classes (and not the codecs
>>> themselves) that do the actual conversion into the internal format.)
>>>
>>> However, there is a problem with the GG grammar and DMRS. The variable
>>> properties prefixed by "--" (e.g. "--PSV") cause errors when loading a
>>> DMRS. This is because the hyphen is not a valid initial character in an XML
>>> attribute name (https://www.w3.org/TR/REC-xml/#NT-NameStartChar). It is
>>> Python's XML parser, and not PyDelphin, that is failing to load the DMRX
>>> instance. I suggest doing one of the following:
>>>
>>>  1. Change the attribute names in the GG grammar
>>>
>>>  2. In your conversion script, find and replace these attributes on the
>>> MRS before converting to DMRS, and change them back in DMRS->MRS
>>> conversion. You may use underscores (e.g. "__PSV") as the initial
>>> character, according to the XML spec.
>>>
>>> Does this help?
>>>
>>> On Tue, Feb 21, 2017 at 1:13 AM, megha jain <jain11megha at gmail.com>
>>> wrote:
>>>
>>>> Hello Michael.
>>>>
>>>> I know how to use PyDelphin, so I am able to implement it that way.
>>>>
>>>> I want to know which Python code you use to convert German DMRS back
>>>> into German MRS,
>>>>
>>>> so that this MRS can be given to ACE to generate the corresponding German
>>>> sentence.
>>>>
>>>> I am able to process : German sentence => MRS
>>>>                                  MRS => DMRS
>>>>                                  DMRS => MRS (that is my concern)
>>>>
>>>> EXAMPLE: (A.) INPUT: Abrams bellte sehr leise.
>>>>
>>>> (B.) When I gave the above sentence to ACE, it generated the following MRS
>>>> (command used: ./ace -g ggp.dat -1Tf input_file.txt).
>>>>
>>>> The file is attached below.
>>>>
>>>> (C.) I gave this MRS as input to the mrs_to_dmrs-pp.py Python code, and the
>>>> corresponding DMRS was generated.
>>>> The file is attached below.
>>>>
>>>> (D.) After this, I want to convert the corresponding DMRS into MRS. Which
>>>> Python code should be used for this?
>>>>
>>>>
>>>> Hopefully I have made my concern clear.
>>>>
>>>> Thank You.
>>>>
>>>>
>>>
>>>
>>> --
>>> Michael Wayne Goodman
>>> Ph.D. Candidate, UW Linguistics
>>>
>>
>>
>>
>> --
>> Michael Wayne Goodman
>> Ph.D. Candidate, UW Linguistics
>>
>
>


-- 
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics