[developers] Regarding to German sentence analysis.

Stephan Oepen oe at ifi.uio.no
Wed Feb 22 22:51:17 CET 2017


dear all,

no need to add VPM rules which unconditionally delete: that will be the
default behavior for any variable properties for which no mapping is
defined.  mike, could you update your addition to RmrsVpm in this light?

however, GG has explicit VPM rules for --PSV and several others that start
in double hyphens, so it would seem berthold actually wants these in the
external interface.

i too was under the impression that the double-hyphen prefix typically
indicates something grammar-internal, but that could just be an
ERG-specific convention.

either way, it would seem sad to formally disallow initial hyphens in MRS
variable properties because they cause problems for some serializations of
some derived representations.  i would rather look for a
backwards-compatible extension of the DMRX and RMRX schemas then, to deal
with the full range of identifiers supported in grammars and native MRSs.

all best, oe


On Wed 22 Feb 2017 at 19:17 Michael Wayne Goodman <goodmami at uw.edu> wrote:

Thanks, Ann. Francis said (offline) something similar about those being
grammar-internal properties. I've updated the RmrsVpm wiki with an example
of how to remove them via a VPM (
http://moin.delph-in.net/RmrsVpm#Corner_Cases). Is there any better place
to document this information?

I also thought it might affect the MRS XML format, but then I remembered
that property names are stored there as element text rather than as
attribute names, so they don't suffer from the same problem:

...
<extrapair><path>--PSV</path><value>non-apsv</value></extrapair>
...

It would affect the RMRX format, though.
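
The difference between element text and attribute names can be checked with
Python's standard-library XML parser. This is only a minimal sketch: the
<var> element in the second string is a made-up stand-in for an
attribute-based serialization, not the actual RMRX or DMRX schema.

```python
import xml.etree.ElementTree as ET

# Property name stored as element text, as in the MRS XML snippet above.
as_text = "<extrapair><path>--PSV</path><value>non-apsv</value></extrapair>"

# Property name used as an attribute name. <var> is a hypothetical element
# used only for illustration; it is not the real RMRX/DMRX schema.
as_attribute = '<var --PSV="non-apsv"/>'


def is_well_formed(xml_string):
    """Return True if the standard-library parser accepts xml_string."""
    try:
        ET.fromstring(xml_string)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(as_text))       # property name as element text
print(is_well_formed(as_attribute))  # property name as attribute name
```

The first string should parse and the second should be rejected, because an
XML name cannot start with a hyphen.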

On Feb 22, 2017 04:42, "Ann Copestake" <aac10 at cl.cam.ac.uk> wrote:

from memory, the interpretation of the "--" was intended to be that this
was something that should not appear in the external MRS

in any case, it wouldn't be a DMRS issue, as such, since presumably it
could apply to any of the MRS XML formats

All best,

Ann

On 21/02/17 20:21, Michael Wayne Goodman wrote:

Ann:
 Does the LKB do anything special regarding properties like --PSV when
reading/writing DMRX? They aren't ill-formed in the SimpleMRS format, so
I'm wondering if PyDelphin should attempt to do anything special for these.

Megha:
  I just thought of another alternative. You can serialize to the DMRS-JSON
format instead of DMRX. JSON doesn't have the same attribute name
constraints as XML. The process is slightly different. Here is
MRS->DMRS-JSON conversion:

    import json
    from delphin.mrs import simplemrs
    from delphin.mrs.xmrs import Dmrs
    print(
        json.dumps(
            Dmrs.from_xmrs(simplemrs.load_one(source)).to_dict()
        )
    )

(the Dmrs.from_xmrs(...) bit is just for Python2 compatibility. In Python3,
you can just do: Dmrs.to_dict(simplemrs.load_one(source)))

DMRS-JSON -> MRS conversion is similar:

    ...
    print(
        simplemrs.dumps_one(
            Dmrs.from_dict(json.load(source))
        )
    )

These methods require PyDelphin v0.6.0 (the latest release).
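
To illustrate why JSON sidesteps the naming problem, here is a small sketch
using only the standard json module. The dictionary layout is invented for
the example and is not necessarily what PyDelphin's DMRS-JSON output looks
like; the only point is that a key such as "--PSV" survives a round trip.

```python
import json

# Hypothetical node dictionary; the real DMRS-JSON layout may differ.
node = {
    "nodeid": 10000,
    "sortinfo": {"cvarsort": "e", "--PSV": "non-apsv"},
}

# JSON object keys are arbitrary strings, so "--PSV" needs no escaping.
encoded = json.dumps(node)
decoded = json.loads(encoded)

print(decoded["sortinfo"]["--PSV"])
```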

On Tue, Feb 21, 2017 at 11:35 AM, Michael Wayne Goodman <goodmami at uw.edu>
wrote:

Hi Megha,

(I've re-CC'd the developers list so they can benefit or contribute; please
include them in follow-up replies)

Thanks for clarifying.

When you do MRS -> DMRS conversion in your script, it is essentially this:

    from delphin.mrs import simplemrs, dmrx
    print(dmrx.dumps(simplemrs.load(source)))

This loads the simplemrs-encoded source (e.g. a file or sys.stdin; or use
simplemrs.loads() for a string argument) into the internal *MRS
representation, then the dmrx codec serializes the internal representation
to DMRX. Doing DMRS -> MRS conversion is the same, but reversed:

    print(simplemrs.dumps(dmrx.load(source)))

(More technically, the dmrx and simplemrs codecs decode the text
streams/strings and instantiate the Dmrs() and Mrs() classes, respectively,
in the delphin.mrs.xmrs module. It is these classes (and not the codecs
themselves) that do the actual conversion into the internal format.)

However, there is a problem with the GG grammar and DMRS. The variable
properties prefixed by "--" (e.g. "--PSV") cause errors when loading a
DMRS. This is because the hyphen is not a valid initial character in an XML
attribute name (https://www.w3.org/TR/REC-xml/#NT-NameStartChar). It is
Python's XML parser, and not PyDelphin, that is failing to load the DMRX
instance. I suggest doing one of the following:

 1. Change the attribute names in the GG grammar

 2. In your conversion script, find and replace these attributes on the MRS
before converting to DMRS, and change them back in DMRS->MRS conversion.
You may use underscores (e.g. "__PSV") as the initial character, according
to the XML spec.
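
Option 2 could be sketched roughly as follows, working on the SimpleMRS text
before MRS->DMRS conversion and again after DMRS->MRS conversion. The
function names and regular expressions are my own illustration, not part of
PyDelphin, and they assume that a property appears in the SimpleMRS text as
"--NAME: value" preceded by whitespace.

```python
import re

# Assumption: a variable property looks like "--PSV: value" in SimpleMRS
# text and is preceded by whitespace. Both patterns are illustrative only.
_HYPHEN_PROP = re.compile(r"(?<!\S)--(?=[A-Za-z][A-Za-z0-9_.-]*:)")
_UNDERSCORE_PROP = re.compile(r"(?<!\S)__(?=[A-Za-z][A-Za-z0-9_.-]*:)")


def escape_props(simplemrs_text):
    """Rename "--NAME:" to "__NAME:" so the name is legal in XML."""
    return _HYPHEN_PROP.sub("__", simplemrs_text)


def unescape_props(simplemrs_text):
    """Rename "__NAME:" back to "--NAME:" after DMRS->MRS conversion."""
    return _UNDERSCORE_PROP.sub("--", simplemrs_text)


# Hypothetical fragment of a variable's property list.
fragment = "[ e SF: prop --PSV: non-apsv ]"
escaped = escape_props(fragment)

print(escaped)
print(unescape_props(escaped) == fragment)
```

Note that the reverse step would also rewrite any property that genuinely
starts with "__" in the grammar, so this only works if no such properties
exist.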

Does this help?

On Tue, Feb 21, 2017 at 1:13 AM, megha jain <jain11megha at gmail.com> wrote:

Hello Michael.

I know how to use PyDelphin, so I can implement this with it.

I would like to know which Python code you use to convert a German DMRS
back into a German MRS,

so that this MRS can be given to ACE to generate the corresponding German
sentence.

The steps I am working with are:   German sentence => MRS
                                 MRS => DMRS
                                 DMRS => MRS (this is my concern)

EXAMPLE: (A) INPUT: Abrams bellte sehr leise.

(B) When I gave the above sentence to ACE, it generated the following MRS
(command used: ./ace -g ggp.dat -1Tf input_file.txt).

The file is attached below.

(C) I gave this MRS as input to the mrs_to_dmrs-pp.py Python script, and the
corresponding DMRS was generated.
The file is attached below.

(D) After this, I want to convert the DMRS back into an MRS. Which Python
code should I use for this?


I hope this makes my concern clear.

Thank You.




-- 
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics



