[developers] Regarding to German sentence analysis.

Michael Wayne Goodman goodmami at uw.edu
Wed Feb 22 23:21:37 CET 2017


Thanks for explaining, Stephan.

On Wed, Feb 22, 2017 at 1:51 PM, Stephan Oepen <oe at ifi.uio.no> wrote:

> dear all,
>
> no need to add VPM rules which unconditionally delete: that will be the
> default behavior for any variable properties for which no mapping is
> defined.  mike, could you update your addition to RmrsVpm in this light?
>

Done. There was already a statement about unmapped properties being
suppressed, so I made it more prominent.


> however, GG has explicit VPM rules for --PSV and several others that start
> in double hyphens, so it would seem berthold actually wants these in the
> external interface.
>
> i too was under the impression that the double-hyphen prefix typically
> indicates something grammar-internal, but that could just be an
> ERG-specific convention.
>
> either way, it would seem sad to formally disallow initial hyphens in MRS
> variable properties because they cause problems for some serializations of
> some derived representations.  i would rather look for a
> backwards-compatible extension of the DMRX and RMRX schemas then, to deal
> with the full range of identifiers supported in grammars and native MRSs.
>

Writing backward-compatible schemata would be messy, but we could, for
instance, allow both DMRX/RMRX- and MRX-style attribute definitions. That
is, instead of:

<sortinfo --psv="non-apsv" cvarsort="e" sf="prop" stat="-" tense="none" />

we would put the problematic attributes inside subelements:

<sortinfo cvarsort="e" sf="prop" stat="-" tense="none">
  <property>
    <name>--psv</name>
    <value>non-apsv</value>
  </property>
</sortinfo>

The latter is to be equivalent to the former, but doesn't violate XML's
well-formedness criteria.
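
A reader could treat the two forms as equivalent roughly as follows. This is
only a sketch using Python's standard xml.etree.ElementTree; read_sortinfo is
a hypothetical name (not part of PyDelphin), and the <property> layout is
just the one proposed above, not an existing DMRX/RMRX feature.

```python
import xml.etree.ElementTree as ET

def read_sortinfo(xml_text):
    """Collect properties from attributes and from <property> subelements.

    Hypothetical helper: the <property>/<name>/<value> layout is the
    proposal from this message, not part of any current schema.
    """
    elem = ET.fromstring(xml_text)
    props = dict(elem.attrib)                # DMRX/RMRX-style attributes
    for prop in elem.findall('property'):    # MRX-style subelements
        props[prop.findtext('name')] = prop.findtext('value')
    return props

old_style = ('<sortinfo --psv="non-apsv" cvarsort="e" sf="prop" '
             'stat="-" tense="none" />')
new_style = '''<sortinfo cvarsort="e" sf="prop" stat="-" tense="none">
  <property>
    <name>--psv</name>
    <value>non-apsv</value>
  </property>
</sortinfo>'''

# The attribute form is rejected by the XML parser itself, because
# "--psv" is not a valid attribute name.
try:
    ET.fromstring(old_style)
    old_style_parses = True
except ET.ParseError:
    old_style_parses = False

props = read_sortinfo(new_style)
print(old_style_parses)
print(props)
```

The subelement form keeps the property name as element text, which is the
same reason MRX's <extrapair> elements are unaffected.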

Is this proposal acceptable?

> all best, oe
>
>
> On Wed 22 Feb 2017 at 19:17 Michael Wayne Goodman <goodmami at uw.edu> wrote:
>
> Thanks, Ann. Francis said (offline) something similar about those being
> grammar-internal properties. I've updated the RmrsVpm wiki with an example
> of how to remove them via a VPM
> (http://moin.delph-in.net/RmrsVpm#Corner_Cases). Is there any better place
> to document this information?
>
> And I also thought it might affect MRS's XML format as well, but then I
> remembered that property names are stored as element text instead of
> attribute names, so they don't suffer the same problem:
>
> ...
> <extrapair><path>--PSV</path><value>non-apsv</value></extrapair>
> ...
>
> It would affect the RMRX format, though.
>
> On Feb 22, 2017 04:42, "Ann Copestake" <aac10 at cl.cam.ac.uk> wrote:
>
> from memory, the interpretation of the "--" was intended to be that this
> was something that should not appear in the external MRS
>
> in any case, it wouldn't be a DMRS issue, as such, since presumably it
> could apply to any of the MRS XML formats
>
> All best,
>
> Ann
>
> On 21/02/17 20:21, Michael Wayne Goodman wrote:
>
> Ann:
>  Does the LKB do anything special regarding properties like --PSV when
> reading/writing DMRX? They aren't ill-formed in the SimpleMRS format, so
> I'm wondering if PyDelphin should attempt to do anything special for these.
>
> Megha:
>   I just thought of another alternative. You can serialize to the
> DMRS-JSON format instead of DMRX. JSON doesn't have the same attribute name
> constraints as XML. The process is slightly different. Here is
> MRS->DMRS-JSON conversion:
>
>     import json
>     from delphin.mrs import simplemrs
>     from delphin.mrs.xmrs import Dmrs
>     print(
>         json.dumps(
>             Dmrs.from_xmrs(simplemrs.load_one(source)).to_dict()
>         )
>     )
>
> (the Dmrs.from_xmrs(...) bit is just for Python 2 compatibility. In
> Python 3, you can just do: Dmrs.to_dict(simplemrs.load_one(source)))
>
> DMRS-JSON -> MRS conversion is similar:
>
>     ...
>     print(
>         simplemrs.dumps_one(
>             Dmrs.from_dict(json.load(source))
>         )
>     )
>
> These methods require PyDelphin v0.6.0 (the latest release).
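
The point about JSON keys can be checked without PyDelphin. The following is
a minimal sketch with the standard json module; the dict layout is made up
for illustration and is not the actual DMRS-JSON structure.

```python
import json

# Illustrative data only: this is NOT the real DMRS-JSON schema, just a
# dict with a property name that starts with "--".
node = {'sortinfo': {'--PSV': 'non-apsv', 'cvarsort': 'e'}}

encoded = json.dumps(node)     # serializes without complaint
decoded = json.loads(encoded)  # and reads back unchanged

print(decoded == node)
```

JSON object keys are arbitrary strings, so a leading hyphen needs no
escaping or renaming.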
>
> On Tue, Feb 21, 2017 at 11:35 AM, Michael Wayne Goodman <goodmami at uw.edu>
> wrote:
>
> Hi Megha,
>
> (I've re-CC'd the developers list so they can benefit or contribute;
> please include them in follow-up replies)
>
> Thanks for clarifying.
>
> When you do MRS -> DMRS conversion in your script, it is essentially this:
>
>     from delphin.mrs import simplemrs, dmrx
>     print(dmrx.dumps(simplemrs.load(source)))
>
> This loads the simplemrs-encoded source (e.g. a file or sys.stdin; or use
> simplemrs.loads() for a string argument) into the internal *MRS
> representation, then the dmrx codec serializes the internal representation
> to DMRX. Doing DMRS -> MRS conversion is the same, but reversed:
>
>     print(simplemrs.dumps(dmrx.load(source)))
>
> (More technically, the dmrx and simplemrs codecs decode the text
> streams/strings and instantiate the Dmrs() and Mrs() classes, respectively,
> in the delphin.mrs.xmrs module. It is these classes (and not the codecs
> themselves) that do the actual conversion into the internal format.)
>
> However, there is a problem with the GG grammar and DMRS. The variable
> properties prefixed by "--" (e.g. "--PSV") cause errors when loading a
> DMRS. This is because the hyphen is not a valid initial character in an XML
> attribute name (https://www.w3.org/TR/REC-xml/#NT-NameStartChar). It is
> Python's XML parser, and not PyDelphin, that is failing to load the DMRX
> instance. I suggest doing one of the following:
>
>  1. Change the attribute names in the GG grammar
>
>  2. In your conversion script, find and replace these attributes on the
> MRS before converting to DMRS, and change them back in DMRS->MRS
> conversion. You may use underscores (e.g. "__PSV") as the initial
> character, according to the XML spec.
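
Option 2 could look roughly like the following. This sketch works on a plain
dict of variable properties rather than on PyDelphin objects; escape_props
and unescape_props are hypothetical names, and the "--" to "__" mapping
assumes no property in the grammar already starts with "__".

```python
import xml.etree.ElementTree as ET

def escape_props(props):
    """Rename properties starting with '--' so they start with '__'."""
    return {('__' + name[2:] if name.startswith('--') else name): value
            for name, value in props.items()}

def unescape_props(props):
    """Undo escape_props, e.g. after DMRS -> MRS conversion."""
    return {('--' + name[2:] if name.startswith('__') else name): value
            for name, value in props.items()}

props = {'--PSV': 'non-apsv', 'SF': 'prop', 'TENSE': 'none'}
escaped = escape_props(props)

# The escaped names are valid XML attribute names, so a round trip
# through the XML serializer and parser succeeds.
xml_bytes = ET.tostring(ET.Element('sortinfo', escaped))
reparsed = dict(ET.fromstring(xml_bytes).attrib)

restored = unescape_props(reparsed)
print(escaped)
print(restored == props)
```

In a real conversion script the same renaming would be applied to the
property names of each variable before serializing to DMRX, and reversed
after reading the DMRX back in.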
>
> Does this help?
>
> On Tue, Feb 21, 2017 at 1:13 AM, megha jain <jain11megha at gmail.com> wrote:
>
> Hello Michael.
>
> I know how to use PyDelphin, so I can implement this with it.
>
> Which Python code do you use to convert a German DMRS back into a German
> MRS, so that the MRS can be given to ACE to generate the corresponding
> German sentence?
>
> I am able to do these steps:   German sentence => MRS
>                                MRS => DMRS
> My concern is this step:       DMRS => MRS
>
> Example:
>
> (A) Input: Abrams bellte sehr leise.
>
> (B) When I gave the above sentence to ACE, it generated an MRS
> (command used: ./ace -g ggp.dat -1Tf input_file.txt).
>
> The output file is attached below.
>
> (C) I gave this MRS as input to the mrs_to_dmrs-pp.py Python script, which
> generated the corresponding DMRS.
> That file is attached below.
>
> (D) After this, I want to convert the DMRS back into an MRS. Which
> Python code should I use for this step?
>
>
> I hope I have made my concern clear.
>
> Thank You.
>
>
>
>
> --
> Michael Wayne Goodman
> Ph.D. Candidate, UW Linguistics


-- 
Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics