[developers] Regarding to German sentence analysis.

Ann Copestake aac10 at cl.cam.ac.uk
Wed Feb 22 21:28:31 CET 2017


Thanks - that seems sensible.  As long as appropriate search terms are 
in place (e.g., so that someone searching for XML will find the page), 
I don't think it matters too much where it is on the wiki.

Ann


On 22/02/2017 18:17, Michael Wayne Goodman wrote:
> Thanks, Ann. Francis said (offline) something similar about those 
> being grammar-internal properties. I've updated the RmrsVpm wiki with 
> an example of how to remove them via a VPM 
> (http://moin.delph-in.net/RmrsVpm#Corner_Cases). Is there any better 
> place to document this information?
>
> I also thought it might affect the MRS XML format, but then 
> I remembered that property names are stored as element text instead of 
> attribute names, so they don't suffer the same problem:
>
> ...
> <extrapair><path>--PSV</path><value>non-apsv</value></extrapair>
> ...
>
> It would affect the RMRX format, though.
>
> On Feb 22, 2017 04:42, "Ann Copestake" <aac10 at cl.cam.ac.uk> wrote:
>
>     from memory, the interpretation of the "--" was intended to be
>     that this was something that should not appear in the external MRS
>
>     in any case, it wouldn't be a DMRS issue, as such, since
>     presumably it could apply to any of the MRS XML formats
>
>     All best,
>
>     Ann
>
>
>     On 21/02/17 20:21, Michael Wayne Goodman wrote:
>>     Ann:
>>      Does the LKB do anything special regarding properties like --PSV
>>     when reading/writing DMRX? They aren't ill-formed in the
>>     SimpleMRS format, so I'm wondering if PyDelphin should attempt to
>>     do anything special for these.
>>
>>     Megha:
>>       I just thought of another alternative. You can serialize to the
>>     DMRS-JSON format instead of DMRX. JSON doesn't have the same
>>     attribute name constraints as XML. The process is slightly
>>     different. Here is MRS->DMRS-JSON conversion:
>>
>>         import json
>>         from delphin.mrs import simplemrs
>>         from delphin.mrs.xmrs import Dmrs
>>         print(
>>             json.dumps(
>>                 Dmrs.from_xmrs(simplemrs.load_one(source)).to_dict()
>>             )
>>         )
>>
>>     (the Dmrs.from_xmrs(...) bit is just for Python2 compatibility.
>>     In Python3, you can just do:
>>     Dmrs.to_dict(simplemrs.loads_one(source)))
>>
>>     DMRS-JSON -> MRS conversion is similar:
>>
>>         ...
>>         print(
>>             simplemrs.dumps_one(
>>                 Dmrs.from_dict(json.load(source))
>>             )
>>         )
>>
>>     These methods require PyDelphin v0.6.0 (the latest release).
>>
>>     On Tue, Feb 21, 2017 at 11:35 AM, Michael Wayne Goodman
>>     <goodmami at uw.edu> wrote:
>>
>>         Hi Megha,
>>
>>         (I've re-CC'd the developers list so they can benefit or
>>         contribute; please include them in follow-up replies)
>>
>>         Thanks for clarifying.
>>
>>         When you do MRS -> DMRS conversion in your script, it is
>>         essentially this:
>>
>>             print(dmrx.dumps(simplemrs.load(source)))
>>
>>         This loads the simplemrs-encoded source (e.g. a file or
>>         sys.stdin; or use simplemrs.loads() for a string argument)
>>         into the internal *MRS representation, then the dmrx codec
>>         serializes the internal representation to DMRX. Doing DMRS ->
>>         MRS conversion is the same, but reversed:
>>
>>             print(simplemrs.dumps(dmrx.load(source)))
>>
>>         (More technically, the dmrx and simplemrs codecs decode the
>>         text streams/strings and instantiate the Dmrs() and Mrs()
>>         classes, respectively, in the delphin.mrs.xmrs module. It is
>>         these classes (and not the codecs themselves) that do the
>>         actual conversion into the internal format.)
>>
>>         However, there is a problem with the GG grammar and DMRS. The
>>         variable properties prefixed by "--" (e.g. "--PSV") cause
>>         errors when loading a DMRS. This is because the hyphen is not
>>         a valid initial character in an XML attribute name
>>         (https://www.w3.org/TR/REC-xml/#NT-NameStartChar). It is
>>         Python's XML parser, and not PyDelphin, that is failing to
>>         load the DMRX instance. I suggest doing one of the following:
>>
>>          1. Change the attribute names in the GG grammar
>>
>>          2. In your conversion script, find and replace these
>>         attributes on the MRS before converting to DMRS, and change
>>         them back in DMRS->MRS conversion. You may use underscores
>>         (e.g. "__PSV") as the initial character, according to the XML
>>         spec.
>>
>>         Does this help?
>>
>>         On Tue, Feb 21, 2017 at 1:13 AM, megha jain
>>         <jain11megha at gmail.com> wrote:
>>
>>             Hello Michael,
>>
>>             I know how to use PyDelphin, so I am able to implement it
>>             that way.
>>
>>             I would like to know which Python code you use to convert
>>             German DMRS back into German MRS, so that this MRS can be
>>             given to ACE to generate the corresponding German sentence.
>>
>>             I am able to process: German sentence => MRS
>>             MRS => DMRS
>>             DMRS => MRS (this is my concern)
>>
>>             EXAMPLE:
>>
>>             (A) INPUT: Abrams bellte sehr leise.
>>
>>             (B) When I gave the above sentence to ACE, it generated
>>             the following MRS
>>             (command used: ./ace -g ggp.dat -1Tf input_file.txt).
>>             The file is attached below.
>>
>>             (C) I gave this MRS as input to the mrs_to_dmrs-pp.py
>>             Python script, which generated the corresponding DMRS.
>>             The file is attached below.
>>
>>             (D) After this, I want to convert the corresponding DMRS
>>             back into MRS. Which Python code is used for this?
>>
>>
>>             I hope I have made my concern clear.
>>
>>             Thank you.
>>
>>
>>
>>
>>         -- 
>>         Michael Wayne Goodman
>>         Ph.D. Candidate, UW Linguistics
>>
>>
>>
>>
>>     -- 
>>     Michael Wayne Goodman
>>     Ph.D. Candidate, UW Linguistics
>
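
The XML constraint discussed in the quoted messages can be checked with
Python's standard-library parser alone. The following is a minimal sketch,
not PyDelphin code: the "node" element and the helper parses() are
hypothetical, simplified stand-ins for real DMRX or MRS XML content. It
shows that a property name starting with "--" is rejected as an attribute
name, is accepted once the prefix is changed to underscores, and is
harmless when stored as element text.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical, simplified elements; real DMRX has more structure.
bad_attr = '<node --PSV="non-apsv"/>'    # "-" cannot start an XML name
good_attr = '<node __PSV="non-apsv"/>'   # "_" is a valid name start
as_text = '<extrapair><path>--PSV</path><value>non-apsv</value></extrapair>'

print(parses(bad_attr))   # False
print(parses(good_attr))  # True
print(parses(as_text))    # True
```

This is consistent with the observation above that the failure comes from
the XML parser rather than from PyDelphin itself.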

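The find-and-replace workaround suggested in the quoted thread (rename the
"--" prefix before MRS -> DMRS conversion, and change it back after
DMRS -> MRS conversion) could be sketched roughly as below. This is an
assumption-laden illustration, not part of PyDelphin: it works on the
serialized SimpleMRS text, the function names and regular expressions are
hypothetical, the sample fragment is made up, and it assumes that no
genuine property name in the grammar already begins with "__".

```python
import re

# Match a leading "--" (or "__") on a whitespace-delimited token that
# looks like a property name followed by a colon, e.g. "--PSV:".
# These patterns are assumptions about how properties appear in the text.
_HIDE = re.compile(r'(?<!\S)--(?=[A-Za-z][^\s:]*:)')
_RESTORE = re.compile(r'(?<!\S)__(?=[A-Za-z][^\s:]*:)')


def hide_internal_props(simplemrs_text):
    """Rename "--NAME:" to "__NAME:" so the name is XML-safe."""
    return _HIDE.sub('__', simplemrs_text)


def restore_internal_props(simplemrs_text):
    """Undo hide_internal_props() after converting back to SimpleMRS."""
    return _RESTORE.sub('--', simplemrs_text)


# Made-up fragment in the style of a SimpleMRS variable property list.
fragment = 'ARG0: e2 [ e SF: prop TENSE: past --PSV: non-apsv ]'
hidden = hide_internal_props(fragment)
print(hidden)
print(restore_internal_props(hidden) == fragment)
```

The first function would be applied to the SimpleMRS string before it is
loaded for DMRX serialization, and the second to the SimpleMRS string
produced by the reverse conversion. Removing the properties with a VPM, as
described on the RmrsVpm wiki page, is an alternative when they do not need
to survive the round trip.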