[developers] RESTful ERG parsing

Stephan Oepen oe at ifi.uio.no
Sat Apr 2 23:08:52 CEST 2016


many thanks for your comments, mike!

i have adapted the EDS output according to your suggestions, i.e.
properly structured LNK values as JSON too, and consolidated the
rendering of variable properties.
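
to illustrate what the structured LNK values save a client from doing, here
is a minimal Python sketch that converts the older string encoding into an
object.  the helper name lnk_to_object is hypothetical and not part of any
existing API; the "from" and "to" keys are taken from the MRS example
further down in this message, and the "<0:6>" string format from the quoted
discussion.

```python
import re

# hypothetical helper: turns a character-span string such as "<0:6>"
# into an object with "from" and "to" keys (names as in the MRS example)
_LNK = re.compile(r"^<(\d+):(\d+)>$")


def lnk_to_object(lnk):
    match = _LNK.match(lnk)
    if match is None:
        raise ValueError("not a character-span lnk: %r" % lnk)
    return {"from": int(match.group(1)), "to": int(match.group(2))}


print(lnk_to_object("<0:6>"))
```

with the structured form, this parsing step is no longer needed on the
client side.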

i also by and large followed your example for MRS serialization in
JSON, with the complication that i want to allow variable properties
on arbitrary argument positions, i.e. make these independent of a
particular EP.  for now, i ended up with the following:

MRS(30): (mrs-output-json (extract-mrs (first *parse-record*))
                          :stream t :columns 79)
{"top": {"id": "h1", "type": "h"},
 "index": {"id": "e3", "type": "e", "properties":
           {"SF": "prop", "TENSE": "past", "MOOD": "indicative", "PROG": "-",
            "PERF": "-"}},
 "rels":
 [{"label": {"id": "h4", "type": "h"}, "predicate": "proper_q",
   "lnk": {"from": 0, "to": 3},
   "roles":
   {"ARG0": {"id": "x6", "type": "x", "properties":
             {"PERS": "3", "NUM": "sg", "IND": "+"}},
    "RSTR": {"id": "h5", "type": "h"}, "BODY": {"id": "h7", "type": "h"}}},
  {"label": {"id": "h8", "type": "h"}, "predicate": "named",
   "lnk": {"from": 0, "to": 3},
   "roles": {"ARG0": {"id": "x6"}, "CARG": "Kim"}},
  {"label": {"id": "h2", "type": "h"}, "predicate": "_arrive_v_1",
   "lnk": {"from": 4, "to": 12},
   "roles": {"ARG0": {"id": "e3"}, "ARG1": {"id": "x6"}}}],
 "hcons":
 [{"relation": "qeq", "high": {"id": "h1"}, "low": {"id": "h2"}},
  {"relation": "qeq", "high": {"id": "h5"}, "low": {"id": "h8"}}]}

this ends up less compact than the simple format, in part because all
variables are objects.  however, the full object content is only
printed once (upon the first variable occurrence).
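
since later occurrences of a variable carry only its "id", a client that
wants type and properties everywhere has to look them up.  a minimal Python
sketch of that lookup follows; the function name collect_variables is
hypothetical, the sample document is abridged from the output above, and the
code assumes that every JSON object with an "id" key is a variable.

```python
import json


def collect_variables(mrs):
    """map each variable id to the merged content of all its occurrences"""
    table = {}

    def visit(node):
        if isinstance(node, dict):
            if "id" in node:  # assumption: only variables carry an "id"
                table.setdefault(node["id"], {}).update(node)
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(mrs)
    return table


# abridged from the example output in this message
doc = json.loads("""
{"index": {"id": "e3", "type": "e", "properties": {"TENSE": "past"}},
 "rels":
 [{"label": {"id": "h4", "type": "h"}, "predicate": "proper_q",
   "roles": {"ARG0": {"id": "x6", "type": "x",
                      "properties": {"PERS": "3", "NUM": "sg"}}}},
  {"label": {"id": "h2", "type": "h"}, "predicate": "_arrive_v_1",
   "roles": {"ARG0": {"id": "e3"}, "ARG1": {"id": "x6"}}}]}
""")

variables = collect_variables(doc)
arg1 = doc["rels"][1]["roles"]["ARG1"]      # only {"id": "x6"} here
print(variables[arg1["id"]]["properties"])  # full content from first use
```

merging with update() rather than keeping only the first occurrence means
the lookup does not depend on the order in which the document is traversed.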

really, what one might want is an explicit re-entrancy, e.g. something like

  { "index": #1={ "id": "x1", "type": "x", ...},
    ... { "ARG0": #1# ... } ... }

but for all i can tell there is no facility like that available in JSON, right?
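
plain JSON indeed has no reference or anchor syntax, so any sharing has to be
expressed through a convention.  one possible workaround, sketched below in
Python, is to move variable content into a separate table and refer to
variables by their id string.  this layout, the "variables" key, and the
function name factor_variables are all hypothetical and not what the service
currently emits.

```python
import json


def factor_variables(node, table):
    """replace each variable object by its id; collect content in table"""
    if isinstance(node, dict):
        if "id" in node:  # assumption: only variables carry an "id"
            entry = table.setdefault(node["id"], {})
            entry.update({k: v for k, v in node.items() if k != "id"})
            return node["id"]
        return {k: factor_variables(v, table) for k, v in node.items()}
    if isinstance(node, list):
        return [factor_variables(item, table) for item in node]
    return node


# small fragment in the style of the example output in this message
fragment = {
    "index": {"id": "e3", "type": "e", "properties": {"TENSE": "past"}},
    "rels": [{"predicate": "_arrive_v_1",
              "roles": {"ARG0": {"id": "e3"}}}],
}

table = {}
flat = factor_variables(fragment, table)
flat["variables"] = table
print(json.dumps(flat, indent=1))
```

a drawback of this layout is that constant values such as CARG are strings
too, so a consumer would need to know which roles hold variable ids.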

do you have any suggestions for refining the above further?  i
debated making the surface vs. abstract predicate distinction explicit
in the JSON serialization, but i currently look at JSON as an
alternative to the simple serialization, specifically for the RESTful
interface, hence i ended up with the above.

cheers, oe


On Tue, Mar 29, 2016 at 1:00 AM, Michael Wayne Goodman
<goodmami at u.washington.edu> wrote:
> Hi Stephan,
>
> On Mon, Mar 28, 2016 at 1:53 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>>
>> dear colleagues,
>>
>> i used part of the easter break to teach myself about modern
>> technologies and am currently in the process of providing a RESTful
>> (programmatic) interface to the on-line ERG demonstrator.  i know of
>> at least one colleague who has been waiting impatiently for this
>> functionality :-).
>>
>> in a nutshell, client software can now obtain parses using the HTTP
>> protocol and URIs providing the input string (and a handful of
>> optional parameters).  for example:
>>
>>   http://erg.delph-in.net/rest/0.9/parse?input=Abrams%20arrived.
>>
>> parsing results will be returned in machine-readable format,
>> serialized as a JSON document.  for a little more background on how to
>> use this new service (including an example client in Python, believe
>> it or not), please see:
>>
>>   http://moin.delph-in.net/ErgApi
>
>
> What a beautiful bike shed :)
>
> BTW, Demophin has an undocumented HTTP API, but it's not RESTful:
>
> $ curl -F 'sentence=Abrams arrived.' \
>     http://chimpanzee.ling.washington.edu/demophin/erg/parse
>
> I had hoped to change it to follow the REST principles more closely and
> document the API, but I'm happy to know that you've already started that
> effort.
>
>>
>>
>> there is some more work to be done on the interface (see the page
>> above), but i would like to ask for help already at this point:
>>
>> (0) in case you notice anything surprising in the interactive ERG
>> demonstrator, please do not hesitate to let me know!
>
>
> If you're defining a new JSON schema for EDS, then maybe we can do something
> more convenient for, e.g., lnk values. Currently the indices are encoded in
> a string:
>
>     "lnk": "<0:6>"
>
> If we make it a JSON object, then users of the results wouldn't have to
> parse the string later:
>
>     "lnk": {"type": "charspan", "cfrom": 0, "cto": 6}
>
> (The "type" could be optional if we define "charspan" as the default, or if
> we pretend that the other types don't exist)
>
>>
>> (1) i still need to provide a serialization of MRSs in JSON; in case
>> anyone has previously tackled this (design) problem, please do get in
>> touch!
>
>
> Not yet, sorry.  One thing that comes to mind is that JSON doesn't have an
> unordered collection aside from objects (hashes), which require keys. So we
> would have to treat the RELS, HCONS, and ICONS bags as arrays (lists). We
> often do this anyway, so I think it's fine to use arrays. Here's a rather
> direct conversion:
>
> { "top": "h0", "index": "e2", "rels": [ {"pred": "proper_q", "lbl": "h3",
> "arg0": "x4"...}...]...}
>
> One thing that isn't obvious is variable properties. They could follow the
> EDS example and put them in the EP object (and similarly be controlled by
> the "properties" parameter in the URL):
>
> { ..., "rels": [ { "pred": "named", ..., "properties": { "PERS": "3"}}, ...
> ], ...}
>
>>
>> (2) i think it might be nice to incorporate RESTful parsing as an
>> option in pyDelphin; mike, could you be interested in collaborating on
>> this?
>
>
> Yes. Whatever API we settle on, I'd like to incorporate that into pyDelphin
> and use it as the basis for Demophin as well.
>
>>
>> finally, i would be curious to hear comments or suggestions for how to
>> use and extend this service (though cannot promise i will have a lot
>> of time to develop this further until another holiday break); please
>> see towards the bottom of the above wiki page for some candidate
>> directions.
>
>
> I've done a couple of REST APIs so far, so I have some suggestions (some are
> rather technical so I'm happy to save those for an off-list discussion).
>
> One thing that might be relevant to others is how we can request other
> formats. I see you have parameters "eds=...", "derivation=...", "mrs=...",
> so presumably we could expand it with others ("rmrs=...", "dmrs=...", etc.)?
>
> Other ideas:
> * what about generation?
> * can we set a request header for preprocessing? (e.g. morphological
> segmentation for Jacy or Zhong)
> * If we have already preprocessed, can we specify the Content-Type (e.g.
> Content-Type: application/yy)?
>
>> best wishes; god påske!  oe
>>
>


