[developers] RESTful ERG parsing

Michael Wayne Goodman goodmami at u.washington.edu
Mon Apr 4 03:35:51 CEST 2016


Thanks! I paste below the current output of the server (in case it changes
for people reading this thread later):

{"top": "h1",
    "index": "e3",
    "relations":
    [{"label": "h4", "predicate": "proper_q", "lnk": {"from": 0, "to": 6},
      "roles": {"ARG0": "x6", "RSTR": "h5", "BODY": "h7"}},
     {"label": "h8", "predicate": "named", "lnk": {"from": 0, "to": 6},
      "roles": {"ARG0": "x6", "CARG": "Abrams"}},
     {"label": "h2", "predicate": "_arrive_v_1", "lnk": {"from": 7, "to": 14},
      "roles": {"ARG0": "e3", "ARG1": "x6"}}],
    "constraints":
    [{"relation": "qeq", "high": "h1", "low": "h2"},
     {"relation": "qeq", "high": "h5", "low": "h8"}],
    "variables":
    {"h2": {}, "h8": {}, "h7": {}, "h5": {},
     "x6": {"type": "x", "PERS": "3", "NUM": "sg", "IND": "+"}, "h4": {},
     "e3": {"type": "e", "SF": "prop", "TENSE": "past", "MOOD": "indicative",
            "PROG": "-", "PERF": "-"},
     "h1": {}}}}]}

On Sun, Apr 3, 2016 at 3:49 PM Stephan Oepen <oe at ifi.uio.no> wrote:

> thanks for your thought regarding object references and the suggestion of
> separating the full object representation of variables from their
> occurrences throughout the MRS.  i adapted your proposal (and like this
> version much better), though ended up with a full ‘variables’ property,
> containing one entry for each MRS variable (including handles).  for one,
> even though arguably redundant, i like making the variable type explicit
> (rather than carving into stone our assumptions about variable naming
> conventions);
>

A more generic "variables" object seems like a good idea. Just a small
thing: mixing variable properties with other data ("type") risks key
collisions (e.g. a variable property named "type"). We could either put the
variable properties in a sub-object, or adopt a convention like "variable
property keys are always uppercase, other keys are lower or mixed case".
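
To make the two options concrete, here is a minimal Python sketch. It
assumes the "e3" entry from the output above; the function name
split_variable and the "properties" key are made up for illustration and
are not part of the server's format.

```python
def split_variable(entry):
    """Split a variable entry into metadata and variable properties.

    Assumes the proposed convention: variable property keys are all
    uppercase, any other key (e.g. "type") is lower or mixed case.
    """
    meta = {k: v for k, v in entry.items() if not k.isupper()}
    props = {k: v for k, v in entry.items() if k.isupper()}
    return meta, props


# the "e3" entry from the server output above
e3 = {"type": "e", "SF": "prop", "TENSE": "past", "MOOD": "indicative",
      "PROG": "-", "PERF": "-"}
meta, props = split_variable(e3)

# the sub-object alternative; "properties" is a hypothetical key name
nested = dict(meta, properties=props)
```

With the sub-object layout, a variable property that happened to be called
"type" would sit under "properties" and could no longer clash with the
variable type.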


> but more importantly, this way the distinction between variable vs.
> constant values is explicit in the MRS, i.e. one need not specify the
> inventory of constant-valued roles (e.g. CARG in the ERG) to
> programmatically tell the difference between, say, "Abrams" and "h42".
>

Do you mean that the absence of an argument value in the "variables" object
indicates it's a constant value? That seems handy. I'm definitely in favor
of reducing reliance on grammar-specific configuration files.
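
If that reading is right, a client could pick out the constant-valued roles
with a simple membership test. A sketch under that assumption, using the
data from the output above (the helper name constant_roles is made up):

```python
# variable inventory from the server output above (properties abbreviated)
variables = {"h1": {}, "h2": {}, "h4": {}, "h5": {}, "h7": {}, "h8": {},
             "x6": {"type": "x"}, "e3": {"type": "e"}}

# roles of the "named" relation from the same output
roles = {"ARG0": "x6", "CARG": "Abrams"}


def constant_roles(roles, variables):
    """Return the roles whose value is not a declared MRS variable."""
    return {role: value for role, value in roles.items()
            if value not in variables}


constants = constant_roles(roles, variables)
```

Here only CARG is left over, without the client needing to know in advance
that CARG is a constant-valued role in the ERG.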

Also, I notice that "hcons" became "constraints" in the latest version,
which means that ICONS go in the same list as HCONS (which is not
necessarily problematic)?


> > What I was withholding was that I tried requesting XML
> > and got JSON instead of an error 406:
>

(I also withheld the comment because the server is still a WIP and I didn't
expect such well-roundedness at this early a stage :) )


> true, i am not currently interpreting incoming HTTP headers, hence
> arguably ended up overly robust to the above request :-).  if i understand
> you correctly, you would vote for an HTTP error code in the above scenario?
>

Yes. The server isn't configured to handle that request, so returning an
error code (406 in this case) is a good way of telling the client that the
request failed. It also helps the client decide how to cache the response.
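
A rough sketch of the check I have in mind, in plain Python and independent
of any web framework. The function name and the SUPPORTED tuple are
assumptions for illustration, and the parsing is deliberately simplified: it
ignores q-values, so a type listed with q=0 would wrongly count as
acceptable.

```python
SUPPORTED = ("application/json",)  # assumed: the server only produces JSON


def status_for_accept(accept_header):
    """Return 200 if a supported media type is acceptable, else 406."""
    if not accept_header:
        return 200  # no Accept header means any media type is acceptable
    for item in accept_header.split(","):
        # drop parameters such as ";q=0.8" and normalize the media type
        media_type = item.split(";")[0].strip().lower()
        if media_type in ("*/*", "application/*") or media_type in SUPPORTED:
            return 200
    return 406  # Not Acceptable


xml_only = status_for_accept("application/xml")
json_only = status_for_accept("application/json")
browser_like = status_for_accept("text/html, */*;q=0.8")
```

A request that only accepts XML gets a 406, while a typical browser request
that ends in */* is still served JSON.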


>   for actual errors during parsing, i have in principle opted for option 4
> from the o'reilly post you pointed to (which is their recommendation for
> anything but genuine HTTP errors), though i have yet to provide distinct
> error codes and human-readable error messages (at present, i merely return
> {"readings": -1}, as is the [incr tsdb()] convention).
>

For a significant error, like the parser segfaulting, a server error (5xx)
is probably appropriate. For something expected (like exceeding the memory
limit on a long sentence), it should be fine to indicate a successful
transaction (a 2xx status) but include a user-readable error message in the
response.
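
As a sketch of that split, assuming the parser wrapper can tell the two
kinds of failure apart: the exception classes, the respond function, and
the "error" key are all hypothetical, and only {"readings": -1} comes from
the current server behavior.

```python
import json


class ResourceLimitExceeded(Exception):
    """Hypothetical: an expected failure, e.g. the memory limit was hit."""


class ParserCrashed(Exception):
    """Hypothetical: an unexpected failure, e.g. the parser segfaulted."""


def respond(parse, sentence):
    """Return (HTTP status, JSON body) for one parse request."""
    try:
        return 200, json.dumps(parse(sentence))
    except ResourceLimitExceeded as exc:
        # expected failure: the transaction succeeded, the parse did not
        return 200, json.dumps({"readings": -1, "error": str(exc)})
    except ParserCrashed:
        # unexpected failure: report a server error
        return 500, json.dumps({"error": "internal parser failure"})


def out_of_memory(sentence):
    # stand-in for a parser that hits its memory limit
    raise ResourceLimitExceeded("memory limit exceeded")


def crash(sentence):
    # stand-in for a parser that dies unexpectedly
    raise ParserCrashed()


status_expected, body_expected = respond(out_of_memory, "a very long sentence")
status_crash, body_crash = respond(crash, "Abrams arrived.")
```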


> —as regards interfacing through pyDelphin, for example, i have started to
> play with a simple batch parsing client (attached below), but feel i am
> running up against my pythonic inexperience already.  for example, i have a
> hard time deciding which python version one should support: seeing as we
> need to be able to include non-ascii characters in the URL submitted for
> parsing, the urllib implementation in 2.7 is known to be inadequate.  how
> would you approach this task?
>

Hmm, I actually don't have much experience *sending* requests from Python,
just *handling* them (for which I recommend Bottle: http://bottlepy.org).
But the Requests package (http://requests.rtfd.org) seems well-liked. It's
not in Python's standard library, but the documentation for urllib even
suggests it. Regarding Python versions, I'd try to first support versions
3.3+, then 2.7 if it's not too much work.
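
For the non-ASCII issue specifically, a standard-library sketch that should
work on both 3.3+ and 2.7 is to encode the input as UTF-8 before
percent-encoding it. The base URL and the "input" parameter name below are
placeholders, not the server's actual interface.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

BASE_URL = "http://localhost:8080/parse"  # hypothetical endpoint


def parse_url(sentence):
    """Build a request URL with the sentence percent-encoded as UTF-8."""
    # encoding to bytes first gives the same result on Python 2 and 3
    query = urlencode({"input": sentence.encode("utf-8")})
    return BASE_URL + "?" + query


# \u00f8 is the letter o with stroke, as in the Danish name for Copenhagen
url = parse_url(u"Abrams arrived in K\u00f8benhavn.")
```

With Requests, passing the same dictionary as params= to requests.get()
should take care of the encoding.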