<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Sat, Apr 2, 2016 at 2:09 PM Stephan Oepen <<a href="mailto:oe@ifi.uio.no">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">i also by and large followed your example for MRS serialization in<br>
JSON, with the complication that i want to allow variable properties<br>
on arbitrary argument positions, i.e. make these independent of a<br>
particular EP. for now, i ended up with the following:<br>
<br>
MRS(30): (mrs-output-json (extract-mrs (first *parse-record*)) :stream<br>
t :columns 79)<br>
{"top": {"id": "h1", "type": "h"},<br>
"index": {"id": "e3", "type": "e", "properties":<br>
{"SF": "prop", "TENSE": "past", "MOOD": "indicative", "PROG": "-",<br>
"PERF": "-"}},<br>
"rels":<br>
[{"label": {"id": "h4", "type": "h"}, "predicate": "proper_q", "lnk":<br>
{"from": 0, "to": 3}, "roles":<br>
{"ARG0": {"id": "x6", "type": "x", "properties": {"PERS": "3",<br>
"NUM": "sg", "IND": "+"}}, "RSTR": {"id": "h5", "type": "h"}, "BODY":<br>
{"id": "h7", "type": "h"}}},<br>
{"label": {"id": "h8", "type": "h"}, "predicate": "named", "lnk":<br>
{"from": 0, "to": 3}, "roles": {"ARG0": {"id": "x6"}, "CARG": "Kim"}},<br>
{"label": {"id": "h2", "type": "h"}, "predicate": "_arrive_v_1",<br>
"lnk": {"from": 4, "to": 12}, "roles": {"ARG0": {"id": "e3"}, "ARG1":<br>
{"id": "x6"}}}],<br>
"hcons":<br>
[{"relation": "qeq", "high": {"id": "h1"}, "low": {"id": "h2"}},<br>
{"relation": "qeq", "high": {"id": "h5"}, "low": {"id": "h8"}}]}<br>
<br>
this end up less compact than the simple format, in part because all<br>
variables are objects. however, the full object content is only<br>
printed once (upon the first variable occurrence).<br></blockquote><div><br></div><div>(Aside: I've found that JSON is often not compact, sometimes even less so than XML, especially when printed with line breaks and indentation. If HTTP responses are compressed, though, there's hardly any difference from compressed XML. And JSON is more readable, and more convenient if you're consuming it with JavaScript (or other languages like Python).)</div><div><br></div><div>Regarding properties not just on EPs: I agree. I was forgetting that MRS can have variables for dropped arguments which aren't the ARG0 of any EP but may still have properties (e.g. through agreement or something).</div><div><br></div><div>Regarding variables-as-objects: I recognize they have some internal structure (variable-sort, variable-id, and assigned properties), but it's a headache for serialization. When writing, do you always write the full object, or use some reduced form (e.g., just the "id") for all but the first occurrence? When reading, if two objects with the same ID both have properties, do you merge them, use the first/second/etc., or throw an error? Partly for these reasons, I would prefer having simple strings for variables and a separate hash of variable-to-properties. E.g.:</div><div><br></div><div>{ "top": "h0", "index": "e2", ..., "properties": { "e2": { "TENSE": "past", ...}}}</div><div><br></div><div>For the variable-sort, I'd use a simple regex-based function. E.g. (assuming here, and below, that your client language is JavaScript):</div><div> var variable_re = /(.*?)(\d+)$/;<br></div><div> function varsort(v) { return v.replace(variable_re, "$1"); }</div><div> varsort("x12"); // returns "x"</div><div><br></div><div>Also, if we don't put the variable properties in every variable object, then you'll have to do some post-processing (after deserializing the JSON) in order to resolve those objects (and see below about re-entrancy). E.g.
you might ask for:</div><div> mrs.rels[2].roles.ARG0.properties</div><div>but that information is at:</div><div> mrs.index.properties</div><div>Even if we did have the properties objects on every variable occurrence, the two expressions above wouldn't return the same object, just objects with the same contents (hopefully).</div><div><br></div><div>With the separate properties hash, you can do:</div><div> mrs.properties[mrs.index]</div><div>or:</div><div> mrs.properties[mrs.rels[2].roles.ARG0]</div><div>and get the same object back.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
really, what one might want is an explicit re-entrancy, e.g. something like<br>
<br>
{ "index": #1={ "id": "x1", "type": "x", ...},<br>
... { "ARG0": #1# ... } ... }<br>
<br>
but for all i can tell there is no facility like that available in JSON, right?<br></blockquote><div><br></div><div>It's not part of the JSON spec. But even XML's ID and IDREF don't always result in actual re-entrancies. You'd need an XML reader that honors that information, and probably also the DTD (or other schema) that says which attributes are IDREFs. If you want this in JSON, you'd have to do it yourself (i.e., write your own post-processing transforms after deserializing the JSON into JavaScript objects) or use a library that does this for you.</div><div><br></div><div>Also see these:</div><div>* <a href="https://en.wikipedia.org/wiki/JSON#Object_references">https://en.wikipedia.org/wiki/JSON#Object_references</a><br></div><div>* <a href="http://www.jspon.org/">http://www.jspon.org/</a></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
—do you have any suggestions for refining the above further? i<br>
debated making the surface vs. abstract predicate distinction explicit<br>
in the JSON serialization, but i currently look at JSON as an<br>
alternative to the simple serialization, specifically for the RESTful<br>
interface, hence i ended up with the above.<br></blockquote><div><br></div><div>I usually base my naming on the MRS DTD, e.g., "pred" instead of "predicate", "hi" and "lo" in HCONS, etc., but I now see the fuller forms exist in the Lisp code, which you may be more used to seeing. As long as it's documented, I don't think it matters either way.</div><div><br></div><div>Regarding surface vs. abstract predicates: I don't have strong opinions here. I think the convention (rule?) that surface predicates begin with an underscore is sufficient. For convenience, we could define a simple function (like the varsort() one above) to return something based on its presence/absence (e.g. `predicateType("_arrive_v_1") == "surface"`). But similar to my thoughts on variables, I think making the value of the "predicate" key an object (e.g. `"predicate": {"value": "_arrive_v_1", "type": "surface"}`) would cause more problems than it would solve.</div><div><br></div><div>Oh and BTW, Emily said (offline) that I shouldn't offer to take technical discussions offline. What I was withholding was that I tried requesting XML and got JSON with a 200 OK instead of a 406 (Not Acceptable) error:</div><div><br></div><div><div>$ curl -v -H "Accept: application/xml" <a href="http://erg.delph-in.net/rest/0.9/parse?input=Abrams%20arrived">http://erg.delph-in.net/rest/0.9/parse?input=Abrams%20arrived</a></div><div>[...]</div><div>< HTTP/1.1 200 OK</div><div>[...]</div><div>{"input": "Abrams arrived",<br></div></div><div>[...]</div><div><br></div><div>There are differing opinions on how to treat bad requests (<a href="http://archive.oreilly.com/pub/post/restful_error_handling.html">http://archive.oreilly.com/pub/post/restful_error_handling.html</a>), but I think that returning descriptive status codes is a good way to help the client know what to present to the user.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
cheers, oe<br>
<br>
<br>
On Tue, Mar 29, 2016 at 1:00 AM, Michael Wayne Goodman<br>
<<a href="mailto:goodmami@u.washington.edu" target="_blank">goodmami@u.washington.edu</a>> wrote:<br>
> Hi Stephan,<br>
><br>
> On Mon, Mar 28, 2016 at 1:53 PM Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>> wrote:<br>
>><br>
>> dear colleagues,<br>
>><br>
>> i used part of the easter break to teach myself about modern<br>
>> technologies and are currently in the process of providing a RESTful<br>
>> (programmatic) interface to the on-line ERG demonstrator. i know of<br>
>> at least one colleague who has been waiting impatiently for this<br>
>> functionality :-).<br>
>><br>
>> in a nutshell, client software can now obtain parses using the HTTP<br>
>> protocol and URIs providing the input string (and a handful of<br>
>> optional parameters). for example:<br>
>><br>
>> <a href="http://erg.delph-in.net/rest/0.9/parse?input=Abrams%20arrived" rel="noreferrer" target="_blank">http://erg.delph-in.net/rest/0.9/parse?input=Abrams%20arrived</a>.<br>
>><br>
>> parsing results will be returned in machine-readable format,<br>
>> serialized as a JSON document. for a little more background on how to<br>
>> use this new service (including an example client in Python, believe<br>
>> it or not), please see:<br>
>><br>
>> <a href="http://moin.delph-in.net/ErgApi" rel="noreferrer" target="_blank">http://moin.delph-in.net/ErgApi</a><br>
><br>
><br>
> What a beautiful bike shed :)<br>
><br>
> BTW, Demophin has an undocumented HTTP API, but it's not RESTful:<br>
><br>
> $ curl -F 'sentence=Abrams arrived.'<br>
> <a href="http://chimpanzee.ling.washington.edu/demophin/erg/parse" rel="noreferrer" target="_blank">http://chimpanzee.ling.washington.edu/demophin/erg/parse</a><br>
><br>
> I had hoped to change it to follow the REST principles more closely and<br>
> document the API, but I'm happy to know that you've already started that<br>
> effort.<br>
><br>
>><br>
>><br>
>> there is some more work to be done on the interface (see the page<br>
>> above), but i would like to ask for help already at this point:<br>
>><br>
>> (0) in case you notice anything surprising in the interactive ERG<br>
>> demonstrator, please do not hesitate to let me know!<br>
><br>
><br>
> If you're defining a new JSON schema for EDS, then maybe we can do something<br>
> more convenient for, e.g., lnk values. Currently the indices are encoded in<br>
> a string:<br>
><br>
> "lnk": "<0:6>"<br>
><br>
> If we make it a JSON object, then users of the results wouldn't have to<br>
> parse the string later:<br>
><br>
> "lnk": {"type": "charspan", "cfrom": 0, "cto": 6}<br>
><br>
> (The "type" could be optional if we define "charspan" as the default, or if<br>
> we pretend that the other types don't exist)<br>
><br>
>><br>
>> (1) i still need to provide a serialization of MRSs in JSON; in case<br>
>> anyone has previously tackled this (design) problem, please do get in<br>
>> touch!<br>
><br>
><br>
> Not yet, sorry. One thing that comes to mind is that JSON doesn't have an<br>
> unordered collection aside from objects (hashes), which require keys. So we<br>
> could treat the RELS, HCONS, and ICONS bags as arrays (lists) (but we often<br>
> do this anyway, so I think it's fine to use arrays). Here's a rather direct<br>
> conversion:<br>
><br>
> { "top": "h0", "index": "e2", "rels": [ {"pred": "proper_q", "lbl": "h3",<br>
> "arg0": "x4"...}...]...}<br>
><br>
> One thing that isn't obvious is variable properties. They could follow the<br>
> EDS example and put them in the EP object (and similarly be controlled by<br>
> the "properties" parameter in the URL):<br>
><br>
> { ..., "rels": [ { "pred": "named", ..., "properties": { "PERS": "3"}}, ...<br>
> ], ...}<br>
><br>
>><br>
>> (2) i think it might be nice to incorporate RESTful parsing as an<br>
>> option in pyDelphin; mike, could you be interested in collaborating on<br>
>> this?<br>
><br>
><br>
> Yes. Whatever API we settle on, I'd like to incorporate that into pyDelphin<br>
> and use it as the basis for Demophin as well.<br>
><br>
>><br>
>> finally, i would be curious to hear comments or suggestions for how to<br>
>> use and extend this service (though cannot promise i will have a lot<br>
>> of time to develop this further until another holiday break); please<br>
>> see towards the bottom of the above wiki page for some candidate<br>
>> directions.<br>
><br>
><br>
> I've done a couple of REST APIs so far, so I have some suggestions (some are<br>
> rather technical so I'm happy to save those for an off-list discussion).<br>
><br>
> One thing that might be relevant to others is how we can request other<br>
> formats. I see you have parameters "eds=...", "derivation=...", "mrs=...",<br>
> so presumably we could expand it with others ("rmrs=...", "dmrs=...", etc.)?<br>
><br>
> Other ideas:<br>
> * what about generation?<br>
> * can set a request header for preprocessing? (e.g. morphological<br>
> segmentation for Jacy or Zhong)<br>
> * If we have already preprocessed, can we specify the Content-Type (e.g.<br>
> Content-Type: application/yy)<br>
><br>
>> best wishes; god påske! oe<br>
>><br>
><br>
</blockquote></div></div>