<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Apr 6, 2016 at 12:15 PM Stephan Oepen <<a href="mailto:oe@ifi.uio.no">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">as an afterthought, one final candidate revision: given our reasoning<br>
about lower- vs. upper-case ‘namespaces’, one could apply the same<br>
condensing as i suggested for ‘properties’ at the EP level and drop<br>
the extra ‘arguments’ embedding. that way, the EP structure would<br>
become an object that is a little more parallel again to the TFS-like<br>
rendering in the ‘simple’ serialization. would you support making<br>
this change, before we finalize this part of the protocol for now?<br></blockquote><div><br></div><div>I actually rather like the current state, as it closely mirrors the data structure I have in pyDelphin (which makes parsing easy), but it wouldn't be hard to implement the more condensed form. On a more subjective note, with the condensed form it feels like "label" should be "LBL" again, since the result is structurally closer to the SimpleMRS format, even though "label" would then sit in the arguments' "namespace".</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
i had been looking at this bug report:<br>
<br>
<a href="https://bugs.python.org/issue1712522" rel="noreferrer" target="_blank">https://bugs.python.org/issue1712522</a><br>
<br>
my understanding is that urllib.quote() in 2.7 does not support<br>
unicode strings, whereas the revised version in 3.x does. i have not<br>
tried the work-around of encoding to a UTF-8 byte sequence first;<br>
strictly speaking, i would think percent escaping should happen at the<br>
string level (and arguably should support arbitrary unicode strings,<br>
effectively making urllib an irilib), and the conversion to a byte<br>
sequence for HTTP transport should be effected by urllib.urlopen().<br></blockquote><div><br></div><div>Thanks for the pointer. I tinkered with the urllib/urllib2 modules and did notice this problem. Encoding to UTF-8 first does seem to solve it in Python 2, since quote() there expects a byte string, so a unicode string has to be encoded before it is passed in. In Python 3, quote() accepts either bytes or unicode strings.</div><div><br></div><div>The ideal world you described is probably closest to what the third-party Requests package does:</div><div><br></div><div><div>>>> import requests</div><div><div><div>>>> resp = requests.get('<a href="http://erg.delph-in.net/rest/0.9/parse?input=%E3%81%82">http://erg.delph-in.net/rest/0.9/parse?input=あ</a> is a Japanese character.')</div><div>>>> resp.json()['input']</div><div>'あ is a Japanese character.'</div></div></div></div><div><br></div><div>(Note that the あ comes back intact in the response, i.e., it was encoded in the request and decoded in the response; furthermore, this works unmodified on both Python 2 and 3.)</div><div><br></div><div>But I share your desire for a simple solution that has no dependencies outside of the standard library, so I'll see if I can make it work.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
—once you have had a chance to look at RESTful client implementation<br>
yourself, i will be curious to see which solution you adopt!<br></blockquote><div><br></div><div>Python raises an ImportError that you can catch, and since Python 2 doesn't have the urllib.request or urllib.parse submodules, I exploit this to write custom pre-encoding code for Python 2. It sounds a little hacky, but it's a pretty common pattern for code meant to work with both versions. Given that Python 3's quote() function takes either bytes or unicode strings, though, I might not need to do this. More soon.</div><div><br></div><div>Btw, in the current version of the MRS-JSON format, I noticed that handles have no "type", where I expected {"type": "h"}.</div><div><br></div><div>Thanks</div></div></div>
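<div>For reference, here is a minimal sketch of the ImportError fallback combined with UTF-8 pre-encoding, using only the standard library. The helper names quote_text() and parse_url() are hypothetical (they are not part of pyDelphin or of the REST protocol); the base URL is the one from the Requests example above. This illustrates the pattern only, not a finished client.</div>

```python
# -*- coding: utf-8 -*-
try:
    # Python 3: quote() lives in urllib.parse and accepts str or bytes
    from urllib.parse import quote
    text_type = str
except ImportError:
    # Python 2: quote() lives in urllib and expects a byte string
    from urllib import quote
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)


def quote_text(s, encoding='utf-8'):
    """Percent-escape s (hypothetical helper).

    Text strings are encoded to bytes first, so the same call behaves
    identically on Python 2 and Python 3; byte strings pass through.
    """
    if isinstance(s, text_type):
        s = s.encode(encoding)
    return quote(s)


def parse_url(sentence, base='http://erg.delph-in.net/rest/0.9/parse'):
    """Build a parse-request URL (hypothetical helper; 'input' is the
    query parameter used in the Requests example)."""
    return base + '?input=' + quote_text(sentence)


if __name__ == '__main__':
    # u'\u3042' is the character あ, written as an escape to stay ASCII-safe
    print(parse_url(u'\u3042 is a Japanese character.'))
```

<div>Because the text is always encoded before quoting, the Python 2 branch needs no extra pre-encoding logic beyond the import itself; the try/except only has to locate quote().</div>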