<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Thanks for the response, Stephan,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 17, 2020 at 6:39 PM Stephan Oepen <<a href="mailto:oe@ifi.uio.no">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">[...]</span><br> in a nutshell, EDS native serialization is indeed line-oriented, and i<br> am inclined to hold fast on the one-node-per-line convention. i would<br> not want to muddy these waters, since the format has been around since<br> 2002, and there has been some EDS activity beyond DELPH-IN. i know of<br> at least two EDS readers that rely on the presence of line breaks.<br></blockquote><div><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">Ok, sounds good. Then perhaps my previous message may be informative if the maintainer(s) of those two readers ever decide to embrace the convenience of single-line EDS. Other than determining the top of the graph, adapting the readers should be trivial: just treat \n as any other whitespace.<br></div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> i do see the benefits of a more compact serialization, however, but<br> would recommend you call that something else (say EDSLines), if you<br> decide to implement it in pyDelphin.</blockquote><div><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">It's been implemented for some time now. In fact all codecs have a -lines variant (simplemrs -> simplemrs-lines, dmrx -> dmrx-lines, etc.). 
E.g., for the XML formats, the -lines variant outputs each item (<mrs> or <dmrs>) on its own line and suppresses the root nodes (<mrs-list>, <dmrs-list>).<br></div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">[...]</span><br> {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }<br> {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n }<br> {: <span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span>e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }<br> <br> the above order reflects what i believe would be my personal ranking<br> just now :-). i frequently use underscores for ‘anonymous’ MRS<br> variables, and the first variant feels maybe most natural: there<br> should be a top identifier, but in this case it is missing.</blockquote><div><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">The 'anonymous' node identifier for a fake top is fine and, conveniently, PyDelphin can already read in this variant. The difference is that '_' is a valid identifier in EDS, so it's not actually missing, just unlinked. I think logically an unlinked top is the same as a null top, but this means that PyDelphin may write an EDS that differs from the source EDS (in terms of Python data structures, viz., upon re-reading the serialization).</div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <span class="gmail_default" style="font-family:arial,helvetica,sans-serif">T</span>he<br> second variant also would seem to maintain compatibility with the<br> native EDS serialization, only introducing an inline encoding of line<br> breaks. 
</blockquote><div><br></div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">Inserting a literal '\' and 'n' is awkward and changes the format, and I don't see how it's compatible at all besides having '\' and 'n' in the same location as your preferred newline characters.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">variant #3, on the other hand, i believe would depart from<br> how native serialization deals with missing tops; thus, if you were to<br> opt for this format, it would be even more important to maintain a<br> clear distinction between EDS native serialization and the pyDelphin<br> EDSLines format.<br></blockquote><div><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">If the thing between the first '{' and the first ':' is the top identifier, then if nothing is there the top is null. This is easy to parse and (I thought) easy to understand. As EDS native serialization from PyDelphin has done this for some time, I will continue to read it in, but going forward I will not write it out. As of the latest commit, I just omit the top entirely, which is what your newline-ful variant would do if it were simply newline-less (see the last EDS of my first message). I have written, but have not yet pushed to GitHub, a change that inserts an anonymous '_' top if the top is null (if '_' is already used by some node, I try '_0', then '_1', etc. 
until I get an unused one).</div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default"><br></div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">I have also made the following changes (which I think you'll be happy with):</div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">- The default serialization is now indented with newlines (and this is true of all codecs); use eds-lines to get the single-line variant<br></div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">- Conversion from MRS now uses predicate modification by default</div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">- Blank lines are inserted between indented EDSs (not sure if your readers actually require this)<br></div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> i hope the above makes sense to you? oe<br> <br> <br> On Wed, Dec 16, 2020 at 10:41 AM <a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><br> <<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>> wrote:<br> ><br> > Hello developers,<br> ><br> > It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to backoff to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. 
I think the last word on the subject from Stephan, at least on this list, was mid-September (<a href="http://lists.delph-in.net/archives/developers/2020/003140.html" rel="noreferrer" target="_blank">http://lists.delph-in.net/archives/developers/2020/003140.html</a>), where he said he'd continue discussion on another thread, which presumably meant the thread from late August (<a href="http://lists.delph-in.net/archives/developers/2020/003127.html" rel="noreferrer" target="_blank">http://lists.delph-in.net/archives/developers/2020/003127.html</a>). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested.<br> ><br> > As an example, here's an EDS (without properties) for "It rained."<br> ><br> > {e2:<br> > e2:_rain_v_1<3:9>[]<br> > }<br> ><br> > In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally:<br> ><br> > {:<br> > e2:_rain_v_1<3:9>[]<br> > }<br> ><br> > It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens):<br> ><br> > {e2: e2:_rain_v_1<3:9>[]}<br> > {e2:_rain_v_1<3:9>[]}<br> ><br> > Here is the parsing algorithm, once we've consumed the first '{':<br> ><br> > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output)<br> > 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP<br> > 3. 
Otherwise TOP must be missing<br> ><br> > I think this covers all the cases but let me know if I've missed anything.<br> ><br> > --<br> > -Michael Wayne Goodman<br> </blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature">-Michael Wayne Goodman</div></div>
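<div style="font-family:arial,helvetica,sans-serif" class="gmail_default">P.S. For anyone adapting a reader, here is a rough Python sketch of the three-step lookahead from my first message, plus the '_', '_0', '_1', ... fallback for a null top. This is not PyDelphin's actual code: the names TOKEN, find_top, and anonymous_top are made up for illustration, and the tokenizer is a simplification that assumes any parenthesized group is a graph status and that predicates contain no punctuation. Newlines are treated as ordinary whitespace.</div>

```python
import re
from typing import Iterable, Optional

# Simplified tokenizer (an assumption, not the full EDS grammar).
TOKEN = re.compile(
    r"""
      \([^)]*\)                 # graph status, e.g. (fragmented)
    | <[^>]*>                   # lnk value, e.g. <3:9>
    | [{}\[\]:|,]               # punctuation
    | [^\s{}\[\]:|,<>()]+       # symbol: node identifier, predicate, role
    """,
    re.VERBOSE,
)


def _is_status(token: str) -> bool:
    """True for a graph status like '(fragmented)' or the node status '|'."""
    return token == "|" or token.startswith("(")


def find_top(eds: str) -> Optional[str]:
    """Return the top identifier of a serialized EDS, or None if it has none."""
    tokens = TOKEN.findall(eds)
    if not tokens or tokens[0] != "{":
        raise ValueError("an EDS must start with '{'")
    la = tokens[1:5]  # at most 4 tokens of lookahead

    # Step 1: ':', a graph or node status, or '}' first means no top.
    if not la or la[0] in (":", "}") or _is_status(la[0]):
        return None

    # Step 2: symbol + ':' followed by a status, or with ':' as 4th token.
    if len(la) >= 2 and la[1] == ":":
        if len(la) >= 3 and _is_status(la[2]):
            return la[0]
        if len(la) >= 4 and la[3] == ":":
            return la[0]

    # Step 3: otherwise the first symbol starts a node, so no top.
    return None


def anonymous_top(used_ids: Iterable[str]) -> str:
    """Pick '_' for a null top, or '_0', '_1', ... if '_' is already used."""
    used = set(used_ids)
    if "_" not in used:
        return "_"
    n = 0
    while f"_{n}" in used:
        n += 1
    return f"_{n}"
```

<div style="font-family:arial,helvetica,sans-serif" class="gmail_default">With this sketch, find_top gives the same answer for the single-line and the indented forms of an EDS, since only the token sequence matters.</div>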