[developers] Serializing EDS without a top

Stephan Oepen oe at ifi.uio.no
Thu Dec 17 11:39:46 CET 2020


hi mike,

yes, i am sorry i now see i never returned to the original thread i
had in mind on M$ GitHub!

in a nutshell, EDS native serialization is indeed line-oriented, and i
am inclined to hold fast on the one-node-per-line convention.  i would
not want to muddy these waters, since the format has been around since
2002, and there has been some EDS activity beyond DELPH-IN.  i know of
at least two EDS readers that rely on the presence of line breaks.

i do see the benefits of a more compact serialization, but would
recommend you call that something else (say EDSLines), if you
decide to implement it in pyDelphin.  you would then be free to make
up your own rules, where i could for example imagine either one of the
following (assuming a missing top):

{_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }
{\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n }
{: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }

the above order reflects what i believe would be my personal ranking
just now :-).  i frequently use underscores for ‘anonymous’ MRS
variables, and the first variant feels maybe most natural: there
should be a top identifier, but in this case it is missing.  the
second variant also would seem to maintain compatibility with the
native EDS serialization, only introducing an inline encoding of line
breaks.  variant #3, on the other hand, i believe would depart from
how native serialization deals with missing tops; thus, if you were to
opt for this format, it would be even more important to maintain a
clear distinction between EDS native serialization and the pyDelphin
EDSLines format.
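
to illustrate the second variant, here is a minimal sketch in Python of
an inline encoding of line breaks.  the function names are hypothetical
(they are not part of pyDelphin or any EDS reader), and the sketch
assumes that the two-character sequence backslash + n never occurs
inside the EDS content itself:

```python
def to_inline(native: str) -> str:
    """Encode each line break as the two characters backslash + n."""
    return native.replace("\n", "\\n")


def from_inline(inline: str) -> str:
    """Decode the inline form back into line-oriented text."""
    return inline.replace("\\n", "\n")


# Line-oriented text with a missing top, written with real line breaks.
native = "{\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n }"

inline = to_inline(native)
print(inline)
```

under that assumption the mapping is reversible, so a reader that
relies on line breaks could decode the single-line form before parsing.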

i hope the above makes sense to you?  oe


On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com
<goodman.m.w at gmail.com> wrote:
>
> Hello developers,
>
> It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to back off to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. I think the last word on the subject from Stephan, at least on this list, was mid-September (http://lists.delph-in.net/archives/developers/2020/003140.html), where he said he'd continue discussion on another thread, which presumably meant the thread from late August (http://lists.delph-in.net/archives/developers/2020/003127.html). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested.
>
> As an example, here's an EDS (without properties) for "It rained."
>
>     {e2:
>      e2:_rain_v_1<3:9>[]
>     }
>
> In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally:
>
>     {:
>      e2:_rain_v_1<3:9>[]
>     }
>
> It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens):
>
>     {e2: e2:_rain_v_1<3:9>[]}
>     {e2:_rain_v_1<3:9>[]}
>
> Here is the parsing algorithm, once we've consumed the first '{':
>
> 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output)
> 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP
> 3. Otherwise TOP must be missing
>
> I think this covers all the cases but let me know if I've missed anything.
>
> --
> -Michael Wayne Goodman
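
The three lookahead steps in the quoted message can be sketched in Python
as follows. This is only an illustration under stated assumptions, not
PyDelphin's actual parser: the tokenizer, the helper names (tokenize,
is_status, find_top), and the treatment of any parenthesized word as a
graph status are all hypothetical. As in the message, variables and
predicates are treated as the same kind of token, and a predicate keeps
its <from:to> span attached.

```python
import re

# Hypothetical tokenizer: a parenthesized graph status, single-character
# punctuation, or a symbol with an optional <from:to> span attached.
_TOKEN = re.compile(r"\([^()\s]*\)|[{}:|\[\],]|[^\s{}:|\[\],<]+(?:<[^>]*>)?")


def tokenize(text):
    return _TOKEN.findall(text)


def is_status(token):
    """True for a node status ('|') or a graph status like '(fragmented)'."""
    if token is None:
        return False
    return token == "|" or (token.startswith("(") and token.endswith(")"))


def find_top(eds):
    """Return the TOP identifier, or None if TOP is missing."""
    tokens = tokenize(eds)
    if not tokens or tokens[0] != "{":
        raise ValueError("EDS must start with '{'")
    # Up to four tokens of lookahead after the opening brace.
    lookahead = tokens[1:5]
    lookahead += [None] * (4 - len(lookahead))
    t1, t2, t3, t4 = lookahead
    # Step 1: a colon, a status, or the closing brace means no TOP.
    if t1 in (":", "}") or is_status(t1):
        return None
    # Step 2: otherwise we need a symbol followed by a colon.
    if t1 is None or t2 != ":":
        raise ValueError("expected a symbol followed by ':'")
    if is_status(t3) or t4 == ":":
        return t1
    # Step 3: the first symbol starts a node, so TOP is missing.
    return None


print(find_top("{e2: e2:_rain_v_1<3:9>[]}"))
print(find_top("{e2:_rain_v_1<3:9>[]}"))
```

Because the tokenizer skips all whitespace, the same function gives the
same answer for single-line and multi-line input.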


