[developers] Serializing EDS without a top

goodman.m.w at gmail.com
Wed Dec 16 10:38:55 CET 2020


Hello developers,

It's been a while, but I'm returning to a discussion we were having about
serializing EDS in the native format when there is no TOP and no INDEX to
back off to. Stephan suggested that EDS is a line-based format (i.e., line
breaks are required), while I would like to continue supporting single-line
EDS in PyDelphin. I think the last word on the subject from Stephan, at
least on this list, was in mid-September
(http://lists.delph-in.net/archives/developers/2020/003140.html), where he
said he'd continue the discussion on another thread, which presumably meant
the thread from late August
(http://lists.delph-in.net/archives/developers/2020/003127.html). I don't
think the discussion did continue, so I'm starting this thread in case
anyone is interested.

As an example, here's an EDS (without properties) for "It rained."

    {e2:
     e2:_rain_v_1<3:9>[]
    }

In PyDelphin, when an EDS has no TOP, I have intentionally been outputting
the first colon anyway:

    {:
     e2:_rain_v_1<3:9>[]
    }
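For illustration, here is a minimal Python sketch of that convention. The
encode_eds function and the (identifier, predicate, lnk) tuple layout are
hypothetical, not PyDelphin's API, and nodes carry no properties or edges,
as in the example above. A missing TOP is written as a bare ':' after '{'
in both the multi-line and the single-line layouts.

```python
from typing import List, Optional, Tuple

# Hypothetical node layout for illustration: (identifier, predicate, (cfrom, cto)).
# Properties and edges are omitted, so every node ends in an empty '[]'.
Node = Tuple[str, str, Tuple[int, int]]


def encode_eds(top: Optional[str], nodes: List[Node], single_line: bool = False) -> str:
    """Serialize a minimal EDS in the native format (hypothetical helper).

    A missing TOP is written as a bare ':' right after '{', so a reader can
    tell with one token of lookahead that no TOP is present.
    """
    head = "{" + (top or "") + ":"
    body = [f"{nid}:{pred}<{cfrom}:{cto}>[]" for nid, pred, (cfrom, cto) in nodes]
    if single_line:
        return " ".join([head, *body]) + "}"
    return "\n".join([head, *(" " + line for line in body), "}"])


rain = [("e2", "_rain_v_1", (3, 9))]
print(encode_eds("e2", rain))
print(encode_eds(None, rain, single_line=True))  # {: e2:_rain_v_1<3:9>[]}
```

The single-line call gives {: e2:_rain_v_1<3:9>[]}, the one-line counterpart
of the output above.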

It's a bit ugly, but it allows me to detect, with 1 token of lookahead,
whether there's a top. If the colon is omitted, it's not clear whether "e2:"
is the top or the start of the first node. If line breaks are required, we
can just assume the first line is for the top, whether or not it's there.
But for single-line EDS, we need up to 4 tokens of lookahead to determine
whether there's a top (assuming the parser treats variables and predicates
as the same kind of token):

    {e2: e2:_rain_v_1<3:9>[]}
    {e2:_rain_v_1<3:9>[]}
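To make the difference concrete, here is a minimal Python sketch. It
assumes a hypothetical regex tokenizer (TOKEN_RE, tokenize), not PyDelphin's
actual lexer, that keeps a lnk such as <3:9> as one token so its inner colon
is not mistaken for a separator; top_from_first_line is likewise a
hypothetical helper for the line-based reading. With line breaks, the first
line alone decides the top. On a single line, the two examples above share
their first two tokens, and since the 3rd token is a symbol in both, only
the 4th tells them apart.

```python
import re
from typing import List, Optional

# Hypothetical tokenizer (not PyDelphin's lexer). A lnk such as <3:9> is kept
# as a single token so that its inner ':' is not read as a separator.
TOKEN_RE = re.compile(
    r"""
      \((?:fragmented|cyclic)\)   # graph status (assumed set of values)
    | <[^>]*>                     # lnk, e.g. <3:9>
    | \("(?:[^"\\]|\\.)*"\)       # carg, e.g. ("Abrams")
    | [{}\[\]:,|]                 # punctuation, including node status '|'
    | [^\s{}\[\]:,|<(]+           # symbol: a variable or a predicate
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> List[str]:
    """Split native EDS text into tokens; unmatched characters (e.g. spaces) are skipped."""
    return TOKEN_RE.findall(text)


def top_from_first_line(text: str) -> Optional[str]:
    """Line-based reading: treat the first line as '{', an optional TOP, and ':'."""
    first = (text.splitlines() or [""])[0].strip()
    head = first.lstrip("{").split(":", 1)[0].strip()
    return head or None


# Single line: token 3 is a symbol in both, so only token 4 separates them.
print(tokenize("{e2: e2:_rain_v_1<3:9>[]}")[1:5])  # ['e2', ':', 'e2', ':']
print(tokenize("{e2:_rain_v_1<3:9>[]}")[1:5])      # ['e2', ':', '_rain_v_1', '<3:9>']
```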

Here is the parsing algorithm, once we've consumed the first '{':

1. If the 1st lookahead token is ':', '(fragmented)' (or another graph
status), '}', or '|' (node status), then we know that TOP is missing (the
':' case is for PyDelphin's current output).
2. Otherwise, the 1st and 2nd tokens must be a symbol and a colon; if the
3rd token is a graph or node status, OR if the 4th token is ':', then the
1st token is the TOP.
3. Otherwise, TOP is missing.
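The three steps can be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not PyDelphin's implementation: the regex
tokenizer is hypothetical, the set of graph statuses is assumed to be
'(fragmented)' and '(cyclic)', and find_top (also hypothetical) returns the
top, or None, along with the tokens from which node parsing should continue.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical tokenizer (not PyDelphin's lexer); a lnk like <3:9> stays one token.
TOKEN_RE = re.compile(
    r"""
      \((?:fragmented|cyclic)\)   # graph status (assumed set of values)
    | <[^>]*>                     # lnk
    | \("(?:[^"\\]|\\.)*"\)       # carg, e.g. ("Abrams")
    | [{}\[\]:,|]                 # punctuation, including node status '|'
    | [^\s{}\[\]:,|<(]+           # symbol: a variable or a predicate
    """,
    re.VERBOSE,
)
SYMBOL_RE = re.compile(r"[^\s{}\[\]:,|<(]+")
GRAPH_STATUSES = {"(fragmented)", "(cyclic)"}  # assumed
NODE_STATUS = "|"


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def find_top(tokens: List[str]) -> Tuple[Optional[str], List[str]]:
    """Decide whether a TOP is present; tokens start just after the opening '{'."""
    t1, t2, t3, t4 = (tokens[i] if i < len(tokens) else None for i in range(4))
    # Step 1: a bare ':' (PyDelphin's output), a graph status, '}' or a node
    # status as the first token means there is no TOP.
    if t1 in (":", "}", NODE_STATUS) or t1 in GRAPH_STATUSES:
        return None, (tokens[1:] if t1 == ":" else tokens)
    # Step 2: otherwise the input must begin with a symbol followed by ':' ...
    if t1 is None or not SYMBOL_RE.fullmatch(t1) or t2 != ":":
        raise ValueError(f"expected a symbol and ':' but got {t1!r} {t2!r}")
    # ... and that symbol is the TOP if the 3rd token is a status or the 4th is ':'.
    if t3 in GRAPH_STATUSES or t3 == NODE_STATUS or t4 == ":":
        return t1, tokens[2:]
    # Step 3: otherwise TOP is missing and t1 is the first node's identifier.
    return None, tokens
```

For example, find_top(tokenize("{e2: e2:_rain_v_1<3:9>[]}")[1:]) returns 'e2'
as the top, while the same call on "{e2:_rain_v_1<3:9>[]}" returns None and
leaves 'e2' as the first node's identifier. Applied literally, the rules also
send "{e2:}" (a top with no nodes) to step 3, since its 3rd token is '}'; if
such graphs should be accepted, '}' would need to be added to the step-2
check on the 3rd token.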

I think this covers all the cases but let me know if I've missed anything.

-- 
-Michael Wayne Goodman