<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Regarding disallowing '|' in identifiers: 3 votes for, none against. Let's consider this resolved.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Regarding the nesting of comments: I'm generally against it but I'm happy to accommodate if that's how we decide to go.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Regarding :begin ... :end environments: The LKB doesn't exactly "ignore" them, it simply cannot parse them (with or without colons; the 'lkb/script' files just don't load the TDL files containing these blocks). I don't think it's necessary to change anything here. LKB support for this portion of TDL could potentially reduce duplication in our configuration files and it might be nice to allow :include elsewhere in the grammars, but maybe the gain is too small for the work required to get it working? Regarding the initial colons on these things, I think the variant *with* colons (e.g., ':begin', not 'begin') is better because then they cannot be mistaken for regular identifiers. As for the :status values, though, we could open up these values to include any identifier, but the values have consequences in the syntax; e.g., blocks with ':status lex-rule' can include %letter-sets, but blocks with other status values cannot.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Regarding """"" string """: your intuition is correct for "pythonic triple-quoted strings". 
John was correct that the syntax description on the TdlRfc wiki was ambiguous as to whether this was a single DocString or an empty regular string (DQString) followed by a DocString. I argued that in context it was not ambiguous, although I did not make clear what I meant: given an ideal, context-sensitive TDL parser, there is only one valid interpretation of the sequence in an appropriate context, because the other interpretations would be ruled out higher up in the syntax. To be complete, I should have included other contexts (shown below). Note that when inside a DocString, the first sequence of 3 unescaped "
characters always terminates the DocString, so five " characters at the end of a DocString always ends with a DQString:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (1a) a-type := """"" a docstring """. ; DQString DocString</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (1b) a-type := """"" a docstring """ x. ; DocString TypeName</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (1c) a-type := x """"" a docstring """. ; TypeName DocString<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (2a) a-type := """ a docstring """"". ; DocString DQString</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (2b) a-type := """ a docstring """"" x. ; Invalid (needs '&' before 'x')</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (2c) a-type := x """ a docstring """"". ; Invalid (needs '&' after 'x')</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (3a) a-type := """"" a docstring """"". ; DocString DQString<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (3b) a-type := """"" a docstring """"" x. ; Invalid (needs '&' before 'x')</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (3c) a-type := x """"" a docstring """"". ; Invalid (needs '&' after 'x')</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">also, type addenda may contain only docstrings, which results in some ambiguity:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (4) a-type :+ """"" a docstring """. 
; ambiguous</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (5) a-type :+ """ a docstring """"". ; DocString DQString<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">I agree with you that we don't want a context-sensitive lexer, so in a previous message I proposed a change to the DQString rule that allows "" as an empty regular string only if the next character is not " (well, minimal context sensitivity). This would change the above such that (1a) becomes invalid (because a type definition must have a supertype), and (4) only contributes a docstring to the type. This change almost exactly addresses your comment that juxtaposed regular strings and docstrings require intervening whitespace, except that it's ok for non-empty regular strings to be immediately juxtaposed:</div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default"><br></div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default"> (6) a-type := "..."""" a docstring """. 
; DQString DocString<br></div></div><div><br></div><div><br></div><div><div style="font-family:arial,helvetica,sans-serif" class="gmail_default">I will go ahead and change the rules for Identifier and DQString, as discussed, on the wiki.<br></div><br></div><div><br></div><div class="gmail_quote"><div dir="ltr">On Mon, Oct 15, 2018 at 2:44 PM Stephan Oepen <<a href="mailto:oe@ifi.uio.no">oe@ifi.uio.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div dir="auto">i am not yet very well-versed in pythonic triple-quoted strings, but i would have thought the following can also be interpreted as just one string, with two initial quote marks inside?</div></div><div dir="auto"><br></div><div dir="auto">“”””” string “””</div><div dir="auto"><br></div><div dir="auto">while ‘classic’ strings do not require assumptions about surrounding white space, it almost seems as if triple-quoted strings need separating white space at least if juxtaposed with standard strings?</div><div dir="auto"><br></div><div dir="auto">cheers, oe</div><div dir="auto"><br></div><div><br><div class="gmail_quote"><div dir="ltr">On Mon, 15 Oct 2018 at 17:05 John Carroll <<a href="mailto:J.A.Carroll@sussex.ac.uk" target="_blank">J.A.Carroll@sussex.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
Hi Mike,
<div><br>
</div>
<div>I didn't reply regarding :begin ... :end blocks because I haven't (yet) had a chance to think about them.</div>
<div><br>
</div>
<div>Regarding identifiers, I'm happy with your option (A1) disallow | in identifiers. Could a grammar writer give an opinion on whether this could in future be a problem?</div>
<div><br>
</div>
<div>As you say, the LKB currently reads identifiers via the Lisp 'read' function. This leads to incorrect behaviour since it accepts such horrors as the following (which are all regarded as being valid and equivalent coreferences):</div>
<div><br>
</div>
<div> #a\b\c</div>
<div> #a|bc|</div>
<div> # |Abc|</div>
<div><br>
</div>
<div>To fix this I'm re-implementing the LKB's reading of identifiers.</div>
<div><br>
</div>
<div>I also agree with your option (B2) Disallow coreferences outside of AVMs. I think the only use for a top-level coreference would be to produce a feature structure containing a cycle, which is not a valid FS.</div>
<div><br>
</div>
<div>We're in agreement about the DQString / DocString issue: an implementation could restructure the BNF non-terminals in order to reduce the lookahead. However, here's a related oddity:</div>
<div><br>
</div>
<div>""""" a docstring """</div>
<div>
<div>""" a docstring """""</div>
</div>
<div><br>
</div>
<div>are valid character sequences inside a top level conjunction (they consist of an empty DQString and a DocString, with the intervening Space being empty). So after consuming "", peeking a further " still doesn't disambiguate. Even more weirdly,</div>
<div><br>
</div>
<div>""""""""</div>
<div><br>
</div>
<div>cannot be disambiguated at all - although it's irrelevant which string is which. But anyhow, none of this matters in practice as long as there's no role for a DQString at the top level.</div></div><div style="word-wrap:break-word">
<div><br>
</div>
<div>John</div>
<div><br>
<div>
<blockquote type="cite">
<div>On 9 Oct 2018, at 21:38, <a href="mailto:goodman.m.w@gmail.com" target="_blank">
goodman.m.w@gmail.com</a> wrote:</div>
<br class="m_-6296086008115705813m_9040675845072575408Apple-interchange-newline">
<div>
<div dir="ltr" style="font-family:Courier;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div style="font-family:arial,helvetica,sans-serif">Hi John, replies are inline below...<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Tue, Oct 9, 2018 at 6:24 AM John Carroll <<a href="mailto:J.A.Carroll@sussex.ac.uk" target="_blank">J.A.Carroll@sussex.ac.uk</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>Hi Mike,<br>
<br>
I've now got an LKB implementation of<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="http://moin.delph-in.net/TdlRfc" target="_blank">http://moin.delph-in.net/TdlRfc</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span>.<br>
</div>
</blockquote>
<div> </div>
<div>That's great to hear. Thanks for your work on this. Since you didn't reply to my latest message on this thread but to the one prior, I assume you did not see the updated TdlRfc with :begin ... :end blocks (though I noted that they are not currently
supported by the LKB; also I fear I introduced ambiguity there as well with the InstanceDef rule), and also the note about the nesting of #| ... |# comments. These don't have much bearing on your questions, so I'll ignore these points for now.<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>Perhaps I'm missing some subtleties in the BNF, but it seems to me that there is an unfortunate ambiguity between Coreference and BlockComment: the character | is legal in the identifier part of a coreference,
which means that differentiating between Coreference and BlockComment inside the body of a definition (or more precisely, inside a DocConj) requires either unbounded lookahead or non-deterministic parsing. E.g. if we have read the following at the start of
a line<br>
<br>
[ SYNSEM #|aaaaaaaaaaaaaa<br>
<br>
then we don't know whether we're meant to be reading a coreference or a block comment, and we won't know until we encounter the next whitespace character or the characters |# (whichever comes first). For the moment my code assumes that # is an attempt to start
a block comment if we are at the top level of a definition, otherwise the intention is to start a coreference.<br>
</div>
</blockquote>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">Good observation. This ambiguity is indeed unfortunate. In both the current LOGON-provided LKB and ACE (which I use to test), the 2-character pattern #| only begins
a block comment (both give errors otherwise). The | character is not included in the "break-characters" in the Lisp code, but the LKB seems to rely on Lisp's 'read' function (via lkb-read) to parse identifiers (based on Lisp syntax I think) rather than these
break-characters. I tried creating some type names with | at the initial, medial, and final positions, and<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">ACE had no trouble with any
of these, while the LKB wouldn't parse the first one:</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> |abc := *top*.</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> d|ef := *top*.</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> ghi| := *top*.<br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">Now as for a resolution, we could:</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (A1) disallow | in identifiers</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (A2) disallow | at the start of identifiers</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (A3) disallow | at the start of coreference identifiers<br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (A4) mandate a parsing order; if | is encountered after # it must start a comment, otherwise a coreference</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">(A4) is what I currently use, but I believe it is equivalent to (A3). I'm happy to go with any of these, even (A1), as I surveyed some grammars (ERG, GG, Jacy, SRG,
Hag, and INDRA) and found no instances of | in type names (`grep -P '[ \w]\|[ \w]'`).<br>
</span></div>
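<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">For anyone who wants to repeat the survey without grep, here is a rough Python sketch of the same search. It assumes the grammar sources are *.tdl files under a single directory; find_pipes and PIPE are made-up names for illustration, and Python's \w may not match exactly what grep -P matches in every locale:</span></div>
<pre>
```python
import re
from pathlib import Path

# Same pattern as the grep call: a | flanked on both sides by a space
# or a word character.
PIPE = re.compile(r'[ \w]\|[ \w]')


def find_pipes(root):
    """Yield (path, line_number, line) for each matching line under root.

    Assumption: grammar source files carry a .tdl extension.
    """
    for path in sorted(Path(root).rglob('*.tdl')):
        with path.open(encoding='utf-8', errors='replace') as f:
            for number, line in enumerate(f, 1):
                if PIPE.search(line):
                    yield (path, number, line.rstrip('\n'))
```
</pre>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">Like the grep pattern, this does not flag the #| and |# comment delimiters themselves, and it also misses a | at the very start of a line.</span></div>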
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">You also drew my attention to another issue. My definition of TypedConj currently allows
coreferences after the mandatory TypeName, but not before (because they are not allowed on FeatureConj). Should we:</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (B1) Allow coreferences on FeatureConj (e.g., in FeatureTerm)</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"> (B2) Disallow coreferences outside of AVMs</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br>
</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span class="gmail_default" style="font-family:arial,helvetica,sans-serif">I can't imagine when you'd want a top-level coreference, so (B2) seems like it would
make a tighter syntax, but maybe there's a use case I'm not thinking of.<br>
</span></span></div>
<div><span class="gmail_default" style="font-family:arial,helvetica,sans-serif"></span> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div><br>
There is also an ambiguity between DocString and DQString: at the top level of a definition (DocConj again), the character sequence "" is ambiguous between an empty DQString or the start of a DocString. Dealing with this case correctly requires either 3 characters
of lookahead or non-deterministic parsing. For the moment my code assumes that " is an attempt to start a DocString if we are at the top level of a definition, otherwise the intention is to start a DQString. (This is related to your previous observation that
regular strings don't really appear in top-level conjunctions).<br>
</div>
</blockquote>
<div><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">I think this is a reasonable assumption. I'm using regular expressions for lexing before I parse, so 3 characters of lookahead isn't a big deal. Also, with the appropriate structure
of the parsing functions I think you only need 1 character of lookahead: after consuming "" you only need to peek one character to decide if it's a docstring or an empty string. This "appropriate structure" might be too drastic a change from what we already
have, unless perhaps docstrings were attributes of the Term instead of the TypeDef.<br>
</div>
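<div style="font-family:arial,helvetica,sans-serif">A minimal sketch of that one-character peek, in Python rather than Lisp. It assumes that a backslash escapes the following character in both kinds of string; read_quoted and the two regular expressions are hypothetical names for illustration, not anything in the LKB or on the TdlRfc page:</div>
<pre>
```python
import re

# Assumed escape convention: a backslash escapes the next character.
# A DocString body runs up to the first unescaped run of three quotes.
_DOC_BODY = re.compile(r'((?:\\.|[^\\])*?)"""', re.DOTALL)
# A DQString body runs up to the first unescaped quote.
_DQ_BODY = re.compile(r'((?:\\.|[^"\\])*)"', re.DOTALL)


def read_quoted(text, pos=0):
    """Read one quoted token starting at text[pos] (hypothetical helper).

    Returns (kind, value, next_pos); kind is 'DocString' or 'DQString'.
    The value is returned raw, without processing escapes.
    """
    if not text.startswith('"', pos):
        raise ValueError('expected a double quote at position %d' % pos)
    if text.startswith('""', pos):
        # Two quotes consumed: peek exactly one more character.
        if not text.startswith('"', pos + 2):
            return ('DQString', '', pos + 2)
        kind, body, start = 'DocString', _DOC_BODY, pos + 3
    else:
        kind, body, start = 'DQString', _DQ_BODY, pos + 1
    match = body.match(text, start)
    if match is None:
        raise ValueError('unterminated %s at position %d' % (kind, pos))
    return (kind, match.group(1), match.end())
```
</pre>
<div style="font-family:arial,helvetica,sans-serif">With this structure the sequence """"" a docstring """ comes out as a single DocString whose text starts with two quote marks, and a trailing "" with nothing after it is an empty DQString.</div>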
<br>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>I hope these assumptions are OK? I've tested my new code with a few grammars, and the only grammar that fails to load is JACY, due to just one single-quoted docstring (on the type adv_adj_head-avm).<br>
</div>
</blockquote>
<div><br>
</div>
<div>
<div style="font-family:arial,helvetica,sans-serif">For Jacy there is in fact already a ticket for this issue:<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="https://github.com/delph-in/jacy/issues/47" target="_blank">https://github.com/delph-in/jacy/issues/47</a></div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">I've just now pushed a change to Jacy that moves the docstring to a comment (to be moved to a triple-quoted docstring in the future).<br>
</div>
<br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>John<br>
<br>
<br>
<div>
<blockquote type="cite">
<div>On 8 Sep 2018, at 19:18,<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span>wrote:</div>
<br class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359Apple-interchange-newline">
<div>
<div dir="ltr">
<div style="font-family:arial,helvetica,sans-serif">Thanks John and Stephan,</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">John, thanks for offering to clean up the LKB's TDL reading, and I'll gladly leave the Lisping to the experts. If you're very concerned about backwards compatibility, then it should be possible
to accommodate both the double-quoted and the triple-double-quoted variants. I don't think there's any meaningful overlap between double-quoted docstrings and regular strings because regular strings don't really appear in top-level conjunctions, and even if
they did, the only case in which it would be ambiguous is when the string is the only term in a type addendum. But allowing for both double-quoted and triple-double-quoted docstrings to accommodate the few, if any, grammars that made use of them might be more trouble
than it's worth.</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">Rather, I think that Stephan's point about having a grammar's LKB script require a certain version of the LKB makes more sense.<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">With all these improvements and shared efforts, 2018 (or 2019) will finally be the year of DELPH-IN on the desktop! ;)<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Sat, Sep 8, 2018 at 7:11 AM Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>
<div dir="auto">colleagues,</div>
<div dir="auto"><br>
</div>
<div dir="auto">we put a mechanism into the LKB at some point to allow a grammar to require a minimum revision of the software: see near the top of ’lkb/script‘ in the ERG.</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">i would suggest making the forthcoming release of the ERG require a modern version of the LKB, i.e. getting the TDL reader code adapted to support the new triple-quoted documentation strings, rebuilding the binaries in LOGON (my job)
and the LinGO distribution (UW), and encouraging other grammar writers to also add a test of lkb-version-after-p() to their ’script‘ files.</div>
<div dir="auto"><br>
</div>
<div dir="auto">come to think of it, in preparing for a new ERG release, dan and i would often go through his accumulated patches to LKB code and consider opportunities for consolidation. likewise for revisions or additions of [incr tsdb()] skeletons.
as a guiding principle, i would suggest it should be possible to exactly re-create the treebanks in each release using checked-in revisions of all the component pieces (data and software) at the time.</div>
<div dir="auto"><br>
</div>
<div dir="auto">best wishes, oe</div>
<div dir="auto"><br>
</div>
<div><br>
<div class="gmail_quote">
<div dir="ltr">On Sat, 8 Sep 2018 at 13:58 John Carroll <<a href="mailto:J.A.Carroll@sussex.ac.uk" target="_blank">J.A.Carroll@sussex.ac.uk</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>Hi,
<div><br>
</div>
<div>Thanks for trying to fix the LKB.</div>
<div><br>
</div>
<div>I think your TDL clean-ups are a very good idea. The new version of read-tdl-type-comment in patches.lsp will indeed eventually make it into the LKB proper. But I was concerned about not being able to patch existing LKB binaries effectively. When
I referred to backward compatibility, I was thinking about LKB binaries in distributions that may never get updated, e.g. <a href="http://www.cs.upc.edu/~padro/docker-logon.tgz" target="_blank">http://www.cs.upc.edu/~padro/docker-logon.tgz</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span>and Knoppix+LKB.
This might not be too much of a problem in practice except that some LKB error messages are poor or misleading.</div>
<div><br>
</div>
<div>I'll have a go at making a minimal set of changes that could be put in a patch file, and add a more considered reimplementation of TDL reading to my todo list.</div>
</div>
<div>
<div><br>
</div>
<div>John</div>
<div><br>
<div>
<blockquote type="cite">
<div>On 8 Sep 2018, at 00:09,<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span>wrote:</div>
<br class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326Apple-interchange-newline">
<div>
<div dir="ltr">
<div style="font-family:arial,helvetica,sans-serif">Hi again,</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">I spent an hour or two editing patches.lsp to try to make it work, but my Lisp writing and debugging knowledge is too limited to figure it out right now. Here's what I tried to do:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">
<div style="font-family:arial,helvetica,sans-serif">* read-tdl-top-conjunction:</div>
<div style="font-family:arial,helvetica,sans-serif"> - a copy of read-tdl-conjunction, except for the following...<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> - call read-tdl-type-comment if peek-with-comments returns " before calling read-tdl-defterm</div>
<div style="font-family:arial,helvetica,sans-serif"> - append the pair (docstring . term) to the "constraint" variable instead of just term<br>
</div>
* read-tdl-avm-def:</div>
<div style="font-family:arial,helvetica,sans-serif"> - remove the part about reading parents<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> - expect a pair (docstring . term) from read-tdl-top-conjunction</div>
<div style="font-family:arial,helvetica,sans-serif"> - append the docstring to the "comment" variable</div>
<div style="font-family:arial,helvetica,sans-serif"> - extract the term as "unif" and proceeds as before</div>
* read-tdl-type-comment:
<div style="font-family:arial,helvetica,sans-serif"> - if it doesn't encounter """, it calls unread-char to put those quotes back on the stream, because it may be a regular "string" or empty "" string</div>
<div style="font-family:arial,helvetica,sans-serif"> - don't print an error if the string doesn't start with """<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">I only created read-tdl-top-conjunction so that I didn't have to redefine all the other places where read-tdl-conjunction was used. Trying to load the ERG with these changes gives me an "Unexpected
unif" error when it tries to load fundamentals.tdl.<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Fri, Sep 7, 2018 at 11:59 AM<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><<a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:arial,helvetica,sans-serif">Thanks for the feedback, John,</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">While I appreciate your arguments and code, I am reluctant to agree with any changes now. The LKB has been a pioneer in allowing docstrings, but I don't think we should revert the work other developers
have put into their processors in the last month, not to mention the hard-earned consensus over the color of this bike shed. Here are my reasons:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">1. The agreed-upon syntax does not break backward compatibility (except regarding the number of quote characters); it only opens up new places where docstrings may occur (see (3)).<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">2. The lack of support for docstrings outside of the LKB hindered their adoption, so backward compatibility isn't much of an issue given that grammar developers avoided using them (given this,
maybe I should have pushed harder for docstrings immediately after := or :+... oh well).<br>
</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">3. The LKB's implementation that parses supertypes (or "parents" as used in the Lisp code) before other terms is only half-baked. It first reads some type names, then looks for a docstring, then
reads other terms, which may include more type names. I proposed making a change to the syntax so that type names must appear before other terms in a top-level conjunction, but the only replies I got addressing this point (from Stephan and Dan) opposed such
a change. Thus, we agreed that type names have no special position in conjunctions. Because of this, saying that the docstring must occur before the AVM means little, because (a) the AVM may appear before a type name, and (b) there may be more than one AVM.
For instance, the LKB (with the ERG's triple-quoted patch) currently accepts these:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & c """doc""".</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & """doc""" c.</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & c & """doc""" [ Q r ].</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & """doc""" c & [ Q r ].</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & """doc""" [ Q r ] & c.</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">but not these:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> a := """doc""" b & c.</div>
<div style="font-family:arial,helvetica,sans-serif"> a := """doc""" b & c & [ Q r ].</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & c & [ Q r ] """doc""".<br>
</div>
<div><br>
</div>
<div>
<div style="font-family:arial,helvetica,sans-serif">Furthermore, it accepts:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & c & [ Q r ].</div>
<div style="font-family:arial,helvetica,sans-serif"> a := b & [ Q r ] & c.</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">but not:</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif"> a := [ Q r ] & b & c.</div>
<br>
</div>
<div>
<div style="font-family:arial,helvetica,sans-serif">I imagine a grammar developer (who doesn't browse the Lisp code) would not find these facts consistent. It should either enforce that all supertypes appear before other terms, or allow them to
mix freely.</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">So, on the one hand, I think that the LKB is currently deficient with respect to the above patterns (which are all allowed, according to current consensus). I may take a look at fixing the Lisp code, but
it would take me a while. On the other hand, the LKB merely enforces the conventional layout of TDL definitions, so it is unlikely to cause problems for now.</div>
<div style="font-family:arial,helvetica,sans-serif"><br>
</div>
<div style="font-family:arial,helvetica,sans-serif">Finally, docstrings are desired for more than just the ERG, so the temporary solution in patches.lsp should eventually make it into the LKB proper. For instance, the read-tdl-avm-def and read-tdl-conjunction
functions would need some changes and the read-tdl-type-parents function should probably just be removed.<br>
</div>
<br>
</div>
<div class="gmail_quote">
<div dir="ltr">On Fri, Sep 7, 2018 at 4:58 AM John Carroll &lt;<a href="mailto:J.A.Carroll@sussex.ac.uk" target="_blank">J.A.Carroll@sussex.ac.uk</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div>
<div>Hi,
<div><br>
</div>
<div>I've been looking at TDL reading in the LKB, and (partly for pragmatic reasons) I suggest restricting docstrings to occur only in the position immediately preceding the AVM - or just before the final . terminator if there is no AVM. Here are my
reasons:</div>
<div><br>
</div>
<div>1. The LKB currently only allows docstrings in that position, and changing this while retaining backward compatibility would require an unreasonable amount of patching in a grammar's lkb/patches.lsp file.</div>
<div>2. This position is analogous to where docstrings are allowed in programming languages / docstring packages</div>
<div><br>
</div>
<div>In the hope that this is acceptable, at least for the time being, I've sent Dan a new version of his patch to change docstrings from double-quoted to triple double-quoted in the LKB. The patch is attached in case other grammar developers want
to pick it up.</div>
<div><br>
</div>
<div>John</div>
<div><br>
<div>
<blockquote type="cite">
<div>On 7 Sep 2018, at 00:29,<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><a href="mailto:goodman.m.w@gmail.com" target="_blank">goodman.m.w@gmail.com</a><span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span>wrote:</div>
<br class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_Apple-interchange-newline">
<div>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
Hi all,</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
There are some remaining issues with TDL that I'd like to clean up. First I will summarize some decisions made (or at least not rejected) in previous email threads:<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
1. Supertypes appear before other terms in a conjunction only by convention (not enforced in the syntax)<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
2. Docstrings are triple-quoted and may appear before any top-level term or before the final . terminator</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
3. Comments may appear in definitions anywhere that spaces can, except within strings/regexes/affixing-patterns</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
The following changes are things I think people agree with, so I'd like to consider them as decided:<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
4. Removal of the :< operator (if accepted as a variant of :=, throw a warning)</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
5. Removal of 'single-quoted-symbols</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
6. Removal of double-quoted "docstrings"<br>
</div>
<div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
7. Removal of non-regex uses of ^ (otherwise any BNF of TDL is necessarily incomplete because the "extended-syntax" use of ^ is open-ended)<br>
</div>
</div>
<div><br>
</div>
<div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
And there's at least one point I don't think we reached a decision on:</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
8. Instances must have exactly 1 "supertype" (which is really just a type and not a supertype, i.e., it doesn't change the type hierarchy)<br>
</div>
</div>
<div><br>
</div>
<div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
Also:</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
9. Does anyone know how wild-cards differ from letter-sets? I see HaG has a wild-card and suffix pattern like these:</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
%(wild-card (?g ui))</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
...<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
%suffix (!c!v !c!vn) (!v?g !vn)<br>
</div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
My guess is that wild-cards match but are not used in the replacement, which I can imagine is useful if you want the replacement to use the second of two matches but not the first. It makes me wonder why we don't just use regex substitutions for these things.<br>
</div>
</div>
<div><br>
</div>
<div>
<div class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_" style="font-family:arial,helvetica,sans-serif">
If nobody responds about (1)--(7), I'll make sure the syntax description on the TdlRfc wiki reflects those decisions.</div>
<br>
</div>
--<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><br>
<div dir="ltr" class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336m_333016454702567529x_gmail_signature">
-Michael Wayne Goodman</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<div></div>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<br>
--<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><br>
<div dir="ltr" class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326m_-5771748778099409336gmail_signature">
-Michael Wayne Goodman</div>
</div>
</blockquote>
</div>
<br clear="all">
<br>
--<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><br>
<div dir="ltr" class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359m_1017530180130203987m_-480291692734986326gmail_signature">
-Michael Wayne Goodman</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br clear="all">
<br>
--<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><br>
<div dir="ltr" class="m_-6296086008115705813m_9040675845072575408gmail-m_-7745458955862952359gmail_signature">-Michael Wayne Goodman</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
</div>
<br clear="all">
<br>
--<span class="m_-6296086008115705813m_9040675845072575408Apple-converted-space"> </span><br>
<div dir="ltr" class="m_-6296086008115705813m_9040675845072575408gmail_signature">-Michael Wayne Goodman</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote></div></div>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">-Michael Wayne Goodman</div></div>