[developers] More TDL cobwebs
Stephan Oepen
oe at ifi.uio.no
Sat Sep 8 16:10:37 CEST 2018
colleagues,

we put a mechanism into the LKB at some point to allow a grammar to require
a minimum revision of the software: see near the top of 'lkb/script' in the
ERG.

i would suggest making the forthcoming release of the ERG require a modern
version of the LKB, i.e. getting the TDL reader code adapted to support the
new triple-quoted documentation strings, rebuilding the binaries in LOGON
(my job) and the LinGO distribution (UW), and encouraging other grammar
writers to also add a test of lkb-version-after-p() to their 'script' files.

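
For readers unfamiliar with this kind of guard, here is a rough sketch of the idea in Python (not Lisp, and not the LKB's actual code). It assumes versions are identified by timestamps in a YYYY/MM/DD hh:mm:ss layout; that layout, the function names, and the example dates are illustrative assumptions, not taken from the LKB or the ERG.

```python
from datetime import datetime

# Assumed timestamp layout for version identifiers (hypothetical).
_FORMAT = "%Y/%m/%d %H:%M:%S"


def version_after_p(required, current):
    """Return True if `current` is at least as recent as `required`."""
    return datetime.strptime(current, _FORMAT) >= datetime.strptime(required, _FORMAT)


def require_version(required, current):
    """Refuse to continue loading when the software is too old."""
    if not version_after_p(required, current):
        raise RuntimeError(
            f"this grammar requires software version {required} or later "
            f"(found {current})"
        )


# Example: a loader script would run this before reading any TDL files.
require_version("2018/09/01 00:00:00", current="2018/09/08 16:10:37")
```

Failing early like this means an outdated binary reports a version mismatch rather than a misleading TDL syntax error further down the line.
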
come to think of it, in preparing for a new ERG release, dan and i would
often go through his accumulated patches to LKB code and consider
opportunities for consolidation. likewise for revisions or additions of
[incr tsdb()] skeletons. as a guiding principle, i would suggest it should
be possible to exactly re-create the treebanks in each release using
checked-in revisions of all the component pieces (data and software) at the
time.

best wishes, oe

On Sat, 8 Sep 2018 at 13:58 John Carroll <J.A.Carroll at sussex.ac.uk> wrote:
> Hi,
>
> Thanks for trying to fix the LKB.
>
> I think your TDL clean-ups are a very good idea. The new version of
> read-tdl-type-comment in patches.lsp will indeed eventually make it into
> the LKB proper. But I was concerned about not being able to patch existing
> LKB binaries effectively. When I referred to backward compatibility, I was
> thinking about LKB binaries in distributions that may never get updated,
> e.g. http://www.cs.upc.edu/~padro/docker-logon.tgz and Knoppix+LKB. This
> might not be too much of a problem in practice except that some LKB error
> messages are poor or misleading.
>
> I'll have a go at making a minimal set of changes that could be put in a
> patch file, and add a more considered reimplementation of TDL reading to my
> todo list.
>
> John
>
> On 8 Sep 2018, at 00:09, goodman.m.w at gmail.com wrote:
>
> Hi again,
>
> I spent an hour or two editing patches.lsp to try to make it work, but my
> Lisp writing and debugging skills are too limited to figure it out right
> now. Here's what I tried to do:
>
> * read-tdl-top-conjunction:
>   - a copy of read-tdl-conjunction, except for the following...
>   - call read-tdl-type-comment if peek-with-comments returns " before
>     calling read-tdl-defterm
>   - append the pair (docstring . term) to the "constraint" variable
>     instead of just term
> * read-tdl-avm-def:
>   - remove the part about reading parents
>   - expect a pair (docstring . term) from read-tdl-top-conjunction
>   - append the docstring to the "comment" variable
>   - extract the term as "unif" and proceed as before
> * read-tdl-type-comment:
>   - if it doesn't encounter """, call unread-char to put those quotes
>     back on the stream, because it may be a regular "string" or an empty
>     "" string
>   - don't print an error if the string doesn't start with """
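
The read-tdl-type-comment behaviour described in the last item can be sketched roughly as follows. This is Python rather than Lisp and is not the code in patches.lsp; the function name is hypothetical, and rewinding with seek() stands in for the unread-char calls.

```python
import io


def read_docstring_or_rewind(stream):
    # If the stream is positioned at three double quotes, consume the
    # docstring and return its text. Otherwise put everything back and
    # return None, so that a regular "string" or an empty "" can still be
    # read by whatever handles ordinary strings.
    start = stream.tell()
    if stream.read(3) != '"""':
        stream.seek(start)  # analogous to unread-char: no error, no output
        return None
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "":
            raise ValueError("unterminated docstring")
        chars.append(ch)
        if "".join(chars[-3:]) == '"""':
            return "".join(chars[:-3])


# A docstring is consumed; the rest of the definition stays on the stream.
s = io.StringIO('"""doc""" c.')
print(read_docstring_or_rewind(s), repr(s.read()))

# A regular string is left untouched for the ordinary string reader.
s = io.StringIO('"string" & c.')
print(read_docstring_or_rewind(s), repr(s.read()))
```

The point of rewinding silently is that a failed docstring test is not an error in itself; only the caller knows whether a plain string is acceptable at that position.
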
>
> I only created read-tdl-top-conjunction so that I didn't have to redefine
> all the other places where read-tdl-conjunction was used. Trying to load
> the ERG with these changes gives me an "Unexpected unif" error when it
> tries to load fundamentals.tdl.
>
> On Fri, Sep 7, 2018 at 11:59 AM goodman.m.w at gmail.com <
> goodman.m.w at gmail.com> wrote:
>
>> Thanks for the feedback, John,
>>
>> While I appreciate your arguments and code, I am reluctant to agree with
>> any changes now. The LKB has been a pioneer in allowing docstrings, but I
>> don't think we should revert the work other developers have put into their
>> processors in the last month, not to mention the hard-earned consensus over
>> the color of this bike shed. Here are my reasons:
>>
>> 1. The agreed-upon syntax does not break backward compatibility (except
>> regarding the number of quote characters), it only opens up new places
>> where docstrings may occur (see (3))
>>
>> 2. The lack of support for docstrings outside of the LKB hindered their
>> adoption, so backward compatibility isn't much of an issue given that
>> grammar developers avoided using them (given this, maybe I should have
>> pushed harder for docstrings immediately after := or :+... oh well).
>>
>> 3. The LKB's implementation that parses supertypes (or "parents" as used
>> in the Lisp code) before other terms is only half-baked. It first reads
>> some type names, then looks for a docstring, then reads other terms, which
>> may include more type names. I proposed making a change to the syntax so
>> that type names must appear before other terms in a top-level conjunction,
>> but the only replies I got addressing this point (from Stephan and Dan)
>> opposed such a change. Thus, we agreed that type names have no special
>> position in conjunctions. Because of this, saying that the docstring must
>> occur before the AVM means little, because (a) the AVM may appear before a
>> type name, and (b) there may be more than one AVM. For instance, the LKB
>> (with the ERG's triple-quoted patch) currently accepts these:
>>
>> a := b & c """doc""".
>> a := b & """doc""" c.
>> a := b & c & """doc""" [ Q r ].
>> a := b & """doc""" c & [ Q r ].
>> a := b & """doc""" [ Q r ] & c.
>>
>> but not these:
>>
>> a := """doc""" b & c.
>> a := """doc""" b & c & [ Q r ].
>> a := b & c & [ Q r ] """doc""".
>>
>> Furthermore, it accepts:
>>
>> a := b & c & [ Q r ].
>> a := b & [ Q r ] & c.
>>
>> but not:
>>
>> a := [ Q r ] & b & c.
>>
>> I imagine a grammar developer (who doesn't browse the Lisp code) would
>> not find these facts consistent. It should either enforce that all
>> supertypes appear before other terms, or allow them to mix freely.
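
To make the "mix freely" option concrete, here is a minimal sketch in Python (not the LKB's Lisp reader) of a parser for a top-level conjunction in which a docstring may precede any term or the final terminator, and type names and AVMs may come in any order. It covers only a small subset assumed for illustration: flat AVMs, no comments, no regular strings. The names TOKEN, tokenize, and parse_definition are hypothetical.

```python
import re

# Tokens for a deliberately small subset of TDL (assumption): triple-quoted
# docstrings, flat (non-nested) AVMs, the operators ':=', '&', '.', and names.
TOKEN = re.compile(r'''
    (?P<doc>"""(?:.|\n)*?""")
  | (?P<avm>\[[^\[\]]*\])
  | (?P<op>:=|&|\.)
  | (?P<name>[^\s\[\]&.:"]+)
''', re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("end", ""))  # sentinel so lookahead never runs off the end
    return tokens


def parse_definition(text):
    """Parse 'name := [doc] term (& [doc] term)* [doc] .' into (terms, docs)."""
    toks = tokenize(text)
    if len(toks) < 3 or toks[0][0] != "name" or toks[1] != ("op", ":="):
        raise ValueError("expected 'identifier :='")
    i, terms, docs = 2, [], []
    while True:
        if toks[i][0] == "doc":  # docstring before a term
            docs.append(toks[i][1][3:-3])
            i += 1
        if toks[i][0] not in ("name", "avm"):  # no special position for names
            raise ValueError(f"expected a term, got {toks[i][1]!r}")
        terms.append(toks[i][1])
        i += 1
        if toks[i] != ("op", "&"):
            break
        i += 1
    if toks[i][0] == "doc":  # docstring before the final '.'
        docs.append(toks[i][1][3:-3])
        i += 1
    if toks[i] != ("op", ".") or toks[i + 1][0] != "end":
        raise ValueError("expected the final '.' terminator")
    return terms, docs


EXAMPLES = [
    'a := b & c """doc""".',
    'a := b & """doc""" c.',
    'a := b & c & """doc""" [ Q r ].',
    'a := b & """doc""" c & [ Q r ].',
    'a := b & """doc""" [ Q r ] & c.',
    'a := """doc""" b & c.',
    'a := """doc""" b & c & [ Q r ].',
    'a := b & c & [ Q r ] """doc""".',
    'a := b & c & [ Q r ].',
    'a := b & [ Q r ] & c.',
    'a := [ Q r ] & b & c.',
]
for line in EXAMPLES:
    parse_definition(line)  # none of these raises in this sketch
```

Because the loop treats type names and AVMs as the same kind of term, all eleven definitions listed above parse, including the ones the patched LKB currently rejects.
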
>>
>> So, on the one hand, I think that the LKB is currently deficient WRT the
>> above patterns (which are all allowed, according to current consensus). I
>> may take a look at fixing the Lisp code, but it would take me a while. On
>> the other hand, the LKB merely enforces the conventional layout of TDL
>> definitions, so it is unlikely to cause problems for now.
>>
>> Finally, docstrings are desired for more than just the ERG, so the
>> temporary solution in patches.lsp should eventually make it into the LKB
>> proper. For instance, the read-tdl-avm-def and read-tdl-conjunction
>> functions would need some changes and the read-tdl-type-parents function
>> should probably just be removed.
>>
>> On Fri, Sep 7, 2018 at 4:58 AM John Carroll <J.A.Carroll at sussex.ac.uk>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been looking at TDL reading in the LKB, and (partly for pragmatic
>>> reasons) I suggest restricting docstrings to occur only in the position
>>> immediately preceding the AVM - or just before the final . terminator if
>>> there is no AVM. Here are my reasons:
>>>
>>> 1. The LKB currently only allows docstrings in that position, and
>>> changing this while retaining backward compatibility would require an
>>> unreasonable amount of patching in a grammar lkb/patches.lsp file
>>> 2. This position is analogous to where docstrings are allowed in
>>> programming languages / docstring packages
>>>
>>> In the hope that this is acceptable, at least for the time being, I've
>>> sent Dan a new version of his patch to change docstrings from double-quoted
>>> to triple double-quoted in the LKB. The patch is attached in case other
>>> grammar developers want to pick it up.
>>>
>>> John
>>>
>>> On 7 Sep 2018, at 00:29, goodman.m.w at gmail.com wrote:
>>>
>>> Hi all,
>>>
>>> There are some remaining issues with TDL that I'd like to clean up.
>>> First I will summarize some decisions made (or at least not rejected) in
>>> previous email threads:
>>>
>>> 1. Supertypes appear before other terms in a conjunction only by
>>> convention (not enforced in the syntax)
>>> 2. Docstrings are triple-quoted and may appear before any top-level term
>>> or before the final . terminator
>>> 3. Comments may appear in definitions anywhere that spaces can, except
>>> within strings/regexes/affixing-patterns
>>>
>>> The following changes are things I think people agree with, so I'd like
>>> to consider them as decided:
>>>
>>> 4. Removal of the :< operator (if accepted as a variant of :=, throw a
>>> warning)
>>> 5. Removal of 'single-quoted-symbols
>>> 6. Removal of double-quoted "docstrings"
>>> 7. Removal of non-regex uses of ^ (otherwise any BNF of TDL is
>>> necessarily incomplete because the "extended-syntax" use of ^ is open-ended)
>>>
>>> And there's at least one point I don't think we reached a decision on:
>>>
>>> 8. Instances must have exactly 1 "supertype" (which is really just a
>>> type and not a supertype, i.e., it doesn't change the type hierarchy)
>>>
>>> Also:
>>>
>>> 9. Does anyone know how wild-cards differ from letter-sets? I see HaG
>>> has a wild-card and suffix pattern like these:
>>>
>>> %(wild-card (?g ui))
>>> ...
>>> %suffix (!c!v !c!vn) (!v?g !vn)
>>>
>>> My guess is that wild-cards match but are not used in the replacement,
>>> which I can imagine is useful if you want the replacement to use the second
>>> of two matches but not the first. It makes me wonder why we don't just use
>>> regex substitutions for these things.
>>>
>>> If nobody responds about (1)--(7), I'll make sure the syntax description
>>> on the TdlRfc wiki reflects those decisions.
>>>
>>> --
>>> -Michael Wayne Goodman