[developers] More TDL cobwebs

Emily M. Bender ebender at uw.edu
Mon Sep 10 15:43:30 CEST 2018


Just for the record: Knoppix+LKB has been superseded by Ubuntu+LKB and we
do keep it updated:

https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB

Emily

On Sat, Sep 8, 2018 at 4:57 AM, John Carroll <J.A.Carroll at sussex.ac.uk>
wrote:

> Hi,
>
> Thanks for trying to fix the LKB.
>
> I think your TDL clean-ups are a very good idea. The new version of
> read-tdl-type-comment in patches.lsp will indeed eventually make it into
> the LKB proper. But I was concerned about not being able to patch existing
> LKB binaries effectively. When I referred to backward compatibility, I was
> thinking about LKB binaries in distributions that may never get updated,
> e.g. http://www.cs.upc.edu/~padro/docker-logon.tgz and Knoppix+LKB. This
> might not be too much of a problem in practice except that some LKB error
> messages are poor or misleading.
>
> I'll have a go at making a minimal set of changes that could be put in a
> patch file, and add a more considered reimplementation of TDL reading to my
> todo list.
>
> John
>
> On 8 Sep 2018, at 00:09, goodman.m.w at gmail.com wrote:
>
> Hi again,
>
> I spent an hour or two editing patches.lsp to try and make it work, but my
> lisp writing and debugging knowledge is too limited to figure it out right
> now. Here's what I tried to do:
>
> * read-tdl-top-conjunction:
>   - a copy of read-tdl-conjunction, except for the following...
>   - call read-tdl-type-comment if peek-with-comments returns " before
> calling read-tdl-defterm
>   - append the pair (docstring . term) to the "constraint" variable
> instead of just term
> * read-tdl-avm-def:
>   - remove the part about reading parents
>   - expect a pair (docstring . term) from read-tdl-top-conjunction
>   - append the docstring to the "comment" variable
>   - extract the term as "unif" and proceed as before
> * read-tdl-type-comment:
>   - if it doesn't encounter """, it calls unread-char to put those quotes
> back on the stream, because it may be a regular "string" or an empty "" string
>   - don't print an error if the string doesn't start with """
>
> I only created read-tdl-top-conjunction so that I didn't have to redefine
> all the other places where read-tdl-conjunction was used. Trying to load
> the ERG with these changes gives me an "Unexpected unif" error when it
> tries to load fundamentals.tdl.
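The lookahead described above for read-tdl-type-comment can be sketched in Python. The rule is to treat `"""` as the start of a docstring and otherwise leave the quotes for an ordinary string. `Stream` and `read_docstring` are hypothetical names, not LKB functions.

Rather than reading the quotes and unreading them, the sketch peeks ahead without consuming anything. This matters for a Lisp version: Common Lisp's `unread-char` is only specified for one character of pushback, and calling it twice in a row without an intervening read is not supported. Putting back two or three quote characters that way may therefore not behave as intended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stream:
    """Hypothetical character stream with arbitrary lookahead (not an LKB type)."""
    text: str
    pos: int = 0

    def looking_at(self, s: str) -> bool:
        return self.text.startswith(s, self.pos)


def read_docstring(stream: Stream) -> Optional[str]:
    """Consume and return a triple-quoted docstring at the current position.

    If the stream is not at three double quotes, nothing is consumed and None
    is returned, so a regular "string" or an empty "" string stays on the
    stream for the ordinary term reader. Escaped quotes are not handled.
    """
    if not stream.looking_at('"""'):
        return None  # no error: the quotes may begin an ordinary string
    start = stream.pos + 3
    end = stream.text.find('"""', start)
    if end == -1:
        raise SyntaxError("unterminated docstring")
    stream.pos = end + 3
    return stream.text[start:end]


stream = Stream('"""doc""" [ Q r ].')
print(read_docstring(stream))             # doc
print(repr(stream.text[stream.pos:]))     # ' [ Q r ].'
print(read_docstring(Stream('"" & b.')))  # None
```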
>
> On Fri, Sep 7, 2018 at 11:59 AM goodman.m.w at gmail.com <
> goodman.m.w at gmail.com> wrote:
>
>> Thanks for the feedback, John,
>>
>> While I appreciate your arguments and code, I am reluctant to agree with
>> any changes now. The LKB has been a pioneer in allowing docstrings, but I
>> don't think we should revert the work other developers have put into their
>> processors in the last month, not to mention the hard-earned consensus over
>> the color of this bike shed. Here are my reasons:
>>
>> 1. The agreed-upon syntax does not break backward compatibility (except
>> regarding the number of quote characters), it only opens up new places
>> where docstrings may occur (see (3))
>>
>> 2. The lack of support for docstrings outside of the LKB hindered their
>> adoption, so backward compatibility isn't much of an issue given that
>> grammar developers avoided using them (given this, maybe I should have
>> pushed harder for docstrings immediately after := or :+... oh well).
>>
>> 3. The LKB's implementation that parses supertypes (or "parents" as used
>> in the lisp code) before other terms is only half-baked. It first reads
>> some type names, then looks for a docstring, then reads other terms, which
>> may include more type names. I proposed making a change to the syntax so
>> that type names must appear before other terms in a top-level conjunction,
>> but the only replies I got addressing this point (from Stephan and Dan)
>> opposed such a change. Thus, we agreed that type names have no special
>> position in conjunctions. Because of this, saying that the docstring must
>> occur before the AVM means little, because (a) the AVM may appear before a
>> type name, and (b) there may be more than one AVM. For instance, the LKB
>> (with the ERG's triple-quoted patch) currently accepts these:
>>
>>     a := b & c """doc""".
>>     a := b & """doc""" c.
>>     a := b & c & """doc""" [ Q r ].
>>     a := b & """doc""" c & [ Q r ].
>>     a := b & """doc""" [ Q r ] & c.
>>
>> but not these:
>>
>>     a := """doc""" b & c.
>>     a := """doc""" b & c & [ Q r ].
>>     a := b & c & [ Q r ] """doc""".
>>
>> Furthermore, it accepts:
>>
>>     a := b & c & [ Q r ].
>>     a := b & [ Q r ] & c.
>>
>> but not:
>>
>>     a := [ Q r ] & b & c.
>>
>> I imagine a grammar developer (who doesn't browse the lisp code) would
>> not find these facts consistent. It should either enforce that all
>> supertypes appear before other terms, or allow them to mix freely.
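Under the consensus described here, every pattern listed above should be accepted. Docstrings may come before any top-level term or before the final `.`, and type names and AVMs may appear in any order.

The following is a minimal Python sketch of such a reader. It is not the LKB's implementation, and the function names are hypothetical. It also assumes a simplified TDL:

- no comments, strings, or escaped quotes;
- type names contain no dots;
- AVMs are treated as opaque bracketed spans.

```python
import re
from typing import List, Tuple

DOCSTRING = re.compile(r'"""(.*?)"""', re.DOTALL)
# Simplification: a type name is any run of characters other than
# whitespace, '&', brackets, double quotes, and '.'.
TYPE_NAME = re.compile(r'[^\s&\[\]".]+')


def skip_ws(s: str, i: int) -> int:
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def read_optional_docstring(s: str, i: int, docs: List[str]) -> int:
    """Consume a docstring at position i (if any) and return the next position."""
    i = skip_ws(s, i)
    m = DOCSTRING.match(s, i)
    if m:
        docs.append(m.group(1))
        i = skip_ws(s, m.end())
    return i


def read_term(s: str, i: int) -> Tuple[int, str]:
    """Read one top-level term: a bracketed AVM (kept opaque) or a type name."""
    if s.startswith("[", i):
        depth = 0
        for j in range(i, len(s)):
            depth += {"[": 1, "]": -1}.get(s[j], 0)
            if depth == 0:
                return j + 1, s[i:j + 1]
        raise SyntaxError("unbalanced '['")
    m = TYPE_NAME.match(s, i)
    if not m:
        raise SyntaxError(f"expected a type name or AVM at position {i}")
    return m.end(), m.group()


def parse_definition(text: str) -> Tuple[str, List[str], List[str]]:
    """Return (identifier, top-level terms, docstrings) for 'id := ... .'."""
    ident, op, body = text.partition(":=")
    if not op:
        raise SyntaxError("expected ':='")
    terms: List[str] = []
    docs: List[str] = []
    i = 0
    while True:
        i = read_optional_docstring(body, i, docs)  # docstring before a term
        i, term = read_term(body, i)
        terms.append(term)
        i = skip_ws(body, i)
        if body.startswith("&", i):
            i += 1
            continue
        i = read_optional_docstring(body, i, docs)  # docstring before the '.'
        if not body.startswith(".", i) or body[i + 1:].strip():
            raise SyntaxError("expected '&' or the final '.'")
        return ident.strip(), terms, docs


print(parse_definition('a := """doc""" b & c & [ Q r ].'))
# ('a', ['b', 'c', '[ Q r ]'], ['doc'])
print(parse_definition('a := [ Q r ] & b & c.'))
# ('a', ['[ Q r ]', 'b', 'c'], [])
```

Because the loop looks for an optional docstring before every term and once more before the terminator, no position is special. That is the "allow them to mix freely" option rather than the LKB's parents-first reading.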
>>
>> So, on the one hand, I think that the LKB is currently deficient WRT the
>> above patterns (which are all allowed, according to current consensus). I
>> may take a look at fixing the Lisp code, but it would take me a while. On
>> the other hand, the LKB merely enforces the conventional layout of TDL
>> definitions, so it is unlikely to cause problems for now.
>>
>> Finally, docstrings are desired for more than just the ERG, so the
>> temporary solution in patches.lsp should eventually make it into the LKB
>> proper. For instance, the read-tdl-avm-def and read-tdl-conjunction
>> functions would need some changes and the read-tdl-type-parents function
>> should probably just be removed.
>>
>> On Fri, Sep 7, 2018 at 4:58 AM John Carroll <J.A.Carroll at sussex.ac.uk>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been looking at TDL reading in the LKB, and (partly for pragmatic
>>> reasons) I suggest restricting docstrings to occur only in the position
>>> immediately preceding the AVM - or just before the final . terminator if
>>> there is no AVM. Here are my reasons:
>>>
>>> 1. The LKB currently only allows docstrings in that position, and
>>> changing this while retaining backward compatibility would require an
>>> unreasonable amount of patching in a grammar lkb/patches.lsp file
>>> 2. This position is analogous to where docstrings are allowed in
>>> programming languages / docstring packages
>>>
>>> In the hope that this is acceptable, at least for the time being, I've
>>> sent Dan a new version of his patch to change docstrings from double-quoted
>>> to triple double-quoted in the LKB. The patch is attached in case other
>>> grammar developers want to pick it up.
>>>
>>> John
>>>
>>> On 7 Sep 2018, at 00:29, goodman.m.w at gmail.com wrote:
>>>
>>> Hi all,
>>>
>>> There are some remaining issues with TDL that I'd like to clean up.
>>> First I will summarize some decisions made (or at least not rejected) in
>>> previous email threads:
>>>
>>> 1. Supertypes appear before other terms in a conjunction only by
>>> convention (not enforced in the syntax)
>>> 2. Docstrings are triple-quoted and may appear before any top-level term
>>> or before the final . terminator
>>> 3. Comments may appear in definitions anywhere that spaces can, except
>>> within strings/regexes/affixing-patterns
>>>
>>> The following changes are things I think people agree with, so I'd like
>>> to consider them as decided:
>>>
>>> 4. Removal of the :< operator (if accepted as a variant of :=, throw a
>>> warning)
>>> 5. Removal of 'single-quoted-symbols
>>> 6. Removal of double-quoted "docstrings"
>>> 7. Removal of non-regex uses of ^ (otherwise any BNF of TDL is
>>> necessarily incomplete because the "extended-syntax" use of ^ is open-ended)
>>>
>>> And there's at least one point I don't think we reached a decision on:
>>>
>>> 8. Instances must have exactly 1 "supertype" (which is really just a
>>> type and not a supertype, i.e., it doesn't change the type hierarchy)
>>>
>>> Also:
>>>
>>> 9. Does anyone know how wild-cards differ from letter-sets? I see HaG
>>> has a wild-card and suffix pattern like these:
>>>
>>>     %(wild-card (?g ui))
>>>     ...
>>>     %suffix (!c!v !c!vn) (!v?g !vn)
>>>
>>> My guess is that wild-cards match but are not used in the replacement,
>>> which I can imagine is useful if you want the replacement to use the second
>>> of two matches but not the first. It makes me wonder why we don't just use
>>> regex substitutions for these things.
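To make the last question concrete, here is a Python sketch of how the HaG rule above might be written as regex substitutions. It rests on several assumptions that the thread does not confirm:

- each %suffix pair is read as (stem ending, surface ending);
- the contents of the `!c` and `!v` letter-sets are invented for illustration, since HaG's actual definitions are not shown;
- letter-sets bind the matched letter so it is copied into the replacement, while the `?g` wild-card matches `u` or `i` without being copied, following the guess above.

```python
import re
from typing import Optional

# Hypothetical letter-set contents; HaG's real definitions are not shown here.
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # stands in for !c
VOWELS = "aeiou"                       # stands in for !v
WILD_G = "ui"                          # the ?g wild-card from %(wild-card (?g ui))

C, V, G = (f"[{s}]" for s in (CONSONANTS, VOWELS, WILD_G))

# Each (stem ending, surface ending) pair as a regex substitution. Letter-sets
# become capturing groups, so the matched letter is copied into the output;
# the wild-card is a bare character class, so it is matched but dropped.
SUFFIX_RULES = [
    (re.compile(f"({C})({V})$"), r"\1\2n"),  # (!c!v  !c!vn)
    (re.compile(f"({V}){G}$"), r"\1n"),      # (!v?g  !vn)
]


def apply_suffix(stem: str) -> Optional[str]:
    """Return the surface form for the first matching pair, or None."""
    for pattern, replacement in SUFFIX_RULES:
        if pattern.search(stem):
            return pattern.sub(replacement, stem, count=1)
    return None


for stem in ["ka", "kau", "ku"]:
    print(stem, "->", apply_suffix(stem))  # ka -> kan, kau -> kan, ku -> kun
```

Because the hypothetical `!c` and `!v` sets are disjoint, at most one of the two patterns can match a given stem, so rule order does not matter in this sketch.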
>>>
>>> If nobody responds about (1)--(7), I'll make sure the syntax description
>>> on the TdlRfc wiki reflects those decisions.
>>>
>>> --
>>> -Michael Wayne Goodman
>>>
>>>
>>>
>>>
>>
>> --
>> -Michael Wayne Goodman
>>
>
>
> --
> -Michael Wayne Goodman
>
>
>


-- 
Emily M. Bender
Professor, Department of Linguistics
University of Washington
Twitter: @emilymbender