[developers] Questions on the syntax of TDL

Thu Jul 12 01:49:28 CEST 2018

Thank you for the useful information, Guy. I think a lot of it could be
incorporated into the wiki.

Also, in general, I think it would be useful to stratify the quality of TDL
files into 3 levels: (L1) syntactic well-formedness; (L2) structural
validity (i.e., does it unify and compile); and (L3) adherence to
grammar-engineering conventions. I think Copestake 2002 further divides L2
into type-hierarchy validity and constraint validity.

On Wed, Jul 11, 2018 at 2:34 AM, Guy Emerson <gete2 at cam.ac.uk> wrote:

> To answer question 2 on the wiki, instances are formally different from
> types.
>
> After all type definitions have been given, the type hierarchy can be
> defined along with all its constraints (or fail to be defined, if there's a
> conflict somewhere).  An instance is a feature structure that must follow
> the type hierarchy and its constraints.  A type is given a name, and this
> becomes part of the type hierarchy.  An instance is given a name, but this
> is for ease of reference and doesn't affect the hierarchy.
>

Ok, great. This goes well with what I expected. It also prompted me to read
parts of Copestake 2002 a little more closely, which provided the info that
types and instances ("entries", rather) must be in separate files, and that
there are separate functions for loading these files (which I was noticing
when reading the Lisp code). Since there is no syntactic variation to
distinguish types and entries in TDL, it must be the functions used to load
the files that determine whether they will affect the hierarchy or not.
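
To make that concrete, here is a rough sketch (not the LKB's actual code; the
`Grammar` class and the `load_types`/`load_entries` names are invented for
illustration) of how two loaders could treat syntactically identical
definitions differently:

```python
from typing import Dict, List, Tuple

# A parsed definition: (identifier, supertype names). Features are omitted.
Definition = Tuple[str, List[str]]


class Grammar:
    """Hypothetical container; not modelled on any real processor's API."""

    def __init__(self) -> None:
        # type name -> immediate supertypes
        self.hierarchy: Dict[str, List[str]] = {"*top*": []}
        # entry name -> type(s) named in its definition
        self.entries: Dict[str, List[str]] = {}

    def load_types(self, defs: List[Definition]) -> None:
        """Definitions loaded here become nodes in the type hierarchy."""
        for name, supertypes in defs:
            self.hierarchy[name] = supertypes

    def load_entries(self, defs: List[Definition]) -> None:
        """Definitions loaded here are stored by name; the hierarchy is untouched."""
        for name, supertypes in defs:
            self.entries[name] = supertypes
```

With this, `load_types([("noun", ["*top*"])])` adds `noun` to the hierarchy,
while `load_entries([("dog_n1", ["noun"])])` only records a named entry.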

> For example, in defining a type, we are free to combine any supertypes to
> create multiple inheritance in the type hierarchy.  However, in defining an
> instance, we must choose which type it instantiates.  The definition of an
> instance gives its type, not a supertype.  (The type's feature structure
> will subsume the instance's feature structure, but both feature structures
> have the same type.)  Now, in terms of syntax, there are two options:
>
> 1. An instance must be defined as instantiating one specific type (so
> "multiple supertype" syntax is not allowed).
>
> 2. An instance can be defined as instantiating multiple types, in which
> case the TDL processor must perform unification, either finding the glb if
> it exists, or failing.
>
> I'm not sure there's a use case for 2.  It would seem to be clearer to
> directly state the type of the instance, rather than leaving it implicit.
> In principle, option 2 would allow instantiating a glb type that isn't
> explicitly defined... but I don't know if anyone ever has a need for that,
> and if the glb is important, it would probably be clearer to define it
> explicitly.  So I think option 1 is better.
>

I agree that option 1 is clearer and conventional (i.e., passes L1--L3
defined above), but I don't find evidence in the code that multiple
supertypes are disallowed (though I'm not great at reading Lisp). I just
noticed that Copestake 2002 claims that it's ok as long as it doesn't
change the hierarchy (i.e., doesn't require the creation of a glb type); in
other words, if there's already a type defined that inherits from all
supertypes. But in this case there's no reason not to just use the defined
subtype. So I'm happy to just say that specifying multiple supertypes on
instances is formally disallowed.
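
For reference, the Copestake 2002 condition as I read it could be checked
roughly like this. This is only a sketch under my own assumptions: the toy
hierarchy and the function names are made up, and it is not how the LKB
does it.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy: each type maps to its immediate supertypes.
HIERARCHY: Dict[str, List[str]] = {
    "*top*": [],
    "a": ["*top*"],
    "b": ["*top*"],
    "c": ["*top*"],
    "a-b": ["a", "b"],
    "a-b-sub": ["a-b"],
}


def ancestors(t: str, hier: Dict[str, List[str]]) -> Set[str]:
    """Return t together with all of its transitive supertypes."""
    seen = {t}
    stack = [t]
    while stack:
        for parent in hier[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def existing_glb(supertypes: List[str],
                 hier: Dict[str, List[str]]) -> Optional[str]:
    """Return the already-defined glb of the given types, or None."""
    # Types that inherit from (or are) every requested supertype.
    common = [t for t in hier
              if all(s in ancestors(t, hier) for s in supertypes)]
    # The glb is the common subtype that subsumes all other common subtypes.
    for cand in common:
        if all(cand in ancestors(other, hier) for other in common):
            return cand
    return None
```

Here `existing_glb(["a", "b"], HIERARCHY)` finds `a-b`, so the instance could
just as well name `a-b` directly, while `existing_glb(["a", "c"], HIERARCHY)`
finds nothing and would require creating a new glb type.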

>
> For questions 3, 5, and 6:
>
> Requiring types to come first seems sensible.  I think everyone already
> does this, anyway?
>

It's conventional, and may be required for docstrings, but it doesn't seem
to be invalid to have non-type terms first (passes L1--L2).

> Fixing the position of docstrings seems sensible.  Is there any variation
> in current code?
>

Perhaps due to the mixed support for docstrings across the LKB, ACE, and
PET (I haven't tested agree), I actually don't see them being used much at
all, which is unfortunate. I don't see (`grep` doesn't, I mean) any in the
ERG, Jacy only has one, and I think we got rid of them in the Matrix. Then
again, there's more uptake of LTDB-style comments, which fill a similar
role.

If the position of docstrings isn't fixed, I don't know what's preventing
multiple docstrings from appearing on the same type (`t := st1 & "doc1" st2
& "doc2" & [ ... ].`). If there's more than one, do we concatenate? Use the
first/last? On a related note, I think if a type addendum uses a docstring,
the logical behavior is to replace the one on the original definition (or
previous addendum).
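
One possible policy, sketched below, is "last docstring wins", both within a
single definition and across addenda. To be clear, the within-definition
choice is just an assumption for the sake of the example (the question above
is still open), and the `TypeDef` class and `resolve_docstring` function are
hypothetical names, not anything from an existing processor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TypeDef:
    """Hypothetical container for a parsed type definition or addendum."""
    name: str
    supertypes: List[str] = field(default_factory=list)
    docstrings: List[str] = field(default_factory=list)  # in source order


def resolve_docstring(definition: TypeDef,
                      addenda: List[TypeDef]) -> Optional[str]:
    """Pick a single docstring for a type.

    Assumed policy: the last docstring in a definition wins, and a docstring
    on a later addendum replaces whatever was chosen before it.
    """
    doc: Optional[str] = None
    for d in [definition, *addenda]:
        if d.docstrings:
            doc = d.docstrings[-1]
    return doc
```

Under this policy the `t` example above would end up with "doc2", and an
addendum carrying its own docstring would replace it.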

> Allowing extra whitespace and comments seems sensible.
>

My syntax description currently allows comments in many places, although I
haven't compared to see if it lines up with the LKB's `peek-with-comments`
function (which I think behaves similarly to my `Spacing` production rule).
Copestake 2002 says comments are only allowed outside type descriptions,
which I think was true back when I took Emily's grammar-engineering course,
but may not be now.
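
As an illustration of the idea only (this is neither `peek-with-comments` nor
the exact `Spacing` rule from the wiki), a skipper that a parser could call
between any two tokens might look like the following. It assumes the two
comment forms are `;` to end of line and `#| ... |#` blocks, and it does not
handle nested block comments. The name `skip_spacing` is made up.

```python
import re

# Assumed spacing: whitespace, ';' line comments, and '#| ... |#' blocks.
_SPACING = re.compile(r"(?:\s+|;[^\n]*|#\|.*?\|#)*", re.DOTALL)


def skip_spacing(text: str, pos: int = 0) -> int:
    """Return the index of the first non-spacing character at or after pos."""
    # The pattern can match the empty string, so match() never returns None.
    return _SPACING.match(text, pos).end()
```

Because it is only called between tokens, a `;` inside a string or docstring
is never mistaken for a comment; allowing comments inside type descriptions
then just means calling it after every term and operator.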

2018-07-11 2:01 GMT+01:00 Michael Wayne Goodman <goodmami at uw.edu>:
>
>> I attempted to define a BNF-like description of TDL syntax on the wiki:
>> http://moin.delph-in.net/TdlRfc
>> I tried to follow the partial BNF in the LKB source and often referred to
>> the lisp code itself in order to fill out the rest of the description.
>>
>> My 3 questions above are concisely repeated at the bottom of the wiki
>> along with some others.
>>
>> I welcome corrections and discussion (here or on the wiki) from any TDL
>> nerds or authorities (especially if you've written a TDL parser).
>>
>> On Mon, Jul 9, 2018 at 12:49 PM, Michael Wayne Goodman <goodmami at uw.edu>
>> wrote:
>>
>>> Hi developers,
>>>
>>> I'm taking a closer look at the syntax of TDL files and the situation is
>>> a bit of a mess. Can anyone help me clarify some things? (I'll restrict
>>> myself to 3 questions for now)
>>>
>>> The Copestake 2002 reference (Implementing TFS Grammars) has a BNF for
>>> TDL, but it's a bit out of date and, according to comments in the LKB
>>> source code, incorrect in parts. The LKB source comments are scattered,
>>> incomplete, inconsistent, and also a bit outdated. There is not much on the
>>> wiki. There is some discussion in the mailing list archives (much from
>>> before my time in DELPH-IN), but it's not clear how current those
>>> descriptions are.
>>>
>>> Q1: Are supertypes special in a definition?
>>>
>>> The BNF (in the LKB source) says this:
>>>
>>>     Type-def -> Type { Avm-def | Subtype-def} . |
>>>                          Type { Avm-def | Subtype-def}.
>>>     Avm-def -> := Conjunction | Comment Conjunction
>>>     Conjunction -> Term { & Term } *
>>>     Term -> Type | Feature-term | Diff-list | List | Coreference
>>>
>>> That makes it sound like I could do this:
>>>
>>>     mytype := [ FEAT val ] & supertype.
>>>
>>> or even:
>>>
>>>     mytype := <! diff list.. !> & #coref & supertype.
>>>
>>> But elsewhere it seems like a list of parents is special and appears
>>> before the rest of the conjunction. E.g., at read-tdl-avm-def of
>>> lingo/lkb/src/io-tdl/tdltypeinput.lsp I see this alternate definition
>>> of Avm-def:
>>>
>>>   ;;; Avm-def -> := Parents Conjunction | Parents Comment Conjunction |
>>>   ;;;               Parents | Parents Comment
>>>
>>> It seems that both ACE and PET are fine with putting supertypes after
>>> the feature list (and some other variations). I'm fine with this, but I
>>> wonder what it means for docstrings (see Q3 below), which (I think) are
>>> supposed to appear after the list of parents and before the feature list.
>>>
>>>
>>> Q2: Subtype-def is now just a variant of Avm-def, yes?
>>>
>>> The BNF still describes subtyping (with the :< operator) as only taking
>>> a single parent:
>>>
>>>     Subtype-def ->  :< type
>>>
>>> But I believe the consensus is that this is unnecessary (it's equivalent
>>> to using := with only a supertype), so :< is treated as equivalent to :=
>>> (to avoid breaking backward compatibility). Is this interpretation used by
>>> all processors?
>>>
>>>
>>> Q3: What's the final word with type comments / docstrings?
>>>
>>> I find evidence of 3 proposed variants: (1) a block of ";" comments
>>> before a typename (LTDB-style); (2) a block of ";" comments within a type
>>> description; and (3) a "doc string" within a type description. Furthermore,
>>> there is a question as to whether comments or strings within a type go
>>> after the ":=" or after the list of supertypes. I think #| ... |# comments
>>> were not considered for this purpose.
>>>
>>> My guess is this:
>>>
>>> * LTDB-style comments (before the type identifier) are processed
>>> separately from TDL-parsing
>>> * type-internal comments can go anywhere but are discarded
>>> * type-internal doc strings must appear after the list of supertypes and
>>> are later available for inspection (they are included as a non-functional
>>> part of a type)
>>>
>>> ACE seems happy with my assumptions, although PET doesn't seem to like
>>> doc strings at all.
>>>
>>>
>>> Thanks!
>>>
>>> --
>>> Michael Wayne Goodman
>>>
>>
>>
>>
>> --
>> Michael Wayne Goodman
>>
>
>

-- 
Michael Wayne Goodman