[developers] spanish grammar/TDL documentation strings

Emily M. Bender ebender at u.washington.edu
Wed Sep 19 17:17:13 CEST 2007


On Wed, Sep 19, 2007 at 03:04:15PM +0200, Stephan Oepen wrote:
> either way, i guess there will be no immediate resolution, hence it is
> good that emily has already taken such comments out of the Matrix.  if
> there was an original proposal to `developers', and assuming emily was
> part of it, in my experience she commands a perfect memory (no wonder,
> at her youthful age :-) and can no doubt retrieve the original email?
> and if not emily, i am sure ann was involved?  emily, why did you add
> this syntax to the Matrix in the first place?

If you put it that way, I have no choice but to go digging, do I?

Below is an email from Ann, dated 2004-06-21, exploring the options
for the documentation strings.  The original motivation, as I remember
it, was to provide facilities for grammar writers to associate
documentation with particular types/instances such that it could
be displayed in the GUI in various ways.  The examples in the Matrix
were the result of a fit of conscience on my part about lack of
documentation, presumably.  I still think this is in general a good
idea, but since a) there is not yet any facility in the GUI for
viewing these (right?) and b) hardly any grammars use them, I'm
not terribly bothered about taking out my few doc strings.

> more high-level: ideally we would establish a procedure for discussing 
> extensions or changes that affect multiple components; the `developers'
> list should play a central role in such routines, i would think.

This sounds reasonable to me.  `developers' cc-ed.

-- Emily

To: lkb-bugs at csli.stanford.edu, ebender at u.washington.edu
cc: Ann.Copestake at cl.cam.ac.uk
Subject: TDL syntax and documentation strings
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Mon, 21 Jun 2004 17:26:23 +0100
From: Ann Copestake <Ann.Copestake at cl.cam.ac.uk>
Message-Id: <E1BcRcm-0000Ro-00 at mta1.cl.cam.ac.uk>


since I started talking about this, I'll try and explain now what the issue is.

The subset of TDL syntax that the LKB now officially uses for type definitions 
is described in BNF as follows

;;; Type-def -> Type-id Avm-def . 
;;; Type-id  -> identifier
;;; Avm-def -> := Conjunction
;;; Conjunction -> Term { & Term } *
;;; Term -> Type | Feature-term | Diff-list | List | Coreference
etc

(Note that I've got rid of :< as agreed some time ago, though it is still 
supported for backward compatibility.)

What happens if we put in the comment string as proposed (henceforth, option 
i)) is that we get something like the following as the BNF:

;;; Type-def -> Type-id := Type-spec . 
;;; Type-id  -> identifier
;;; Type-spec -> Parents | Parents Doc | Parents Conjunction | Parents Doc Conjunction
;;; Parents -> Parent { & Parent } * &
;;; Doc -> " <anything> "
;;; Conjunction -> Term { & Term } *
;;; Term -> Type | Feature-term | Diff-list | List | Coreference
etc

I think this is probably a `good thing', because it makes it obvious that the 
type parents are not part of an AVM definition but have a special status.  
However, it does mean that this doc string syntax can't be extended as you 
might have thought to the definition of lexical entries, rules etc, where the 
conventionally initial type is actually optional and is just part of the AVM 
description.
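
To make option i) concrete, here is a minimal Python sketch of how a reader 
could split a type definition into parents, doc string and AVM body.  It is 
an illustration only, not the LKB reader: the function name `parse_type_def` 
is hypothetical, and it assumes that the Conjunction (when present) starts 
with `[' and that the doc string contains no embedded double quotes.

```python
def parse_type_def(text):
    """Split 'type-id := parents & "doc" [ ... ].' into its parts.

    Hypothetical helper for option i) only. Assumes the AVM body, if any,
    starts with '[' and that the doc string has no embedded '"'.
    """
    head, sep, rest = text.partition(":=")
    if not sep:
        raise ValueError("missing ':='")
    rest = rest.strip()
    if not rest.endswith("."):
        raise ValueError("missing final '.'")
    rest = rest[:-1].rstrip()

    # Parents run up to the first '"' (Doc) or '[' (start of the body).
    cut = len(rest)
    for ch in '"[':
        pos = rest.find(ch)
        if pos != -1:
            cut = min(cut, pos)
    parents = [p.strip() for p in rest[:cut].split("&") if p.strip()]
    rest = rest[cut:]

    # Optional Doc: a quoted string directly after the parents.
    doc = None
    if rest.startswith('"'):
        end = rest.index('"', 1)
        doc = rest[1:end]
        rest = rest[end + 1:].lstrip()

    return {
        "type_id": head.strip(),
        "parents": parents,
        "doc": doc,
        "body": rest or None,
    }


# Made-up type definition in the option i) shape.
example = parse_type_def(
    'head-initial := binary-headed-phrase & head-first &\n'
    '"Head daughter comes first."\n'
    '  [ ARGS < #1, [ ] > ].'
)
```

Here `example["parents"]` holds the two parent types and `example["doc"]` 
the quoted string.  The sketch only works because every type definition has 
at least one parent before the doc string, which is exactly what lexical 
entries and rules do not guarantee.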

This isn't a trivial point - the type/FS distinction is a problem for students 
and I think at least 60% of the problem is caused by the TDL syntax.  So, 
while I would be happy if we altered the syntax for types as above, because it 
makes the status of parents clearer, I would be VERY unhappy if someone 
allowed apparently the same syntax in lexical entries etc, where there is no 
notion of a parent.

The alternative options that oe mentions in last year's message are:
ii) have a comment at a designated place (introduced by ;;; or in #| |#s 
presumably) or iii) use a syntax that makes the comment strings look like 
strings that are actually part of types

ii) is an option - we could say that any comment immediately before a type was 
the doc string or allow internal comments and say the first was the doc 
string. If someone wants to comment out part of the TDL and doesn't want that 
to be the doc string, then they'll have to put in a `real' doc string, but 
it's not going to be a disaster to have a few unwanted doc strings.

iii) is even worse than existing TDL in making the syntax not fit with the TDL 
semantics, so is out.

iv) put the doc string enclosed in ""s immediately after the `:='.  We 
rejected this on aesthetic and practical grounds, I believe, since it would 
make more difference in the look of the files, although in terms of the syntax 
it makes a lot of sense.

v) put the doc string in ""s at the end, but this won't be readable with long 
definitions.

vi) invent yet another reserved character to delimit doc strings and put them 
anywhere!
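
Option ii), in its `first comment inside the definition' form, can be 
sketched in Python as follows.  This is an illustration under stated 
assumptions, not the LKB reader: the function name `first_comment` is 
hypothetical, any `;' outside a quoted string is taken to start a line 
comment, and #| |# block comments are assumed not to nest.

```python
def first_comment(definition):
    """Return the text of the first comment in a TDL definition, or None.

    Hypothetical helper for option ii). Quoted strings are skipped so that
    a ';' or '#|' inside a string is not mistaken for a comment.
    """
    i, n = 0, len(definition)
    while i < n:
        ch = definition[i]
        if ch == '"':
            # Skip over a quoted string (no escape handling in this sketch).
            end = definition.find('"', i + 1)
            if end == -1:
                return None
            i = end + 1
        elif ch == ";":
            # Line comment: runs to the end of the line.
            end = definition.find("\n", i)
            line = definition[i:] if end == -1 else definition[i:end]
            return line.lstrip(";").strip()
        elif definition.startswith("#|", i):
            # Block comment: #| ... |#
            end = definition.find("|#", i + 2)
            if end == -1:
                return None
            return definition[i + 2:end].strip()
        else:
            i += 1
    return None


# Made-up definition with an internal comment serving as the doc string.
doc = first_comment(
    "noun-lex := lex-item &\n"
    "  ;;; Common nouns.\n"
    "  [ HEAD noun ].\n"
)
```

Because the rule takes whichever comment comes first, a commented-out piece 
of TDL at the top of a definition would be returned as the doc string, 
which is the unwanted-doc-string case described under ii).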

My current thoughts are as follows: if we only have doc strings for types, I'm 
happy with i) and will implement it officially.  However, in this case, we 
cannot use that syntax for non-type definitions.  If we want to have doc 
strings generally, I vote for ii) (the `first comment inside TDL definition' 
syntax) as the least bad option, and will rewrite the LKB reader to allow 
comments anywhere.  I would also accept iv).  I may change my mind again and 
I'm happy to listen to alternative options.

Ann
