[developers] spanish grammar/TDL documentation strings
Emily M. Bender
ebender at u.washington.edu
Wed Sep 19 17:17:13 CEST 2007
On Wed, Sep 19, 2007 at 03:04:15PM +0200, Stephan Oepen wrote:
> either way, i guess there will be no immediate resolution, hence it is
> good that emily has already taken such comments out of the Matrix. if
> there was an original proposal to `developers', and assuming emily was
> part of it, in my experience she commands a perfect memory (no wonder,
> at her youthful age :-) and can no doubt retrieve the original email?
> and if not emily, i am sure ann was involved? emily, why did you add
> this syntax to the Matrix in the first place?
If you put it that way, I have no choice but to go digging, do I?
Below is an email from Ann, dated 2004-06-21, exploring the options
for the documentation strings. The original motivation, as I remember
it, was to provide facilities for grammar writers to associate
documentation with particular types/instances such that it could
be displayed in the GUI in various ways. The examples in the Matrix
were the result of a fit of conscience on my part about the lack of
documentation, presumably. I still think this is in general a good
idea, but since a) there is not yet any facility in the GUI for
viewing these (right?) and b) hardly any grammars use them, I'm
not terribly bothered about taking out my few doc strings.
> more high-level: ideally we would establish a procedure for discussing
> extensions or changes that affect multiple components; the `developers'
> list should play a central role in such routines, i would think.
This sounds reasonable to me. `developers' cc-ed.
-- Emily
To: lkb-bugs at csli.stanford.edu, ebender at u.washington.edu
cc: Ann.Copestake at cl.cam.ac.uk
Subject: TDL syntax and documentation strings
Date: Mon, 21 Jun 2004 17:26:23 +0100
From: Ann Copestake <Ann.Copestake at cl.cam.ac.uk>
Message-Id: <E1BcRcm-0000Ro-00 at mta1.cl.cam.ac.uk>
Since I started talking about this, I'll try to explain now what the issue is.
The subset of TDL syntax that the LKB now officially uses for type definitions
is described in BNF as follows:
;;; Type-def -> Type-id Avm-def .
;;; Type-id -> identifier
;;; Avm-def -> := Conjunction
;;; Conjunction -> Term { & Term } *
;;; Term -> Type | Feature-term | Diff-list | List | Coreference
etc
(Note that I've got rid of :< as agreed some time ago, though it is still
supported for backward compatibility.)
What happens if we put in the comment string as proposed (henceforth, option
i)) is that we get something like the following as the BNF:
;;; Type-def -> Type-id := Type-spec .
;;; Type-id -> identifier
;;; Type-spec -> Parents | Parents Doc | Parents Conjunction | Parents Doc Conjunction
;;; Parents -> Parent { & Parent } * &
;;; Doc -> " <anything> "
;;; Conjunction -> Term { & Term } *
;;; Term -> Type | Feature-term | Diff-list | List | Coreference
etc
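For illustration only, a minimal Python sketch of how a reader might split an option i) definition into parents, doc string and conjunction. It assumes a much simplified input: parents are bare identifiers, `&` is treated purely as a separator (so a final `&` after the parents is only needed when a doc string or conjunction follows, which is one reading of the Parents rule above), the doc string is a single "..." literal with no embedded quotes, and whatever follows is kept as an unparsed Conjunction. The names `parse_type_def`, `TypeDef` and the example type names are hypothetical; this is not how the LKB reader works.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical sketch of an option i) reader; not the LKB implementation.
IDENT = r"[A-Za-z0-9_+*-]+"  # simplified identifier syntax (assumption)


class TypeDef(NamedTuple):
    type_id: str
    parents: List[str]
    doc: Optional[str]
    conjunction: Optional[str]


def parse_type_def(text: str) -> TypeDef:
    """Split 'Type-id := Parents [Doc] [Conjunction] .' into its parts."""
    text = text.strip()
    if not text.endswith("."):
        raise ValueError("a definition must end with '.'")
    type_id, sep, spec = text[:-1].partition(":=")
    if not sep:
        raise ValueError("missing ':='")

    # Parents: leading bare identifiers, each followed by '&' or the end.
    parents: List[str] = []
    rest = spec.strip()
    while (m := re.match(rf"({IDENT})\s*(?:&\s*|$)", rest)):
        parents.append(m.group(1))
        rest = rest[m.end():]
    if not parents:
        raise ValueError("option i) requires at least one parent")

    # Doc: an optional "..." string directly after the parents.
    doc = None
    m = re.match(r'"([^"]*)"\s*', rest)
    if m:
        doc = m.group(1).strip()
        rest = rest[m.end():]

    # Conjunction: whatever is left, kept unparsed.
    return TypeDef(type_id.strip(), parents, doc, rest.strip() or None)
```

For `noun-lex := basic-noun-lex & "Common nouns." [ SYNSEM.LOCAL.CAT.HEAD noun ] .` this yields the parents `['basic-noun-lex']`, the doc string `Common nouns.` and the conjunction `[ SYNSEM.LOCAL.CAT.HEAD noun ]`. Applied to a lexical entry, the same code would happily report the entry's initial type as a "parent", which is the kind of confusion discussed below.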
I think this is probably a `good thing', because it makes it obvious that the
type parents are not part of an AVM definition but have a special status.
However, it does mean that this doc string syntax can't be extended, as you
might have thought, to the definitions of lexical entries, rules etc., where the
conventionally initial type is actually optional and is just part of the AVM
description.
This isn't a trivial point: the type/FS distinction is a problem for students,
and I think at least 60% of the problem is caused by the TDL syntax. So,
while I would be happy if we altered the syntax for types as above, because it
makes the status of parents clearer, I would be VERY unhappy if someone
allowed what is apparently the same syntax in lexical entries etc., where there
is no notion of a parent.
The alternative options that oe mentions in last year's message are:
ii) have a comment at a designated place (presumably introduced by ;;; or
enclosed in #| |#), or
iii) use a syntax that makes the comment strings look like strings that were
actually part of types.
ii) is an option: we could say that any comment immediately before a type was
the doc string, or allow internal comments and say the first was the doc
string. If someone wants to comment out part of the TDL and doesn't want that
to be the doc string, then they'll have to put in a `real' doc string, but
it's not going to be a disaster to have a few unwanted doc strings.
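The "first comment inside the definition" variant of ii) is easy to state mechanically. A minimal Python sketch, assuming ; starts a line comment and #| |# delimits a block comment, and deliberately ignoring the case of ; inside a string literal; the function name `first_comment_doc` is hypothetical.

```python
import re
from typing import Optional

# Hypothetical option ii) rule: the first comment after ':=' is the doc string.
# Assumes ';' starts a line comment and '#| ... |#' delimits a block comment,
# and ignores the possibility of ';' inside string literals.
COMMENT = re.compile(r"#\|(.*?)\|#|;+([^\n]*)", re.DOTALL)


def first_comment_doc(definition: str) -> Optional[str]:
    """Return the first comment inside a TDL definition, or None."""
    _, sep, body = definition.partition(":=")
    if not sep:
        return None
    m = COMMENT.search(body)
    if m is None:
        return None
    # Exactly one of the two groups participates in the match.
    text = m.group(1) if m.group(1) is not None else m.group(2)
    return text.strip()
```

As noted above, a commented-out fragment such as `; [ OLD x ]` that precedes any deliberate comment is simply returned as the doc string; the rule has no way to tell the two apart.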
iii) is even worse than existing TDL in making the syntax not fit with the TDL
semantics, so is out.
iv) put the doc string, enclosed in ""s, immediately after the `:='. We
rejected this on aesthetic and practical grounds, I believe, since it would
change the look of the files more, although in terms of the syntax it makes a
lot of sense.
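Reading option iv) mechanically is also simple, because the string sits between `:=` and the rest of the definition rather than among the parents. A minimal Python sketch, assuming a doc string with no embedded quotes and keeping everything after it as unparsed text; `split_doc_iv` and the example definitions are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical option iv) reader: Id := ["doc"] Conjunction .
DEF_IV = re.compile(r'\s*([^\s:]+)\s*:=\s*(?:"([^"]*)"\s*)?(.*?)\s*\.\s*$', re.DOTALL)


def split_doc_iv(definition: str) -> Tuple[str, Optional[str], str]:
    """Return (identifier, doc string or None, rest of the definition)."""
    m = DEF_IV.match(definition)
    if m is None:
        raise ValueError('expected Id := ["doc"] Conjunction .')
    ident, doc, conj = m.groups()
    return ident, doc, conj
```

The same pattern accepts a lexical entry such as `dog := noun-lex & [ STEM < "dog" > ] .`, returning `None` for the doc string, since nothing in it depends on knowing which initial types are parents.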
v) put the doc string in ""s at the end, but this won't be readable with long
definitions.
vi) invent yet another reserved character to delimit doc strings and put them
anywhere!
My current thoughts are as follows: if we only have doc strings for types, I'm
happy with i) and will implement it officially. However, in this case, we
cannot use that syntax for non-type definitions. If we want to have doc
strings generally, I vote for ii) (the `first comment inside TDL definition'
syntax) as the least bad option, and will rewrite the LKB reader to allow
comments anywhere. I would also accept iv). I may change my mind again and
I'm happy to listen to alternative options.
Ann