[developers] More TDL cobwebs

Woodley Packard sweaglesw at sweaglesw.org
Fri Sep 7 08:01:26 CEST 2018


I don’t have much in the way of opinions over most of the cobwebs you have been dusting off -- and by the way, thank you Mike for your work!

I don’t think this one (point 10 about strings) is a move in the right direction though.  I disagree with the notion that quoted strings are instances instead of types, in the technical sense: they are types, specifically subtypes of 'string' (or *string* in some grammars).  A given string type can be instantiated as part of an analysis licensed by the grammar; indeed, a given sign can feature the same quoted string under multiple paths.  The instances of that string at those paths need not be the same instance (if they were, they would be coreferent).  Formally, I feel that strings indeed represent elements of a hierarchy (leaf nodes); whether an implementation treats them as such need not be specified.  In recognition of the limitations current engines place on string types, and also their intended usage, I would be in favor of a formal stipulation that *string* cannot bear features, and cannot unify with anything other than regexes and quoted strings, so declarations like the following would be illegal (but not at the level of TDL syntax):

x := synsem & "giraffe".
"kangaroo" := *string* & [ REGION australia ].

The status of regular expressions is less clear, but to me it makes sense to conceive of them also as subtypes of *string* which subsume some but not all quoted strings.  From a theoretical perspective, it makes perfect sense to unify two regular expressions (or unify a regular expression with a string).  Now, ACE certainly does not support that, and I doubt other engines do either --- but that doesn’t seem like a reason to forbid it at the level of TDL syntax.
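
To make the "subtypes of *string*" view concrete, here is a minimal Python sketch.  It is not how ACE or any other engine works; the names Str, Regex, and unify are made up for illustration, and treating a regex as subsuming exactly the strings it fully matches is an assumption of the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Str:
    """A quoted string, viewed as a leaf subtype of *string* (hypothetical)."""
    value: str


@dataclass(frozen=True)
class Regex:
    """A regular expression, viewed as a subtype of *string* that subsumes
    the quoted strings it fully matches (hypothetical)."""
    pattern: str


StringType = Union[Str, Regex]


def unify(a: StringType, b: StringType) -> Optional[StringType]:
    """Unify two subtypes of *string*; return None if they are incompatible."""
    if isinstance(a, Str) and isinstance(b, Str):
        # Two leaf types unify only if they are the same leaf.
        return a if a.value == b.value else None
    if isinstance(a, Regex) and isinstance(b, Str):
        # The regex subsumes the string iff it matches the whole string.
        return b if re.fullmatch(a.pattern, b.value) else None
    if isinstance(a, Str) and isinstance(b, Regex):
        return unify(b, a)
    if a.pattern == b.pattern:
        return a
    # Regex & regex is meaningful in theory (intersection of the two
    # languages), but this sketch does not attempt to compute it.
    raise NotImplementedError("regex/regex unification is not sketched here")
```

Under these assumptions, unifying Regex("[a-z]+") with Str("giraffe") gives back Str("giraffe"), while unifying Str("giraffe") with Str("kangaroo") fails.  Nothing here needs to be visible in the TDL syntax; the checks happen when the values are actually unified.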

There will always be things you can write as well-formed TDL syntax that still contain semantic errors.  For instance, consider the following:

w := [ A bool ].
x := [ A + ].
y := [ A - ].
z := x & y.

In my opinion, any reasonable definition of TDL syntax should consider the above acceptable (in the extreme case, suppose the declarations are in separate files).  However, 'z' will of course be found to be nonsensical when loaded by an engine that actually tries to construct the type hierarchy.  The questions about regular expressions and strings above have a somewhat more local nature to them, but to me they feel more semantic in nature than syntactic.
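
As a rough illustration of where that error surfaces, here is a small Python sketch.  It assumes a toy hierarchy in which + and - are incompatible subtypes of bool, and it models each type's constraints as a flat dict; PARENTS, glb, and unify_avm are hypothetical names, not any engine's API.

```python
# Toy hierarchy (an assumption of this sketch): + and - are leaf
# subtypes of bool and have no common subtype.
PARENTS = {"bool": None, "+": "bool", "-": "bool"}


def ancestors(t):
    """Return t followed by all of its supertypes, nearest first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENTS[t]
    return chain


def glb(a, b):
    """Greatest lower bound of two types in a tree-shaped hierarchy, or None."""
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None


def unify_avm(x, y):
    """Unify two flat feature structures (feature name -> type name)."""
    result = dict(x)
    for feat, val in y.items():
        if feat in result:
            merged = glb(result[feat], val)
            if merged is None:
                return None  # incompatible values: unification fails
            result[feat] = merged
        else:
            result[feat] = val
    return result


w = {"A": "bool"}
x = {"A": "+"}
y = {"A": "-"}

# The constraints z would inherit from x and y do not unify.
z = unify_avm(x, y)
```

Each declaration parses fine on its own; the failure (z is None here) only shows up once the constraints inherited from x and y are unified, which is the sense in which the error is semantic rather than syntactic.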

Anyway, that’s my viewpoint :-)
Woodley

P.S. By the way, the example you turned up from mtr.tdl is actually to be interpreted as a pattern match, in the spirit of regular expressions.  Arguably the regular expression syntax used in token mapping should be used for transfer / trigger rules as well.


> On Sep 6, 2018, at 9:54 PM, goodman.m.w at gmail.com wrote:
> 
> Sorry, I forgot one more point:
> 
> 10. Disallow unification of strings or regexes with anything
> 
> This follows a conversation on Emily's student list about strings being primitive types. Currently they are just one term that's possible in a conjunction, so the syntax allows this:
> 
>     a := b & [ ATTR "string" & type & < list, ... > & "another string" ].
> 
> This also applies to ^regex$ patterns. In the other thread we concluded that strings are of type 'string', where this type may be defined separate from the grammar, or may exist in a type hierarchy, but all quoted "strings" in TDL are like instances of that type and don't create new hierarchy entries (I'm not sure what type the regexes are, though). Furthermore, these strings should probably never appear in a conjunction with other types, and not with features. The only other term that makes sense in a conjunction with a string is a coreference, and indeed we see this in the ERG's mtr.tdl:
> 
>     ... PRED #pred & "~._v_", ...
> 
> Anyway, the question is whether we enforce this in the TDL syntax, somewhere else, or not at all. Similarly, do we enforce in the syntax that regexes are not valid in type files (which is the case according to a comment in lkb/src/io-tdl/tdltypeinput.lsp)?
> 
> On Thu, Sep 6, 2018 at 4:29 PM goodman.m.w at gmail.com wrote:
> Hi all,
> 
> There are some remaining issues with TDL that I'd like to clean up. First I will summarize some decisions made (or at least not rejected) in previous email threads:
> 
> 1. Supertypes appear before other terms in a conjunction only by convention (not enforced in the syntax)
> 2. Docstrings are triple-quoted and may appear before any top-level term or before the final . terminator
> 3. Comments may appear in definitions anywhere that spaces can, except within strings/regexes/affixing-patterns
> 
> The following changes are things I think people agree with, so I'd like to consider them as decided:
> 
> 4. Removal of the :< operator (if accepted as a variant of :=, throw a warning)
> 5. Removal of 'single-quoted-symbols
> 6. Removal of double-quoted "docstrings"
> 7. Removal of non-regex uses of ^ (otherwise any BNF of TDL is necessarily incomplete because the "extended-syntax" use of ^ is open-ended)
> 
> And there's at least one point I don't think we reached a decision on:
> 
> 8. Instances must have exactly 1 "supertype" (which is really just a type and not a supertype, i.e., it doesn't change the type hierarchy)
> 
> Also:
> 
> 9. Does anyone know how wild-cards differ from letter-sets? I see HaG has a wild-card and suffix pattern like these:
> 
>     %(wild-card (?g ui))
>     ...
>     %suffix (!c!v !c!vn) (!v?g !vn)
> 
> My guess is that wild-cards match but are not used in the replacement, which I can imagine is useful if you want the replacement to use the second of two matches but not the first. It makes me wonder why we don't just use regex substitutions for these things.
> 
> If nobody responds about (1)--(7), I'll make sure the syntax description on the TdlRfc wiki reflects those decisions.
> 
> -- 
> -Michael Wayne Goodman
