[developers] More TDL cobwebs
goodman.m.w at gmail.com
Fri Sep 7 18:45:19 CEST 2018
(replying to Woodley and John separately)
On Thu, Sep 6, 2018 at 11:01 PM Woodley Packard <sweaglesw at sweaglesw.org> wrote:
> I don’t have much in the way of opinions over most of the cobwebs you have
> been dusting off -- and by the way, thank you Mike for your work!
> I don’t think this one (point 10 about strings) is a move in the right
> direction though. I disagree with the notion that quoted strings are
> instances instead of types, in the technical sense: they are types,
> specifically subtypes of 'string' (or *string* in some grammars). A given
> string type can be instantiated as part of an analysis licensed by the
> grammar; indeed, a given sign can feature the same quoted string under
> multiple paths. The instance of that string at those multiple paths need
> not be the same instance (then they would be coreferent). Formally, I
> feel that strings indeed represent elements of a hierarchy (leaf nodes);
> whether an implementation treats them as such need not be specified. In
> recognition of the limitations current engines place on string types, and
> also their intended usage, I would be in favor of a formal stipulation that
> *string* cannot bear features, and cannot unify with anything other than
> regexes and quoted strings, so declarations like the following would be
> illegal (but not at the level of TDL syntax):
> x := synsem & "giraffe".
> "kangaroo" := *string* & [ REGION australia ].
Thanks for your feedback. If it wasn't clear, my point (10) was under the
heading of "undecided things" and I welcome the discussion. I think it
makes sense to have unification with strings be illegal but a conjunction
with strings not be a syntax error. My reasoning for declaring it a syntax
error was to (a) spur discussion (success!) and (b) allow PyDelphin (which
doesn't do unification or subsumptive comparisons) to be correct in
allowing statements like:
if lexitem["SYNSEM.LKEYS.KEYREL.PRED"] == "_kangaroo_n_1_rel": ...
instead of the less intuitive:
if "_kangaroo_n_1_rel" in
(b) is a selfish reason, yes, and PyDelphin currently uses the former
(based on our previous conclusion about strings being primitives), but my
recent work with TDL is leaving me convinced that it's not correct. I could
also implement some kind of helper function `stringvalue()` or something.
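A minimal sketch of what such a helper might look like, assuming a conjunction is held as a plain list of terms. The `String`, `TypeName`, and `Coref` classes and the list representation are hypothetical, made up for illustration, and are not PyDelphin's actual API:

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class String:
    """A quoted TDL string term (hypothetical class, not PyDelphin's API)."""
    value: str


@dataclass(frozen=True)
class TypeName:
    """A type identifier term (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Coref:
    """A coreference term such as #pred (hypothetical)."""
    name: str


Term = Union[String, TypeName, Coref]


def stringvalue(conjunction: List[Term]) -> Optional[str]:
    """Return the one quoted string in a conjunction, or None if there is none.

    Raises ValueError if the conjunction holds two different strings, since
    this sketch does no unification and cannot reconcile them.
    """
    values = {term.value for term in conjunction if isinstance(term, String)}
    if len(values) > 1:
        raise ValueError(f"conflicting strings: {sorted(values)}")
    return next(iter(values), None)


# Models a value like:  PRED #pred & "_kangaroo_n_1_rel"
pred = [Coref("pred"), String("_kangaroo_n_1_rel")]
print(stringvalue(pred) == "_kangaroo_n_1_rel")
```

With something like this, the comparison stays a plain equality test on the string while the underlying value remains a conjunction.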
> The status of regular expressions is less clear, but to me it makes sense
> to conceive of them also as subtypes of *string* which subsume some but not
> all quoted strings. From a theoretical perspective, it makes perfect sense
> to unify two regular expressions (or unify a regular expression with a
> string). Now, ACE certainly does not support that, and I doubt other
> engines do either --- but that doesn’t seem like a reason to forbid it at
> the level of TDL syntax.
> There will always be things you can write as well-formed TDL syntax that
> still contain semantic errors. For instance, consider the following:
> w := [ A bool ].
> x := [ A + ].
> y := [ A - ].
> z := x & y.
> In my opinion, any reasonable definition of TDL syntax should consider the
> above acceptable (in the extreme case, suppose the declarations are in
> separate files). However, 'z' will of course be found to be nonsensical
> when loaded by an engine that actually tries to construct the type
> hierarchy. The questions about regular expressions and strings above have
> a somewhat more local nature to them, but to me they feel more semantic in
> nature than syntactic.
Ok, fair enough. I was thinking if we formally disallow features on strings
we can block them at the level of syntax and make life easier in some ways,
but maybe that is jumping the gun a bit.
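To make the w/x/y/z example concrete, here is a toy sketch of the check an engine would make when it builds the hierarchy. It assumes a hand-written table with only bool, + and -, and it only handles the case where one type subsumes the other; a real engine computes greatest lower bounds over the full hierarchy. All names here are made up for illustration:

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy: each type maps to its immediate supertypes.
SUPERTYPES: Dict[str, Set[str]] = {
    "bool": set(),
    "+": {"bool"},
    "-": {"bool"},
}


def ancestors(t: str) -> Set[str]:
    """Collect every proper supertype of t by walking upward."""
    seen: Set[str] = set()
    stack = list(SUPERTYPES[t])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUPERTYPES[parent])
    return seen


def unify_types(a: str, b: str) -> Optional[str]:
    """Return the more specific of a and b, or None if neither subsumes the other."""
    if a == b or a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


# x := [ A + ].   y := [ A - ].   z := x & y.
# Each line parses on its own; the clash only shows up when A's values meet.
z_a_value = unify_types("+", "-")
print(z_a_value is None)
```

The point is that nothing in the text of `z := x & y.` is malformed; the failure comes from the hierarchy, which the parser never sees.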
> Anyway, that’s my viewpoint :-)
> P.S. By the way, the example you turned up from mtr.tdl is actually to be
> interpreted as a pattern match, in the spirit of regular expressions.
> Arguably the regular expression syntax used in token mapping should be used
> for transfer / trigger rules as well.
Yet another un(der)documented string-matching implementation? In this case,
though, the pattern is enclosed in a string so from the perspective of
syntax it doesn't change anything. Can we currently just replace these with
regexes? [ PRED ^.*_n_.*$ ]
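As a rough illustration of what that regex would select, here is a sketch using Python's `re` module. This assumes the TDL regex dialect agrees with Python's on this simple pattern, which may not hold for every engine, and the predicate names other than `_kangaroo_n_1_rel` are made-up examples:

```python
import re

# The pattern from [ PRED ^.*_n_.*$ ], written as a Python regex.
pattern = re.compile(r"^.*_n_.*$")

# Example predicate strings (hypothetical apart from the kangaroo one).
preds = ["_kangaroo_n_1_rel", "_bark_v_1_rel", "_giraffe_n_1_rel"]

# Keep only the predicates the pattern accepts.
noun_preds = [p for p in preds if pattern.match(p)]
print(noun_preds)
```

This says nothing about how the quoted "~._v_" pattern in mtr.tdl is interpreted; it only shows the matching behavior of the proposed regex form.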
On Sep 6, 2018, at 9:54 PM, goodman.m.w at gmail.com wrote:
Sorry, I forgot one more point:
10. Disallow unification of strings or regexes with anything
This follows a conversation on Emily's student list about strings being
primitive types. Currently they are just one term that's possible in a
conjunction, so the syntax allows this:
a := b & [ ATTR "string" & type & < list, ... > & "another string" ].
This also applies to ^regex$ patterns. In the other thread we concluded
that strings are of type 'string', where this type may be defined separate
from the grammar, or may exist in a type hierarchy, but all quoted
"strings" in TDL are like instances of that type and don't create new
hierarchy entries (I'm not sure what type the regexes are, though).
Furthermore, these strings should probably never appear in a conjunction
with other types, and not with features. The only other term that makes
sense in a conjunction with a string is a coreference, and indeed we see
this in the ERG's mtr.tdl:
... PRED #pred & "~._v_", ...
Anyway, the question is whether we enforce this in the TDL syntax,
somewhere else, or not at all. Similarly, do we enforce in the syntax that
regexes are not valid in type files (which is the case according to a
comment in lkb/src/io-tdl/tdltypeinput.lsp)?
On Thu, Sep 6, 2018 at 4:29 PM goodman.m.w at gmail.com <goodman.m.w at gmail.com> wrote:
> Hi all,
> There are some remaining issues with TDL that I'd like to clean up. First
> I will summarize some decisions made (or at least not rejected) in previous
> email threads:
> 1. Supertypes appear before other terms in a conjunction only by
> convention (not enforced in the syntax)
> 2. Docstrings are triple-quoted and may appear before any top-level term
> or before the final . terminator
> 3. Comments may appear in definitions anywhere that spaces can, except
> within strings/regexes/affixing-patterns
> The following changes are things I think people agree with, so I'd like to
> consider them as decided:
> 4. Removal of the :< operator (if accepted as a variant of :=, throw a
> 5. Removal of 'single-quoted-symbols
> 6. Removal of double-quoted "docstrings"
> 7. Removal of non-regex uses of ^ (otherwise any BNF of TDL is necessarily
> incomplete because the "extended-syntax" use of ^ is open-ended)
> And there's at least one point I don't think we reached a decision on:
> 8. Instances must have exactly 1 "supertype" (which is really just a type
> and not a supertype, i.e., it doesn't change the type hierarchy)
> 9. Does anyone know how wild-cards differ from letter-sets? I see HaG has
> a wild-card and suffix pattern like these:
> %(wild-card (?g ui))
> %suffix (!c!v !c!vn) (!v?g !vn)
> My guess is that wild-cards match but are not used in the replacement,
> which I can imagine is useful if you want the replacement to use the second
> of two matches but not the first. It makes me wonder why we don't just use
> regex substitutions for these things.
> If nobody responds about (1)--(7), I'll make sure the syntax description
> on the TdlRfc wiki reflects those decisions.
> -Michael Wayne Goodman
-Michael Wayne Goodman