[developers] More TDL cobwebs
sweaglesw at sweaglesw.org
Fri Sep 7 19:17:42 CEST 2018
At the moment, no, it would not work to just replace the pattern syntax used in MTR files with the ^...$ format. That is because (in ACE at least) support for each of the three types of pattern matching is specific to the module of the infrastructure in which it applies. That is to say, the regex handling in token mapping is not accessible to syntax rules or transfer rules. Ditto the pattern handling used for morphology. I’m certainly not a fan of this state of affairs, but changing it would take some doing.

I would be thrilled to see a unified approach, possibly based on the notion of regexes as types fitting between *string* and “kangaroo” as proposed earlier, but it wouldn’t be simple, and I’m not 100% sure it would be an improvement, at least in the morphology department. One vexing wrinkle in particular would be the status of capture groups (necessary both for morphology and token mapping). These would conceptually be similar to reentrancies, but they are subatomic (for lack of a better term): they refer to a piece of a string, whereas strings are otherwise atomic values. I’m also not sure what the formal status of a string that contains a capture group reference would be, or how unifying things with it would be implemented.

I imagine the broader unification-based parsing community probably has experience with something like this that could be interesting to look at. I worry somewhat that efficiency could be a concern, although it seems like the typical use cases would be unlikely to become too entangled.
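
To make the "regexes as types between *string* and a literal" idea concrete, here is a minimal Python sketch. It is not how ACE or any other DELPH-IN processor works: the names `STRING`, `Regex`, `unify` and `capture` are hypothetical, and Python's `re` dialect merely stands in for whatever regex syntax a real implementation would use. The sketch also shows why capture groups are awkward, since what they return is a piece of an otherwise atomic string.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the most general string type (*string*).
STRING = object()


@dataclass(frozen=True)
class Regex:
    """Hypothetical regex-valued type sitting between STRING and a literal."""
    pattern: str  # anchored pattern, e.g. r"^.*_n_.*$"


def unify(a, b):
    """Return the more specific of two string-ish values, or None on failure."""
    if a is STRING:
        return b
    if b is STRING:
        return a
    if isinstance(a, Regex) and isinstance(b, Regex):
        # Regex-with-regex unification needs language intersection;
        # deliberately left out of this sketch.
        raise NotImplementedError("regex/regex unification is not sketched")
    if isinstance(a, Regex):
        a, b = b, a  # normalise so that `a` is the literal
    if isinstance(b, Regex):
        return a if re.search(b.pattern, a) else None
    return a if a == b else None  # two literals unify only if identical


def capture(regex, literal):
    """Return the capture groups of a match, or None if there is no match.

    The groups are substrings of `literal`, i.e. the "subatomic" pieces
    that a capture group reference would have to point at.
    """
    m = re.search(regex.pattern, literal)
    return m.groups() if m else None
```

With these definitions, `unify("_dog_n_1", Regex(r"^.*_n_.*$"))` gives back `"_dog_n_1"`, the same call with `"_bark_v_1"` gives `None`, and `capture(Regex(r"^_(.*)_n_(.*)$"), "_dog_n_1")` gives `("dog", "1")`. The unsupported regex/regex case and the question of what a value holding a reference to one of those groups would mean are exactly the open points raised above.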
On Sep 7, 2018, at 9:45 AM, "goodman.m.w at gmail.com" <goodman.m.w at gmail.com> wrote:
>> P.S. By the way, the example you turned up from mtr.tdl is actually to be interpreted as a pattern match, in the spirit of regular expressions. Arguably the regular expression syntax used in token mapping should be used for transfer / trigger rules as well.
> Yet another un(der)documented string-matching implementation? In this case, though, the pattern is enclosed in a string so from the perspective of syntax it doesn't change anything. Can we currently just replace these with regexes? [ PRED ^.*_n_.*$ ]