[developers] Distinguishing between orthographemic and non-orthographemic rules

Ann Copestake aac10 at cl.cam.ac.uk
Fri Jul 1 10:42:03 CEST 2016

Hi all,

I would say that describing it as two classes is correct, but my 
characterisation of the first is that there's a non-TFS component of the 
grammar which is responsible for the change, not that the processor is 
responsible.  The %suffix etc mechanism is formally defined, but it's a 
different formalism (for an outline of why, see footnote).

The idea is that the non-TFS component is signalled (e.g., by %) and 
that's what the processor needs to know how to handle, as a special 
case.  Everything else is just TFS.  Lexical rules without those 
external operations are almost equivalent to ordinary unary rules (the 
only reason they are not equivalent in the LKB formalism is that they 
can be interleaved with rules where there's an invocation of the 
external component, while syntactic unary rules can't be).
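
To make the distinction concrete, here is a minimal sketch in Python of how a processor might tell the two cases apart. Everything in it is an assumption for illustration: `LexRule`, `suffix_subrules` and `mother_form` are hypothetical names, and the (old ending, new ending) pairs are a much-simplified stand-in for %suffix subrules, not the actual LKB or ACE semantics.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LexRule:
    """Hypothetical lexical rule record (not an LKB/ACE data structure)."""
    name: str
    # (old_ending, new_ending) pairs: a simplified stand-in for %suffix subrules.
    suffix_subrules: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def is_orthographemic(self) -> bool:
        # The non-TFS component is "signalled" by the presence of subrules.
        return bool(self.suffix_subrules)


def apply_suffix(rule: LexRule, stem: str) -> Optional[str]:
    """Apply the first matching subrule, trying longer endings first."""
    ordered = sorted(rule.suffix_subrules, key=lambda p: len(p[0]), reverse=True)
    for old, new in ordered:
        if stem.endswith(old):
            return stem[:len(stem) - len(old)] + new
    return None  # no subrule matched


def mother_form(rule: LexRule, daughter_form: str) -> Optional[str]:
    if rule.is_orthographemic:
        # Special case: hand the string to the external component.
        return apply_suffix(rule, daughter_form)
    # Otherwise the rule is just TFS. Copying the form here stands in for
    # whatever the grammar itself says about the mother's STEM.
    return daughter_form


plural = LexRule("plural", [("", "s"), ("y", "ies")])
zero = LexRule("zero-derivation")
print(mother_form(plural, "pony"), mother_form(zero, "pony"))
```

The only point of the sketch is that the dispatch depends on the explicit signal, so a rule with no subrules never reaches the string-manipulating code.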

So my analysis of this situation is that the interface between the two 
formalisms has not been completely well defined. That is, the 
morphophonology component expects a string, and so is entitled to crash 
when it doesn't get it, but that's indeed not very helpful.  Perhaps the 
value for STEM (and anywhere else that component is used) should be 
declared to be a string (or list of strings), so at least it gets an 
empty string rather than *top*.

More generally: 1) anywhere a non-TFS component is used, its use has to 
be clearly signalled; 2) the TFS component is responsible for making sure 
the non-TFS component gets something of the correct type, and there 
should, perhaps, be some type-checking to make sure this happens.
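
As a rough illustration of the kind of type-checking meant here, the following Python sketch checks a value at the interface before any string operation is attempted. The names `TOP`, `OrthInterfaceError` and `check_stem` are hypothetical, and `TOP` is only a stand-in for an unconstrained *top* value; none of this is taken from an existing processor.

```python
TOP = object()  # hypothetical stand-in for an unconstrained *top* value


class OrthInterfaceError(TypeError):
    """Raised when the TFS side hands over something that is not a string."""


def check_stem(value, rule_name):
    """Return the STEM value as a list of strings, or fail with a clear message."""
    # Accept a single string or a list/tuple of strings.
    items = [value] if isinstance(value, str) else value
    if not isinstance(items, (list, tuple)) or not all(
        isinstance(item, str) for item in items
    ):
        raise OrthInterfaceError(
            f"rule {rule_name!r}: STEM must be a string or a list of strings"
        )
    return list(items)


print(check_stem("taberu", "some-orthographemic-rule"))
try:
    check_stem([TOP], "some-orthographemic-rule")
except OrthInterfaceError as err:
    print("rejected:", err)
```

A check along these lines would turn the crash into an error that names the rule at the point where the underspecified value crosses into the morphophonology component.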



Footnote - the point is not that one can't do these operations in a TFS 
framework - obviously one can, since TFS grammars are Turing-equivalent - but 
that one needs very different conditions on the TFS framework to do 
that, which means it can't be integrated into a single grammar without 
being very cumbersome.  In particular, if one does this all in TFSs, and 
assumes that the input is just normal text, the fundamental unit has to 
be the character, because that's the only thing we can tokenise into 
with a dumb tokeniser.

On 30/06/2016 23:17, Woodley Packard wrote:
> Hi developers,
> Starting a version or two ago, ACE has taken it upon itself to keep the strings under STEM (or whatever the orth-path configuration points to) up to date during morphological changes.  Mike recently noticed that these changes have caused Jacy to not work with ACE for certain inputs.  In my view, there are fundamentally two different types of (lexical) rules in our universe:
> - rules for which the processor (ACE, LKB, PET, Agree, what-have-you) is responsible for determining the STEM value on the mother edge, as a function of the STEM value on the daughter edge.  That function is determined by %suffix and %prefix statements, as well as irregular form tables.
> - rules for which the grammar is responsible for determining the STEM value of the mother edge.  This will generally amount to a reentrancy between the mother’s STEM and the daughter’s STEM, but in principle could be an explicit change.
> In the case of Jacy, it seems that the rule "vbar-monotransitivization-c-lrule" neither declares orthographemic changes (by %suffix, %prefix, or entries in the irregulars table) nor declares a value for the mother’s STEM.  My questions for the community are: (1) are we in agreement that these two fundamental classes of lexical rules exist, as outlined above? and (2) if so, how should a processor decide which case a given rule falls into?  Currently ACE treats the Jacy rule in question as belonging to the latter class since no changes are declared.  This leads to something like [ STEM < *top* > ].  On application of subsequent lexical rules that actually do have orthographemic reflex, ACE then crashes (not ideal, but I want feedback about how to improve it :-)).
> Thanks,
> Woodley
> n.b. I might be wrong about exactly which part of Jacy is underspecific, as I don’t have access to full debugging facilities right now, but I believe the above is an accurate summary of the situation.
