[developers] Distinguishing between orthographemic and non-orthographemic rules

Ann Copestake aac10 at cl.cam.ac.uk
Fri Jul 22 15:46:10 CEST 2016


Sorry that I didn't get back to you on this.  There is a certain amount
of flexibility in the way the system is set up, partly to allow the use
of other types of spelling rule, but it's unnecessarily messy and I
agree it would be useful to clear it up.  I wonder whether it would be
good to specify a rule via %affix rather than %irregular: although
%affix without any description would amount to "go and look it up in the
irregulars file", it could also be used to invoke any external
component.  I'm reluctant to enforce this change in the LKB, since it
could cause working grammars to break in ways which are difficult to
debug, but small legacy grammars are presumably less of an issue for
ACE.  Or you might think that the % syntax should specify the
external component, so that %prefix / %suffix mean "the rules I give
here, plus an irregs file if present" while %irregular means "just the
irregulars file".  I can see it either way.
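
The two readings can be made concrete with a small sketch.  This is
illustrative Python, not LKB or ACE code: the Rule record, its field
names, the constants and all three function names are hypothetical, and
%affix and a per-rule %irregular are proposals from this thread rather
than existing syntax.  The behaviour of %affix when a description is
given is not spelled out above, so the sketch simply assumes it adds the
inline patterns; invoking some other external component is not modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical labels for where a rule's spelling changes come from.
INLINE = "inline patterns"
IRREGS = "irregulars file"


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a lexical rule as declared in a grammar."""
    name: str
    directive: Optional[str] = None  # "prefix", "suffix", "affix", "irregular" or None
    has_patterns: bool = False       # True if spelling patterns follow the directive


def is_orthographemic(rule: Rule) -> bool:
    """Fully declarative test: a rule is orthographemic iff it carries a % directive."""
    return rule.directive is not None


def sources_affix_reading(rule: Rule) -> FrozenSet[str]:
    """First reading: a bare %affix means 'look it up in the irregulars file'."""
    if rule.directive != "affix":
        return frozenset()
    if not rule.has_patterns:
        return frozenset({IRREGS})
    # Assumption (not stated in the thread): a described %affix also uses its patterns.
    return frozenset({INLINE, IRREGS})


def sources_component_reading(rule: Rule, irregs_present: bool) -> FrozenSet[str]:
    """Second reading: the % keyword itself names the component(s) to consult."""
    if rule.directive in ("prefix", "suffix"):
        # "the rules I give here, plus an irregs file if present"
        return frozenset({INLINE, IRREGS}) if irregs_present else frozenset({INLINE})
    if rule.directive == "irregular":
        # "just the irregulars file"
        return frozenset({IRREGS})
    return frozenset()
```

Under either reading a rule with no % directive is never handed to the
spelling component, so is_orthographemic(Rule("example-rule")) is false
and both source functions return the empty set for it.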

All best,

Ann


On 01/07/2016 16:30, Woodley Packard wrote:
> Thanks, Ann and Francis.
>
> I agree with Ann’s improved characterization, of course, and also that 
> a warning would be helpful.  I can have ACE do two things:  (1) warn 
> if the STEM value is not constrained to be string or some subtype 
> thereof, and (2) default to the empty string when an unexpected value 
> comes up during actual processing, but print a runtime warning.
>
> I think we ought to take this opportunity, though, to further clarify
> the exact mechanism for flagging which category a rule belongs to.
> The obvious and nearly status quo convention would be that the
> non-TFS mechanism is invoked when %prefix or %suffix is present in the
> specification of the rule.  At the moment we also invite rules into
> that category when there is an irregular form declared in irregs.tab
> even if no %prefix or %suffix is declared; arguably it would be more
> transparent to require a %irregular declaration in situ before this is
> legal (and forbid irregs.tab entries for undeclared orthographemic rules).
>
> There may also be vestiges of other conventions lying around, such as
> flagging by whether or not STEM is reentrant and by ND-AFF.  The 
> latter is or was used by the ERG (see spelling-change-rule-p from 
> user-fns.lsp, reproduced below; I have no sense of whether this is 
> currently coherent since ACE does not appear to heed it).  If there 
> isn’t a clear need for them I propose that any such mechanisms be 
> deprecated in favor of something fully declarative and uniform.
>
> -Woodley
>
> ERG:
> (defun spelling-change-rule-p (rule)
>   ;;; a function which is used to prevent the parser
>   ;;; trying to apply a rule which affects spelling and
>   ;;; which should therefore only be applied by the morphology
>   ;;; system.
>   ;;; Old test was for something which was a subtype of
>   ;;; *morph-rule-type* - this tests for whether needs affix:
>   ;;; < ND-AFF > = + (assuming bool-value-true is default value)
>   ;;; in the rule
>   (let ((affix (get-dag-value (tdfs-indef (rule-full-fs rule)) 'nd-aff)))
>     (and affix (bool-value-true affix))))
>
> JACY:
> ;;;
> ;;; detect rules that have orthographemic variation associated to them; those
> ;;; who do should only be applied within the morphology system; this version is
> ;;; a little complicated because we change from a full-form set-up to one with
> ;;; on-line morphology during the course.
> ;;;
> (defun spelling-change-rule-p (rule)
>   (rule-orthographemicp rule))
>
>> On Jul 1, 2016, at 6:29 AM, Francis Bond <bond at ieee.org> wrote:
>>
>> On Thu, Jun 30, 2016 at 3:17 PM, Woodley Packard
>> <sweaglesw at sweaglesw.org> wrote:
>>> In the case of Jacy, it seems that the rule 
>>> "vbar-monotransitivization-c-lrule" neither declares orthographemic 
>>> changes (by %suffix, %prefix, or entries in the irregulars table) 
>>> nor declares a value for the mother’s STEM.
>>
>> I think it would be ok to push the burden for this onto grammar
>> developers --- no need for the processor to guess.  In this case STEM
>> should be linked; the rule and a couple of others are poorly written
>> and will be fixed.  Perhaps a friendly warning if this case is seen
>> when compiling the grammar could help us avoid the issue in the
>> future?
>>
>>
>> -- 
>> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
>> Division of Linguistics and Multilingual Studies
>> Nanyang Technological University
>
