[developers] chart mapping missing applicable lexical filtering rule?
Woodley Packard
sweaglesw at sweaglesw.org
Sun Jul 8 07:41:33 CEST 2018
Hi gentlemen,
Not sure about other platforms, but I’m pretty sure recent versions of ACE compute the effect of orthographemic rules like the period in question and place the (non-unification-based) result in ORTH (or the grammar-configured path), so that it is available to further unification-based processing. Older versions of ACE (say, two years old or more?) do not do this, and leave the ORTH value as whatever the unification constraints supplied by the grammar dictate.
Stephan, do I understand you to say you expect to see an uppercase ORTH value before application of w_period_plr and a lowercase one after? That would seem surprising and unfortunate to me, though perhaps within the formal power of the system if the grammarian truly wanted it...
Regards,
Woodley
> On Jul 7, 2018, at 2:18 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi paul,
>
> lexical filtering applies after lexical parsing, i.e. you need to make sure your rule matches the complete lexical item: in the case where there is a trailing period, that will be an instance of the 'period' lexical rule with the 'bank' lexical entry as its daughter.
>
> not quite sure what the orthographemic machinery does about ORTH values, but i suspect that after the application of the 'period' rule the ORTH value may be either unset or (more likely) normalized to all lower case. upon the application of orthographemic (aka spelling-changing) rules, the ORTH value of the mother cannot just be determined by unification, e.g. by a re-entrancy into the daughter (as is common for lexical rules that do not affect spelling).
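>
> schematically (the type and feature names below, e.g. hypothetical_lex_rule and DTR, are placeholders rather than actual ERG types; this is only a sketch of the contrast):
>
>   ; a rule that does not change spelling can simply share ORTH with its daughter
>   hypothetical_non_spelling_lr := hypothetical_lex_rule &
>   [ ORTH #orth,
>     DTR.ORTH #orth ].
>
>   ; a spelling-changing rule carries an orthographemic annotation instead;
>   ; its surface form is computed by that machinery, outside plain unification
>   hypothetical_spelling_lr :=
>   %suffix (* s)
>   hypothetical_lex_rule.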
>
> so, to make your current approach work, i think you would have to let the trigger rule detect proper names by a property other than ORTH.
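>
> e.g. a rule that keys off the PRED of the name relation and the +CASE of the incoming token, rather than ORTH, might look roughly like this (untested; the rule name is illustrative only, and the feature paths and values are read off the AVM in your message, so they may need adjusting for your setup):
>
>   veto_uncapitalized_proper_name_lfr := lexical_filtering_rule &
>   [ +CONTEXT < >,
>     +INPUT < [ SYNSEM.LKEYS.KEYREL.PRED named_rel,
>                TOKENS.+LIST.FIRST.+CLASS.+CASE non_capitalized+lower ] >,
>     +OUTPUT < > ].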
>
> alternatively, you could try making ORTH.FIRST re-entrant with TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against an incoming token feature structure that does not match in case. i have long been thinking this latter technique (as a type addendum on n_-_pn_le) could make a nice stepping stone towards a case-sensitive configuration of the ERG (which might give non-trivial efficiency gains on carefully edited text :-).
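>
> as a rough sketch (untested, and assuming ORTH and TOKENS sit at the top level of the lexical sign, as in the AVM in your message):
>
>   n_-_pn_le :+
>   [ ORTH.FIRST #form,
>     TOKENS.+LIST.FIRST.+FORM #form ].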
>
> best wishes, oe
>
>
> On Sat, 7 Jul 2018 at 21:21 <paul at haleyai.com> wrote:
> Dear Developers,
>
> In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. I have been successful in doing so, with some exceptions, one of which is shown below.
>
> I am surprised by the following behavior; either I have something to learn, or perhaps there is a bug in PET's chart mapping?
>
> Regards,
> Paul
>
>
> Given a capitalized lexical entry such as:
>
> Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG "Bank",PHON.ONSET con]].
>
> The following lexical filtering rule (simplified for demonstration purposes in this email):
>
> veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &
> [ +CONTEXT < >,
>   +INPUT < [ ORTH.FIRST ^[[:upper:]].*$ ] >,
>   +OUTPUT < > ].
>
> will 'correctly' remove Bank_NNP from the chart when the input is "it is the bank" but fails to do so when a period is appended.
>
> PET's logging of lexical filtering shows the following for the first case:
>
> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
> L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50 parents: >
> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
> L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51 parents: 98 >
> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
> P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92 parents: >
>
> Surprisingly, only the first of these three rule applications occurs in the second case.
>
> I don't think it matters, but in our case, input is via FSC, in which the period is a separate token. Thus, the following token mapping rule applies only in the second case:
>
> [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
> I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
> I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
> I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >
>
> A redacted AVM for the surviving lexical item follows. As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.
>
>
> L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63 parents: 110 >
> n_-_pn_le
> [ ...
> SYNSEM ...
> PHON phon
> [ ONSET con
> [ --TL #16:native_token_cons
> [ FIRST token
> [ +CLASS #17:alphabetic
> [ +CASE non_capitalized+lower,
> +INITIAL - ],
> +FROM #3,
> +FORM #18:"bank.",
> +TO "15",
> +CARG "bank",
> ...
> REST native_token_null ] ] ],
> LKEYS lexkeys_norm
> [ KEYREL named_nom_relation
> [ CFROM #3,
> CTO #29:"15",
> PRED named_rel,
> LBL #15,
> LNK *list*,
> ARG0 #14,
> CARG "Bank" ],
> ...
> ORTH orthography
> [ FIRST "Bank",
> REST *null*,
> FROM #3,
> CLASS #17,
> ...
> TOKENS tokens
> [ +LIST #16,
> +LAST token
> [ +CLASS #17,
> +FROM "10",
> +FORM "bank.",
> +TO #29,
> +CARG "bank",
> ...