[developers] chart mapping missing applicable lexical filtering rule?

paul at haleyai.com
Sat Jul 7 21:20:35 CEST 2018


Dear Developers,

In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. I have been successful in doing so, with some exceptions such as the one shown below.

I am surprised by the following behavior. Do I have something to learn, or is there perhaps a bug in PET's chart mapping?

Regards,
Paul


Given a capitalized lexical entry such as:

      Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG "Bank",PHON.ONSET con]].

The following lexical filtering rule (simplified for the purposes of this email):

      veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule & [+CONTEXT <>,+INPUT <[ORTH.FIRST ^[[:upper:]].*$]>,+OUTPUT <>].

'correctly' removes Bank_NNP from the chart when the input is "it is the bank", but fails to do so when a period is appended ("it is the bank.").
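To make the expectation concrete, here is a minimal Python sketch of the test the +INPUT constraint is meant to perform. It assumes that PET matches the pattern ^[[:upper:]].*$ against the whole ORTH.FIRST string, and it approximates the POSIX class [[:upper:]] with str.isupper() on the first character, because Python's re module has no POSIX bracket classes. The function veto_matches is hypothetical, not part of PET.

```python
def veto_matches(orth_first: str) -> bool:
    # Hypothetical stand-in for the ORTH.FIRST test in
    # veto_capitalized_native_uncapitalized_lfr.
    # Approximates ^[[:upper:]].*$ : the first character must be upper case,
    # and ".*" then accepts the rest of the string.
    return len(orth_first) > 0 and orth_first[0].isupper()

# ORTH.FIRST of Bank_NNP is "Bank" for both inputs (see the entry and the AVM).
orth_first_by_input = {
    "it is the bank": "Bank",
    "it is the bank.": "Bank",
}
# The token forms differ between the two inputs, but the rule does not inspect +FORM.
token_form_by_input = {
    "it is the bank": "bank",
    "it is the bank.": "bank.",
}

for sentence, orth in orth_first_by_input.items():
    print(f"{sentence!r}: ORTH.FIRST={orth!r}, token +FORM={token_form_by_input[sentence]!r}, "
          f"rule should match: {veto_matches(orth)}")
```

Under these assumptions the constraint is satisfied in both cases, which is why the second outcome is surprising.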

For the first case, PET's log of lexical filtering rules shows the following:

      [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85 
      L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50  parents: >
      [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92 
      L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51  parents: 98 >
      [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98 
      P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92  parents: >

Surprisingly, in the second case the rule fires only on the first of these three items (the_pn_np1_no), not on Bank_NNP or n_sg_ilr.

I don't think it matters, but in our case the input comes in via FSC, in which the period is a separate token. Thus, the following token mapping rule applies in the second case only:

    [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51 
    I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
    I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
    I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >
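The three log lines above can be read as the input token "bank" <10:14> (I1:48) and the context token "." <14:15> (C1:50) yielding the output token "bank." <10:15> (O1:51). The following Python sketch reproduces only that span and form arithmetic; the Token class and the attach_suffix_punctuation function are hypothetical illustrations, not PET's implementation or the TDL definition of suffix_punctuation_tmr.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    # Hypothetical, stripped-down token: surface form plus character span <start:end>.
    form: str
    start: int
    end: int

def attach_suffix_punctuation(word: Token, punct: Token) -> Token:
    """Hypothetical illustration: build the output token from an input word
    and an immediately following punctuation (context) token."""
    if word.end != punct.start:
        raise ValueError("punctuation token must directly follow the word")
    return Token(word.form + punct.form, word.start, punct.end)

bank = Token("bank", 10, 14)   # I1:48 in the log
period = Token(".", 14, 15)    # C1:50 in the log
merged = attach_suffix_punctuation(bank, period)
print(merged)  # Token(form='bank.', start=10, end=15), i.e. O1:51 <10:15> "bank."
```

The detail relevant here is that the token's +FORM becomes "bank." (its +CARG remains "bank", per the AVM), while the lexical entry's ORTH.FIRST is still "Bank".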

A redacted AVM for the surviving lexical item follows. As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.


L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63  parents: 110 >
n_-_pn_le
[ ...
  SYNSEM   ...
             PHON   phon
                    [ ONSET con
                            [ --TL #16:native_token_cons
                                       [ FIRST token
                                               [ +CLASS #17:alphabetic
                                                            [ +CASE    non_capitalized+lower,
                                                              +INITIAL - ],
                                                 +FROM  #3,
                                                 +FORM  #18:"bank.",
                                                 +TO    "15",
                                                 +CARG  "bank",
                                                 ...
                                         REST  native_token_null ] ] ],
             LKEYS  lexkeys_norm
                    [ KEYREL    named_nom_relation
                                [ CFROM #3,
                                  CTO   #29:"15",
                                  PRED  named_rel,
                                  LBL   #15,
                                  LNK   *list*,
                                  ARG0  #14,
                                  CARG  "Bank" ],
                                  ...
  ORTH     orthography
           [ FIRST "Bank",
             REST  *null*,
             FROM  #3,
             CLASS #17,
             ...
  TOKENS   tokens
           [ +LIST #16,
             +LAST token
                   [ +CLASS #17,
                     +FROM  "10",
                     +FORM  "bank.",
                     +TO    #29,
                     +CARG  "bank",
                     ...