<div><div><div dir="auto">hi paul,</div></div><div dir="auto"><br></div><div dir="auto">lexical filtering applies after lexical parsing, i.e. you need to make sure your rule matches the complete lexical item—in the case where there is a trailing period, that will be an instance of the ’period‘ lexical rule with the ’bank‘ lexical entry as its daughter.</div><div dir="auto"><br></div><div dir="auto">not quite sure what the orthographemic machinery does about ORTH values, but i suspect that after the application of the ’period‘ the ORTH value may be either unset or (more likely) normalized to all lower case. upon the application of orthographemic (aka spelling-changing) rules, the ORTH value of the mother cannot just be determined by unification, e.g. a re-entrancy into the daughter (as is common for lexical rules that do not affect spelling).</div><div dir="auto"><br></div><div dir="auto">so, to make your current approach work, i think you would have to let the trigger rule detect proper names by a property other than ORTH.</div></div><div dir="auto"><br></div><div dir="auto">alternatively, you could try making ORTH.FIRST re-entrant with TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against an incoming token feature structure that does not match in case. i have long been thinking this latter technique (as a type addendum on n_-_pn_le) could make a nice stepping stone towards a case-sensitive configuration of the ERG (which might give non-trivial efficiency gains on carefully edited text :-).</div><div dir="auto"><br></div><div dir="auto">best wishes, oe</div><div dir="auto"><br></div><div dir="auto"><br></div><div><div><div class="gmail_quote"><div dir="ltr">On Sat, 7 Jul 2018 at 21:21 <<a href="mailto:paul@haleyai.com" target="_blank">paul@haleyai.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Developers,<br>
On Sat, 7 Jul 2018 at 21:21 <paul@haleyai.com> wrote:

Dear Developers,

In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. I have been successful in doing so, with some exceptions such as the one shown below.

I am surprised by the following behavior: either I have something to learn, or perhaps there is a bug in PET's chart mapping.

Regards,
Paul

Given a capitalized lexical entry such as:

Bank_NNP := n_-_pn_le & [ ORTH < "Bank" >, SYNSEM [ LKEYS.KEYREL.CARG "Bank", PHON.ONSET con ] ].

The following lexical filtering rule (simplified for the demonstration purposes of this email):

veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &
  [ +CONTEXT < >,
    +INPUT < [ ORTH.FIRST ^[[:upper:]].*$ ] >,
    +OUTPUT < > ].

will 'correctly' remove Bank_NNP from the chart when the input is "it is the bank", but it fails to do so when a period is appended.

PET's logging of lexical rules shows the following for the first case:

[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50 parents: >
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51 parents: 98 >
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92 parents: >

Surprisingly, only the first of these three rule applications occurs in the second case.

I don't think it matters, but in our case the input is via FSC, in which the period is a token. Thus, the following token mapping rule applies in the second case only:

[cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >

A redacted AVM for the surviving lexical item follows. As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.

L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63 parents: 110 >
n_-_pn_le
[ ...
  SYNSEM ...
    PHON phon
    [ ONSET con
      [ --TL #16:native_token_cons
        [ FIRST token
          [ +CLASS #17:alphabetic
            [ +CASE non_capitalized+lower,
              +INITIAL - ],
            +FROM #3,
            +FORM #18:"bank.",
            +TO "15",
            +CARG "bank",
            ...
          REST native_token_null ] ] ],
    LKEYS lexkeys_norm
    [ KEYREL named_nom_relation
      [ CFROM #3,
        CTO #29:"15",
        PRED named_rel,
        LBL #15,
        LNK *list*,
        ARG0 #14,
        CARG "Bank" ],
      ...
  ORTH orthography
  [ FIRST "Bank",
    REST *null*,
    FROM #3,
    CLASS #17,
    ...
  TOKENS tokens
  [ +LIST #16,
    +LAST token
    [ +CLASS #17,
      +FROM "10",
      +FORM "bank.",
      +TO #29,
      +CARG "bank",
      ...