<div><div><div dir="auto">hi paul,</div></div><div dir="auto"><br></div><div dir="auto">lexical filtering applies after lexical parsing, i.e. you need to make sure your rule matches the complete lexical item—in the case where there is a trailing period, that will be an instance of the ’period‘ lexical rule with the ’bank‘ lexical entry as its daughter.</div><div dir="auto"><br></div><div dir="auto">not quite sure what the orthographemic machinery does about ORTH values, but i suspect that after the application of the ’period‘ the ORTH value may be either unset or (more likely) normalized to all lower case. upon the application of orthographemic (aka spelling-changing) rules, the ORTH value of the mother cannot just be determined by unification, e.g. a re-entrancy into the daughter (as is common for lexical rules that do not affect spelling).</div><div dir="auto"><br></div><div dir="auto">so, to make your current approach work, i think you would have to let the trigger rule detect proper names by a property other than ORTH.</div></div><div dir="auto"><br></div><div dir="auto">alternatively, you could try making ORTH.FIRST re-entrant with TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against an incoming token feature structure that does not match in case. i have long been thinking this latter technique (as a type addendum on n_-_pn_le) could make a nice stepping stone towards a case-sensitive configuration of the ERG (which might give non-trivial efficiency gains on carefully edited text :-).</div><div dir="auto"><br></div><div dir="auto">best wishes, oe</div><div dir="auto"><br></div><div dir="auto"><br></div><div><div><div class="gmail_quote"><div dir="ltr">On Sat, 7 Jul 2018 at 21:21 <<a href="mailto:paul@haleyai.com" target="_blank">paul@haleyai.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Developers,<br>
On Sat, 7 Jul 2018 at 21:21 <paul@haleyai.com> wrote:

Dear Developers,

In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. I have been successful in doing so, with some exceptions such as the one shown below.

I am surprised by the following behavior: either I have something to learn, or perhaps there is a bug in PET's chart mapping.

Regards,
Paul

Given a capitalized lexical entry such as:

Bank_NNP := n_-_pn_le & [ ORTH < "Bank" >, SYNSEM [ LKEYS.KEYREL.CARG "Bank", PHON.ONSET con ] ].

The following lexical filtering rule (simplified for the demonstration purposes of this email):

veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &
  [ +CONTEXT < >,
    +INPUT < [ ORTH.FIRST ^[[:upper:]].*$ ] >,
    +OUTPUT < > ].

will 'correctly' remove Bank_NNP from the chart when the input is "it is the bank", but it fails to do so when a period is appended.

PET's logging of lexical rules shows the following for the first case:

[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50 parents: >
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51 parents: 98 >
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92 parents: >

Surprisingly, only the first of these three rule applications occurs in the second case.

I don't think it matters, but in our case the input is via FSC, in which the period is a token. Thus, the following token mapping rule applies in the second case only:

[cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >

A redacted AVM for the surviving lexical item follows. As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.

L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63 parents: 110 >
n_-_pn_le
[ ...
  SYNSEM ...
    PHON phon
    [ ONSET con
      [ --TL #16:native_token_cons
        [ FIRST token
          [ +CLASS #17:alphabetic
            [ +CASE non_capitalized+lower,
              +INITIAL - ],
            +FROM #3,
            +FORM #18:"bank.",
            +TO "15",
            +CARG "bank",
            ...
          REST native_token_null ] ] ],
    LKEYS lexkeys_norm
    [ KEYREL named_nom_relation
      [ CFROM #3,
        CTO #29:"15",
        PRED named_rel,
        LBL #15,
        LNK *list*,
        ARG0 #14,
        CARG "Bank" ],
      ...
  ORTH orthography
  [ FIRST "Bank",
    REST *null*,
    FROM #3,
    CLASS #17,
    ...
  TOKENS tokens
  [ +LIST #16,
    +LAST token
    [ +CLASS #17,
      +FROM "10",
      +FORM "bank.",
      +TO #29,
      +CARG "bank",
      ...