<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:1272082338;
        mso-list-type:hybrid;
        mso-list-template-ids:-1279333212 67698689 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Symbol;}
@list l0:level2
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:"Courier New";}
@list l0:level3
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Wingdings;}
@list l0:level4
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Symbol;}
@list l0:level5
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:"Courier New";}
@list l0:level6
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Wingdings;}
@list l0:level7
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Symbol;}
@list l0:level8
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:"Courier New";}
@list l0:level9
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;
        font-family:Wingdings;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>Hi Stephan,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>You pointed me in a good direction, in particular toward matching on something other than ORTH.FIRST. I found that the following works for both cases (“bank” and “bank.”):<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><ul style='margin-top:0in' type=disc><li class=MsoListParagraph style='margin-left:0in;mso-list:l0 level1 lfo1'><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &amp; [+CONTEXT &lt;&gt;,+INPUT &lt;[SYNSEM [PHON.ONSET con_or_voc &amp; [--TL.FIRST.+CARG ^[[:lower:]].*$],LKEYS.KEYREL [PRED named_np_rel, CARG ^[[:upper:]].*$]]]&gt;,+OUTPUT &lt;&gt;].<o:p></o:p></span></li></ul><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>This does indeed significantly reduce ambiguity on carefully edited text, so things now work a little better.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>Thank you,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>Paul<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>P.S. 
There is still something fishy going on, though: for “it is the.” (which I do not expect to parse), a lexical item for the_pn_np1_no remains in the chart with an AVM that matches the above lexical filtering rule, but the rule is not applied or logged. (The ‘period’ lexical rule does not apply in this case.)<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>If interested, the chart will be here for a while: https://1drv.ms/u/s!Am2TXpQ1-kjsjQRUXeFXFK8QJ7mD<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'>From:</span></b><span style='font-size:11.0pt;font-family:"Calibri",sans-serif'> Stephan Oepen &lt;oe@ifi.uio.no&gt; <br><b>Sent:</b> Saturday, July 7, 2018 5:19 PM<br><b>To:</b> paul@haleyai.com<br><b>Cc:</b> developers &lt;developers@delph-in.net&gt;<br><b>Subject:</b> Re: [developers] chart mapping missing applicable lexical filtering rule?<o:p></o:p></span></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div><div><div><p class=MsoNormal>hi paul,<o:p></o:p></p></div></div><div><p class=MsoNormal><o:p>&nbsp;</o:p></p></div><div><p class=MsoNormal>lexical filtering applies after lexical parsing, i.e. you need to make sure your rule matches the complete lexical item; in the case where there is a trailing period, that will be an instance of the ‘period’ lexical rule with the ‘bank’ lexical entry as its daughter.<o:p></o:p></p></div><div><p class=MsoNormal><o:p>&nbsp;</o:p></p></div><div><p class=MsoNormal>not quite sure what the orthographemic machinery does about ORTH values, but i suspect that after the application of the ‘period’ rule the ORTH value may be either unset or (more likely) normalized to all lower case. 
upon the application of orthographemic (aka spelling-changing) rules, the ORTH value of the mother cannot just be determined by unification, e.g. a re-entrancy into the daughter (as is common for lexical rules that do not affect spelling).<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>so, to make your current approach work, i think you would have to let the trigger rule detect proper names by a property other than ORTH.<o:p></o:p></p></div></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>alternatively, you could try making ORTH.FIRST re-entrant with TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against an incoming token feature structure that does not match in case. i have long been thinking this latter technique (as a type addendum on n_-_pn_le) could make a nice stepping stone towards a case-sensitive configuration of the ERG (which might give non-trivial efficiency gains on carefully edited text :-).<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>best wishes, oe<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><div><div><div><p class=MsoNormal>On Sat, 7 Jul 2018 at 21:21 <<a href="mailto:paul@haleyai.com" target="_blank">paul@haleyai.com</a>> wrote:<o:p></o:p></p></div><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in'><p class=MsoNormal style='margin-bottom:12.0pt'>Dear Developers,<br><br>In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. 
I have been successful in doing so, with some exceptions such as the one shown below.<br><br>I am surprised by the following behavior: either I have something to learn, or perhaps there is a bug in PET's chart mapping?<br><br>Regards,<br>Paul<br><br><br>Given a capitalized lexical entry such as:<br><br> Bank_NNP := n_-_pn_le &amp; [ORTH &lt;"Bank"&gt;,SYNSEM [LKEYS.KEYREL.CARG "Bank",PHON.ONSET con]].<br><br>The following lexical filtering rule (which has been simplified for the purposes of this email):<br><br> veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &amp; [+CONTEXT &lt;&gt;,+INPUT &lt;[ORTH.FIRST ^[[:upper:]].*$]&gt;,+OUTPUT &lt;&gt;].<br><br>will 'correctly' remove Bank_NNP from the chart when the input is "it is the bank" but fails to do so when a period is appended.<br><br>PET's log shows the following rule applications for the first case:<br><br> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85 <br> L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] &lt; blk: 2 dtrs: 50 parents: &gt;<br> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92 <br> L [92 3-4 Bank_NNP (1) 0 {} { : } {}] &lt; blk: 2 dtrs: 51 parents: 98 &gt;<br> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98 <br> P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] &lt; blk: 2 dtrs: 92 parents: &gt;<br><br>Surprisingly, only the first of these three firings occurs in the second case.<br><br>I don't think it matters, but in our case the input is via FSC, in which the period is a separate token. Thus, the following token mapping rule applies in the second case only:<br><br> [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51 <br> I [50 () -1--1 &lt;14:15&gt; "" "." { : } {}] &lt; blk: 0 &gt;<br> I [48 () -1--1 &lt;10:14&gt; "" "bank" { : } {}] &lt; blk: 2 &gt;<br> I [51 () -1--1 &lt;10:15&gt; "" "bank." { : } {}] &lt; blk: 0 &gt;<br><br>A redacted AVM for the surviving lexical item follows. 
As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.<br><br><br>L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] &lt; blk: 0 dtrs: 63 parents: 110 &gt;<br>n_-_pn_le<br>[ ...<br> SYNSEM ...<br> PHON phon<br> [ ONSET con<br> [ --TL #16:native_token_cons<br> [ FIRST token<br> [ +CLASS #17:alphabetic<br> [ +CASE non_capitalized+lower,<br> +INITIAL - ],<br> +FROM #3,<br> +FORM #18:"bank.",<br> +TO "15",<br> +CARG "bank",<br> ...<br> REST native_token_null ] ] ],<br> LKEYS lexkeys_norm<br> [ KEYREL named_nom_relation<br> [ CFROM #3,<br> CTO #29:"15",<br> PRED named_rel,<br> LBL #15,<br> LNK *list*,<br> ARG0 #14,<br> CARG "Bank" ],<br> ... <br> ORTH orthography<br> [ FIRST "Bank",<br> REST *null*,<br> FROM #3,<br> CLASS #17,<br> ...<br> TOKENS tokens<br> [ +LIST #16,<br> +LAST token<br> [ +CLASS #17,<br> +FROM "10",<br> +FORM "bank.",<br> +TO #29,<br> +CARG "bank",<br> ...<o:p></o:p></p></blockquote></div></div></div></div></body></html>