<div><div dir="auto">i do believe that PET also injects computed ORTH values after the application of orthographemic rules, but i do not have my laptop at hand to double-check.</div></div><div dir="auto"><br></div><div dir="auto">but i also believe it is likely to use the strings computed during orthographemic segmentation, e.g. (sentence-initial) ‘Unbanking’ &lt;- ‘un’ + ‘bank’ + ‘ing’. because the grammar only specifies lower-case %prefix() and %suffix() rules, i suspect that everything may be downcased inside the orthographemic machinery.</div><div dir="auto"><br></div><div dir="auto">do ACE or Agree actually process prefixes and suffixes case-insensitively but preserve original-case ORTH values? that would of course seem like the right thing to do, unless we finally generalize the orthographemic specification language to support actual regular expressions :-).</div><div dir="auto"><br></div><div dir="auto">paul, with a little more sleep, i would like to refine my suggestion for how to enforce capitalization in lexical entries: rather than constraining the +FORM value on token feature structures, i now recall that i compute the +CASE value during token mapping for exactly your purpose. 
a type addendum like [ +CASE capitalized ] i would expect to do the trick.</div><div dir="auto"><br></div><div dir="auto">best, oe</div><div dir="auto"><br></div><div><br><div class="gmail_quote"><div dir="ltr">On Sun, 8 Jul 2018 at 07:42 Woodley Packard <<a href="mailto:sweaglesw@sweaglesw.org">sweaglesw@sweaglesw.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hi gentlemen,</div><div><br></div><div>Not sure about other platforms, but I’m pretty sure (recent versions of) ACE computes the effect of orthographemic rules like the period in question and places the (non-unification-based) result in ORTH (or grammar-configured path) for the availability of further unification-based processing. Older versions of ACE (say, two years old or more?) do not do this, and leave the ORTH value as whatever the unification constraints supplied by the grammar dictate.</div><div><br></div><div>Stephan, do I understand you to say you expect to see an uppercase ORTH before application of w_period_plr and a lowercase ORTH value after? That would seem surprising and unfortunate to me, if perhaps within the formal power of the system if the grammarian truly wanted it...</div><div><br></div><div>Regards,</div><div>Woodley</div></div><div style="word-wrap:break-word"><br><div><blockquote type="cite"><div>On Jul 7, 2018, at 2:18 PM, Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>> wrote:</div><br class="m_-4501034546129805722Apple-interchange-newline"><div><div><div><div dir="auto">hi paul,</div></div><div dir="auto"><br></div><div dir="auto">lexical filtering applies after lexical parsing, i.e. 
you need to make sure your rule matches the complete lexical item; in the case where there is a trailing period, that will be an instance of the ‘period’ lexical rule with the ‘bank’ lexical entry as its daughter.</div><div dir="auto"><br></div><div dir="auto">not quite sure what the orthographemic machinery does about ORTH values, but i suspect that after the application of the ‘period’ rule the ORTH value may be either unset or (more likely) normalized to all lower case. upon the application of orthographemic (aka spelling-changing) rules, the ORTH value of the mother cannot just be determined by unification, e.g. by a re-entrancy into the daughter (as is common for lexical rules that do not affect spelling).</div><div dir="auto"><br></div><div dir="auto">so, to make your current approach work, i think you would have to let the trigger rule detect proper names by a property other than ORTH.</div></div><div dir="auto"><br></div><div dir="auto">alternatively, you could try making ORTH.FIRST re-entrant with TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against an incoming token feature structure that does not match in case. i have long been thinking that this latter technique (as a type addendum on n_-_pn_le) could make a nice stepping stone towards a case-sensitive configuration of the ERG (which might give non-trivial efficiency gains on carefully edited text :-).</div><div dir="auto"><br></div><div dir="auto">best wishes, oe</div><div dir="auto"><br></div><div dir="auto"><br></div><div><div><div class="gmail_quote"><div dir="ltr">On Sat, 7 Jul 2018 at 21:21 <<a href="mailto:paul@haleyai.com" target="_blank">paul@haleyai.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Developers,<br>
<br>
In one use case, it would be nice to limit the use of capitalized proper nouns to cases in which the input is capitalized. I have been successful in doing so, with some exceptions such as the one shown below.<br>
<br>
I am surprised by the following behavior: either I have something to learn, or perhaps there is a bug in PET's chart mapping?<br>
<br>
Regards,<br>
Paul<br>
<br>
<br>
Given a capitalized lexical entry such as:<br>
<br>
Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG "Bank",PHON.ONSET con]].<br>
<br>
The following lexical filtering rule (which has been simplified for the demonstration purposes of this email):<br>
<br>
veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule & [+CONTEXT <>,+INPUT <[ORTH.FIRST ^[[:upper:]].*$]>,+OUTPUT <>].<br>
<br>
will 'correctly' remove Bank_NNP from the chart when the input is "it is the bank" but fails to do so when a period is appended.<br>
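<br>
The pattern in the rule above can be tried outside the grammar. The following is only a sketch of the regular expression itself, not of PET's chart mapping: `would_veto` is a hypothetical helper, and `[A-Z]` is an ASCII-only stand-in for the POSIX class `[[:upper:]]`, which Python's `re` module does not support.<br>

```python
import re

# ASCII-only approximation of ^[[:upper:]].*$ from the filtering rule
# (assumption: Python's re has no POSIX bracket classes, so [A-Z] is used).
CAPITALIZED = re.compile(r"^[A-Z].*$")


def would_veto(orth_first: str) -> bool:
    """Return True if the pattern matches the given ORTH.FIRST string."""
    return CAPITALIZED.match(orth_first) is not None


print(would_veto("Bank"))   # ORTH.FIRST of the Bank_NNP entry
print(would_veto("bank"))   # lower-case string
print(would_veto("bank."))  # lower-case string with trailing period
```

Note that the rule tests ORTH.FIRST of the lexical item, i.e. the string supplied by the lexical entry, rather than the +FORM of the incoming token.<br>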
<br>
PET's logging of lexical rules shows as follows for the first case:<br>
<br>
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85 <br>
L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50 parents: ><br>
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92 <br>
L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51 parents: 98 ><br>
[cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98 <br>
P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92 parents: ><br>
<br>
Surprisingly, only the first of these three rule firings occurs in the second case.<br>
<br>
I don't think it matters, but in our case, input is via FSC in which the period is a token. Thus, the following token mapping rule applies in the second case only:<br>
<br>
[cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51 <br>
I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 ><br>
I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 ><br>
I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 ><br>
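<br>
As a reading aid for the log above, the effect of the token mapping rule can be modeled as merging two adjacent tokens. This is a minimal sketch under stated assumptions: the `Token` class and `attach_suffix_punctuation` function are hypothetical illustrations, not part of PET or the grammar, and only the forms and character spans from the log are represented.<br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    start: int  # character offset where the token begins
    end: int    # character offset where the token ends


def attach_suffix_punctuation(word: Token, punct: Token) -> Token:
    """Merge a punctuation token into the word immediately before it."""
    if word.end != punct.start:
        raise ValueError("tokens are not adjacent")
    return Token(word.form + punct.form, word.start, punct.end)


# Tokens 48 ("bank", 10:14) and 50 (".", 14:15) from the log above.
merged = attach_suffix_punctuation(Token("bank", 10, 14), Token(".", 14, 15))
print(merged)
```

The merged token corresponds to item 51 in the log, with form "bank." and span 10:15.<br>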
<br>
A redacted AVM for the surviving lexical item follows. As far as I can tell, it matches the lexical filtering rule above and thus should not remain in the chart.<br>
<br>
<br>
L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63 parents: 110 ><br>
n_-_pn_le<br>
[ ...<br>
SYNSEM ...<br>
PHON phon<br>
[ ONSET con<br>
[ --TL #16:native_token_cons<br>
[ FIRST token<br>
[ +CLASS #17:alphabetic<br>
[ +CASE non_capitalized+lower,<br>
+INITIAL - ],<br>
+FROM #3,<br>
+FORM #18:"bank.",<br>
+TO "15",<br>
+CARG "bank",<br>
...<br>
REST native_token_null ] ] ],<br>
LKEYS lexkeys_norm<br>
[ KEYREL named_nom_relation<br>
[ CFROM #3,<br>
CTO #29:"15",<br>
PRED named_rel,<br>
LBL #15,<br>
LNK *list*,<br>
ARG0 #14,<br>
CARG "Bank" ],<br>
... <br>
ORTH orthography<br>
[ FIRST "Bank",<br>
REST *null*,<br>
FROM #3,<br>
CLASS #17,<br>
...<br>
TOKENS tokens<br>
[ +LIST #16,<br>
+LAST token<br>
[ +CLASS #17,<br>
+FROM "10",<br>
+FORM "bank.",<br>
+TO #29,<br>
+CARG "bank",<br>
...<br>
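<br>
The AVM above pairs +FORM "bank." with +CASE non_capitalized+lower on the token. A coarse model of such a case classification is sketched below. It is an assumption-laden illustration only: `coarse_case` is a hypothetical function, its two labels are simplified, and finer distinctions such as the "+lower" part of the value in the AVM are not modeled.<br>

```python
def coarse_case(form: str) -> str:
    """Classify a token form by the case of its first character only."""
    # form[:1] is used instead of form[0] so an empty string does not raise.
    return "capitalized" if form[:1].isupper() else "non_capitalized"


print(coarse_case("bank."))  # the +FORM value shown in the AVM
print(coarse_case("Bank."))  # a capitalized variant for comparison
```

Under this model, a lexical entry that requires a capitalized token would not be compatible with the token for "bank.".<br>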
<br>
</blockquote></div></div>
</div>
</div></blockquote></div><br></div></blockquote></div></div>