[developers] chart mapping missing applicable lexical filtering rule?
Stephan Oepen
oe at ifi.uio.no
Sun Jul 8 08:56:54 CEST 2018
i do believe that PET also injects computed ORTH values after applications
of orthographemic rules, but i do not have my laptop at hand to
double-check.
but i also believe it is likely to use the strings computed during
orthographemic segmentation, e.g. (sentence-initial) ‘Unbanking’ <- ‘un’ +
‘bank’ + ‘ing’. because the grammar only specifies lower-case %prefix()
and %suffix() rules, i suspect that everything may be downcased inside the
orthographemic machinery.
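for readers following along, orthographemic annotations in TDL source look roughly like the following (schematic rule names and types, not the ERG's actual definitions); note that the affixation patterns are written in lower case:

```tdl
; schematic orthographemic rules (hypothetical names and types, for
; illustration only); all affixation patterns are lower-case, so
; 'Unbanking' can only segment as 'un' + 'bank' + 'ing' against a
; downcased copy of the surface string.
un_prefix_olr :=
%prefix (* un)
some_lex_rule_type.

ing_suffix_olr :=
%suffix (* ing) (e ing)
some_lex_rule_type.
```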
do ACE or Agree actually process prefixes and suffixes case-insensitively
but preserve the original-case ORTH values? that would of course seem like
the right thing to do, at least until we finally generalize the
orthographemic specification language to support actual regular
expressions :-).
paul, with a little more sleep, i would like to refine my suggestion for
how to enforce capitalization in lexical entries: rather than constrain the
+FORM value on token feature structures, i now recall that the +CASE
value is computed during token mapping for exactly this purpose. a type
addendum like [ +CASE capitalized ] would, i expect, do the trick.
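spelled out as TDL, such an addendum might look like the following (a sketch only: the exact feature path is taken from the AVM later in this thread, where +CASE sits under +CLASS, and would need checking against the grammar):

```tdl
;; hypothetical type addendum: require the first token of a proper-name
;; entry to be capitalized.  the path assumes the token geometry shown
;; in the AVM further down this thread, with +CASE under +CLASS.
n_-_pn_le :+
  [ TOKENS.+LIST.FIRST.+CLASS.+CASE capitalized ].
```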
best, oe
On Sun, 8 Jul 2018 at 07:42 Woodley Packard <sweaglesw at sweaglesw.org> wrote:
> Hi gentlemen,
>
> Not sure about other platforms, but I’m pretty sure (recent versions of)
> ACE computes the effect of orthographemic rules like the period in question
> and places the (non-unification-based) result in ORTH (or a
> grammar-configured path), making it available to further unification-based
> processing. Older versions of ACE (say, two years old or more?) do not do
> this and leave the ORTH value as whatever the unification constraints
> supplied by the grammar dictate.
>
> Stephan, do I understand you to say you expect to see an uppercase ORTH
> before application of w_period_plr and a lowercase ORTH value after? That
> would seem surprising and unfortunate to me, though perhaps within the
> formal power of the system if the grammarian truly wanted it...
>
> Regards,
> Woodley
>
> On Jul 7, 2018, at 2:18 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi paul,
>
> lexical filtering applies after lexical parsing, i.e. you need to make
> sure your rule matches the complete lexical item—in the case where there is
> a trailing period, that will be an instance of the ‘period’ lexical rule
> with the ‘bank’ lexical entry as its daughter.
>
> not quite sure what the orthographemic machinery does about ORTH values,
> but i suspect that after the application of the ‘period’ rule the ORTH
> value may be either unset or (more likely) normalized to all lower case. upon the
> application of orthographemic (aka spelling-changing) rules, the ORTH value
> of the mother cannot just be determined by unification, e.g. a re-entrancy
> into the daughter (as is common for lexical rules that do not affect
> spelling).
>
> so, to make your current approach work, i think you would have to let the
> trigger rule detect proper names by a property other than ORTH.
>
> alternatively, you could try making ORTH.FIRST re-entrant with
> TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against
> an incoming token feature structure that does not match in case. i have
> long been thinking this latter technique (as a type addendum on n_-_pn_le)
> could make a nice stepping stone towards a case-sensitive configuration of
> the ERG (which might give non-trivial efficiency gains on carefully edited
> text :-).
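
as a TDL sketch (hypothetical and untested; feature paths would need checking against the grammar), that addendum might read:

```tdl
;; hypothetical addendum: make lexical instantiation case-sensitive by
;; unifying the first ORTH string with the incoming token's +FORM, so
;; that a lower-case token fails to instantiate a capitalized entry.
n_-_pn_le :+
  [ ORTH.FIRST #form,
    TOKENS.+LIST.FIRST.+FORM #form ].
```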
>
> best wishes, oe
>
>
> On Sat, 7 Jul 2018 at 21:21 <paul at haleyai.com> wrote:
>
>> Dear Developers,
>>
>> In one use case, it would be nice to limit the use of capitalized proper
>> nouns to cases in which the input is capitalized. I have been successful
>> in doing so, with some exceptions, such as the one shown below.
>>
>> I am surprised by the following behavior and either have something to
>> learn or have perhaps found a bug in PET's chart mapping.
>>
>> Regards,
>> Paul
>>
>>
>> Given a capitalized lexical entry such as:
>>
>> Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG
>> "Bank",PHON.ONSET con]].
>>
>> The following lexical filtering rule (which has been simplified for the
>> demonstration purposes of this email):
>>
>> veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule
>> & [+CONTEXT <>,+INPUT <[ORTH.FIRST ^[[:upper:]].*$]>,+OUTPUT <>].
>>
>> will 'correctly' remove Bank_NNP from the chart when the input is "it is
>> the bank" but fails to do so when a period is appended.
>>
>> PET's logging of lexical rules shows as follows for the first case:
>>
>> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
>> L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50
>> parents: >
>> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
>> L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51 parents:
>> 98 >
>> [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
>> P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92 parents: >
>>
>> Surprisingly, only the first of these three rule applications occurs in
>> the second case.
>>
>> I don't think it matters, but in our case, input is via FSC in which the
>> period is a token. Thus, the following token mapping rule applies in the
>> second case only:
>>
>> [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
>> I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
>> I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
>> I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >
>>
>> A redacted AVM for the surviving lexical item follows. As far as I can
>> tell, it matches the lexical filtering rule above and thus should not
>> remain in the chart.
>>
>>
>> L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63
>> parents: 110 >
>> n_-_pn_le
>> [ ...
>> SYNSEM ...
>> PHON phon
>> [ ONSET con
>> [ --TL #16:native_token_cons
>> [ FIRST token
>> [ +CLASS #17:alphabetic
>> [ +CASE
>> non_capitalized+lower,
>> +INITIAL -
>> ],
>> +FROM #3,
>> +FORM #18:"bank.",
>> +TO "15",
>> +CARG "bank",
>> ...
>> REST native_token_null ] ] ],
>> LKEYS lexkeys_norm
>> [ KEYREL named_nom_relation
>> [ CFROM #3,
>> CTO #29:"15",
>> PRED named_rel,
>> LBL #15,
>> LNK *list*,
>> ARG0 #14,
>> CARG "Bank" ],
>> ...
>> ORTH orthography
>> [ FIRST "Bank",
>> REST *null*,
>> FROM #3,
>> CLASS #17,
>> ...
>> TOKENS tokens
>> [ +LIST #16,
>> +LAST token
>> [ +CLASS #17,
>> +FROM "10",
>> +FORM "bank.",
>> +TO #29,
>> +CARG "bank",
>> ...
>>
>>