[developers] chart mapping missing applicable lexical filtering rule?

Stephan Oepen oe at ifi.uio.no
Sun Jul 8 08:56:54 CEST 2018


i do believe that PET also injects computed ORTH values after applications
of orthographemic rules, but i do not have my laptop at hand to
double-check.

but i also believe it is likely to use the strings computed during
orthographemic segmentation, e.g. (sentence-initial) ‘Unbanking’ <- ‘un’ +
‘bank’ + ‘ing’.  because the grammar only specifies lower-case %prefix()
and %suffix() rules, i suspect that everything may be downcased inside the
orthographemic machinery.

do ACE or Agree actually process prefixes and suffixes case-insensitively
but preserve original-case ORTH values?  that would of course seem like the
right thing to do, at least until we finally generalize the orthographemic
specification language to support actual regular expressions :-).

paul, with a little more sleep, i would like to refine my suggestion for
how to enforce capitalization in lexical entries: rather than constrain the
+FORM value on token feature structures, i now recall that the +CASE value
is computed during token mapping for exactly this purpose.  a type addendum
like [ +CASE capitalized ] i would expect to do the trick.
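
for concreteness, such an addendum might look roughly as follows, though
the exact path to +CASE depends on the token feature geometry of the
grammar, so the path below is an assumption that would need checking:

      n_-_pn_le :+ [ TOKENS.+LIST.FIRST.+CASE capitalized ].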

best, oe


On Sun, 8 Jul 2018 at 07:42 Woodley Packard <sweaglesw at sweaglesw.org> wrote:

> Hi gentlemen,
>
> Not sure about other platforms, but I’m pretty sure (recent versions of)
> ACE computes the effect of orthographemic rules like the period in question
> and places the (non-unification-based) result in ORTH (or a
> grammar-configured path), making it available to further unification-based
> processing.  Older versions of ACE (say, two years old or more?) do not do
> this, and leave the ORTH value as whatever the unification constraints
> supplied by the grammar dictate.
>
> Stephan, do I understand you to say you expect to see an uppercase ORTH
> before application of w_period_plr and a lowercase ORTH value after?  That
> would seem surprising and unfortunate to me, though perhaps within the
> formal power of the system if the grammarian truly wanted it...
>
> Regards,
> Woodley
>
> On Jul 7, 2018, at 2:18 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi paul,
>
> lexical filtering applies after lexical parsing, i.e. you need to make
> sure your rule matches the complete lexical item—in the case where there is
> a trailing period, that will be an instance of the ‘period’ lexical rule
> with the ‘bank’ lexical entry as its daughter.
>
> not quite sure what the orthographemic machinery does about ORTH values,
> but i suspect that after the application of the ‘period’ rule the ORTH
> value may be either unset or (more likely) normalized to all lower case.
> upon the
> application of orthographemic (aka spelling-changing) rules, the ORTH value
> of the mother cannot just be determined by unification, e.g. a re-entrancy
> into the daughter (as is common for lexical rules that do not affect
> spelling).
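>
> to illustrate, a spelling-changing rule pairs its feature-structure
> constraints with a procedural %suffix() annotation, along these lines (all
> identifiers below are hypothetical and merely schematic):
>
>       plur_olr := lexical_rule &
>       %suffix (!s !ss) (!ss !ssses)
>       [ ... ].
>
> the %suffix() patterns are interpreted procedurally by the orthographemic
> machinery, hence the ORTH value of the mother has to be supplied by the
> processor rather than by unification.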
>
> so, to make your current approach work, i think you would have to let the
> trigger rule detect proper names by a property other than ORTH.
>
> alternatively, you could try making ORTH.FIRST re-entrant with
> TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against
> an incoming token feature structure that does not match in case.  i have
> long been thinking this latter technique (as a type addendum on n_-_pn_le)
> could make a nice stepping stone towards a case-sensitive configuration of
> the ERG (which might give non-trivial efficiency gains on carefully edited
> text :-).
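>
> in TDL, that addendum might be sketched roughly as follows (this assumes
> ORTH.FIRST and the token +FORM are directly unifiable strings in the
> current feature geometry, which would need checking):
>
>       n_-_pn_le :+ [ ORTH.FIRST #form,
>                      TOKENS.+LIST.FIRST.+FORM #form ].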
>
> best wishes, oe
>
>
> On Sat, 7 Jul 2018 at 21:21 <paul at haleyai.com> wrote:
>
>> Dear Developers,
>>
>> In one use case, it would be nice to limit the use of capitalized proper
>> nouns to cases in which the input is capitalized.  I have been successful
>> in doing so, with some exceptions, such as the one shown below.
>>
>> I am surprised by the following behavior; either I have something to
>> learn, or perhaps there is a bug in PET's chart mapping.
>>
>> Regards,
>> Paul
>>
>>
>> Given a capitalized lexical entry such as:
>>
>>       Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG
>> "Bank",PHON.ONSET con]].
>>
>> The following lexical filtering rule (which has been simplified for the
>> demonstration purposes of this email):
>>
>>       veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule
>> & [+CONTEXT <>,+INPUT <[ORTH.FIRST ^[[:upper:]].*$]>,+OUTPUT <>].
>>
>> will 'correctly' remove Bank_NNP from the chart when the input is "it is
>> the bank" but fails to do so when a period is appended.
>>
>> PET's logging of lexical rules shows as follows for the first case:
>>
>>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
>>       L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50
>> parents: >
>>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
>>       L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51  parents:
>> 98 >
>>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
>>       P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92  parents: >
>>
>> Surprisingly, only the first of these three rule applications fires in
>> the second case.
>>
>> I don't think it matters, but in our case, input is via FSC in which the
>> period is a token.  Thus, the following token mapping rule applies in the
>> second case only:
>>
>>     [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
>>     I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
>>     I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
>>     I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >
>>
>> A redacted AVM for the surviving lexical item follows. As far as I can
>> tell, it matches the lexical filtering rule above and thus should not
>> remain in the chart.
>>
>>
>> L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63
>> parents: 110 >
>> n_-_pn_le
>> [ ...
>>   SYNSEM   ...
>>              PHON   phon
>>                     [ ONSET con
>>                             [ --TL #16:native_token_cons
>>                                        [ FIRST token
>>                                                [ +CLASS #17:alphabetic
>>                                                             [ +CASE
>> non_capitalized+lower,
>>                                                               +INITIAL -
>> ],
>>                                                  +FROM  #3,
>>                                                  +FORM  #18:"bank.",
>>                                                  +TO    "15",
>>                                                  +CARG  "bank",
>>                                                  ...
>>                                          REST  native_token_null ] ] ],
>>              LKEYS  lexkeys_norm
>>                     [ KEYREL    named_nom_relation
>>                                 [ CFROM #3,
>>                                   CTO   #29:"15",
>>                                   PRED  named_rel,
>>                                   LBL   #15,
>>                                   LNK   *list*,
>>                                   ARG0  #14,
>>                                   CARG  "Bank" ],
>>                                   ...
>>   ORTH     orthography
>>            [ FIRST "Bank",
>>              REST  *null*,
>>              FROM  #3,
>>              CLASS #17,
>>              ...
>>   TOKENS   tokens
>>            [ +LIST #16,
>>              +LAST token
>>                    [ +CLASS #17,
>>                      +FROM  "10",
>>                      +FORM  "bank.",
>>                      +TO    #29,
>>                      +CARG  "bank",
>>                      ...
>>
>