[developers] chart mapping missing applicable lexical filtering rule?

Stephan Oepen oe at ifi.uio.no
Sat Jul 7 23:18:58 CEST 2018


hi paul,

lexical filtering applies after lexical parsing, i.e. you need to make sure
your rule matches the complete lexical item.  in the case where there is a
trailing period, that will be an instance of the 'period' lexical rule with
the 'bank' lexical entry as its daughter.

not quite sure what the orthographemic machinery does about ORTH values,
but i suspect that after the application of the 'period' rule the ORTH value
may be either unset or (more likely) normalized to all lower case.  upon the
application of orthographemic (aka spelling-changing) rules, the ORTH value
of the mother cannot just be determined by unification, e.g. through a
re-entrancy into the daughter (as is common for lexical rules that do not
affect spelling).

so, to make your current approach work, i think you would have to let the
trigger rule detect proper names by a property other than ORTH.
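
as a rough, untested sketch of that first option, the rule could key on the
CARG of the proper name rather than on ORTH.  this assumes that
SYNSEM.LKEYS.KEYREL.CARG is passed up unchanged from the lexical entry to
the mother of the punctuation rule, and that the regular expression can be
matched against that path; i have not verified either assumption:

      ; sketch only: same rule as yours, but matching CARG instead of ORTH
      veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule &
        [ +CONTEXT < >,
          +INPUT < [ SYNSEM.LKEYS.KEYREL.CARG ^[[:upper:]].*$ ] >,
          +OUTPUT < > ].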

alternatively, you could try making ORTH.FIRST re-entrant with
TOKENS.+LIST.FIRST.+FORM, so that lexical instantiation will fail against
an incoming token feature structure that does not match in case.  i have
long been thinking this latter technique (as a type addendum on n_-_pn_le)
could make a nice stepping stone towards a case-sensitive configuration of
the ERG (which might give non-trivial efficiency gains on carefully edited
text :-).
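
a minimal sketch of that addendum, again untested and assuming the feature
paths named above:

      ; sketch only: tie the lexical ORTH to the surface form of the token
      n_-_pn_le :+
        [ ORTH.FIRST #form,
          TOKENS.+LIST.FIRST.+FORM #form ].

note that in the token structure you quote, +FORM still includes the
trailing period ("bank."), so as written this would presumably also block
"Bank." unless the punctuation is kept out of +FORM.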

best wishes, oe


On Sat, 7 Jul 2018 at 21:21 <paul at haleyai.com> wrote:

> Dear Developers,
>
> In one use case, it would be nice to limit the use of capitalized proper
> nouns to cases in which the input is capitalized.  I have been successful
> in doing so, with some exceptions such as the one shown below.
>
> I am surprised by the following behavior and either have something to
> learn or perhaps there is a bug in PET's chart mapping?
>
> Regards,
> Paul
>
>
> Given a capitalized lexical entry such as:
>
>       Bank_NNP := n_-_pn_le & [ORTH <"Bank">,SYNSEM [LKEYS.KEYREL.CARG "Bank",PHON.ONSET con]].
>
> The following lexical filtering rule (which has been simplified for the
> demonstration purposes of this email):
>
>       veto_capitalized_native_uncapitalized_lfr := lexical_filtering_rule & [+CONTEXT <>,+INPUT <[ORTH.FIRST ^[[:upper:]].*$]>,+OUTPUT <>].
>
> will 'correctly' remove Bank_NNP from the chart when the input is "it is
> the bank" but fails to do so when a period is appended.
>
> PET's logging of lexical rules shows as follows for the first case:
>
>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:85
>       L [85 2-3 the_pn_np1_no (1) -0.1123 {} { : } {}] < blk: 2 dtrs: 50  parents: >
>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:92
>       L [92 3-4 Bank_NNP (1) 0 {} { : } {}] < blk: 2 dtrs: 51  parents: 98 >
>       [cm] veto_capitalized_native_uncapitalized_lfr fired: I1:98
>       P [98 3-4 n_sg_ilr (1) 0 {} { : } {}] < blk: 2 dtrs: 92  parents: >
>
> Surprisingly, only the first of these 3 rule applications occurs in the
> second case.
>
> I don't think it matters, but in our case, input is via FSC in which the
> period is a token.  Thus, the following token mapping rule applies in the
> second case only:
>
>     [cm] suffix_punctuation_tmr fired: C1:50 I1:48 O1:51
>     I [50 () -1--1 <14:15> "" "." { : } {}] < blk: 0 >
>     I [48 () -1--1 <10:14> "" "bank" { : } {}] < blk: 2 >
>     I [51 () -1--1 <10:15> "" "bank." { : } {}] < blk: 0 >
>
> A redacted AVM for the surviving lexical item follows. As far as I can
> tell, it matches the lexical filtering rule above and thus should not
> remain in the chart.
>
>
> L [103 3-4 Bank_NNP (1) 0 {} { : w_period_plr} {}] < blk: 0 dtrs: 63  parents: 110 >
> n_-_pn_le
> [ ...
>   SYNSEM   ...
>              PHON   phon
>                     [ ONSET con
>                             [ --TL #16:native_token_cons
>                                        [ FIRST token
>                                                [ +CLASS #17:alphabetic
>                                                             [ +CASE non_capitalized+lower,
>                                                               +INITIAL - ],
>                                                  +FROM  #3,
>                                                  +FORM  #18:"bank.",
>                                                  +TO    "15",
>                                                  +CARG  "bank",
>                                                  ...
>                                          REST  native_token_null ] ] ],
>              LKEYS  lexkeys_norm
>                     [ KEYREL    named_nom_relation
>                                 [ CFROM #3,
>                                   CTO   #29:"15",
>                                   PRED  named_rel,
>                                   LBL   #15,
>                                   LNK   *list*,
>                                   ARG0  #14,
>                                   CARG  "Bank" ],
>                                   ...
>   ORTH     orthography
>            [ FIRST "Bank",
>              REST  *null*,
>              FROM  #3,
>              CLASS #17,
>              ...
>   TOKENS   tokens
>            [ +LIST #16,
>              +LAST token
>                    [ +CLASS #17,
>                      +FROM  "10",
>                      +FORM  "bank.",
>                      +TO    #29,
>                      +CARG  "bank",
>                      ...
>