[developers] [itsdb] Problem with scores on my inflection rules

Stephan Oepen oe at ifi.uio.no
Fri Dec 7 21:07:37 CET 2012


hi tore,

this does not necessarily look like an [incr tsdb()] problem to me, so
i take the liberty of copying the larger ‘developers’ list.

seeing that, as i understand it, PET and ACE show the same results,
there must be something systematic :-).

you say you are missing scores on the inflectional rules, but from what
you sent it appears that the ‘*_irule’ nodes in those derivations do
have non-zero scores, while the lexical entries (the preterminals in the
derivations) do not.
please correct me if i'm misunderstanding something.

the one surprising-looking thing in your MEM features is the upper
case in the surface forms of the lexicalized features, e.g. "TIL".  i
would have to check the code, but my suspicion is that PET and ACE look
up these features with the surface form either all downcased, or in
exactly the variant used as the parser input (presumably not ‘TIL’ in
all upper case in your inputs, either way).
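
to make that concrete, here is a toy sketch in python (purely for
illustration; neither parser is actually implemented like this) of how
a case mismatch between the MEM file and the lookup key silently yields
a zero score:

  # toy illustration only, not actual PET or ACE code: assume feature
  # weights are keyed on the exact surface string found in the MEM file.
  weights = {("prep-word-poss", "TIL"): -0.0251066}

  def lexical_score(entry, surface):
      # suppose the parser looks the feature up with the downcased form
      return weights.get((entry, surface.lower()), 0.0)

  # whatever the casing of the input, the lookup key becomes "til",
  # never "TIL", so the trained weight is not found and the score is 0.
  print(lexical_score("prep-word-poss", "til"))   # -> 0.0
  print(lexical_score("prep-word-poss", "TIL"))   # -> 0.0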

i imagine you could test my suspicion by manually downcasing the
relevant strings in the MEM file.
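
e.g. with a few lines of python along these lines (a rough sketch: it
blindly downcases every double-quoted string in the file, which assumes
the surface forms are the only quoted material; ‘Suite_2.lc.mem’ is
just a made-up name for the output):

  import re

  with open("Suite_2.mem") as infile:
      text = infile.read()

  # downcase every double-quoted surface form; deliberately crude, and
  # assumes word forms are the only quoted strings in the file.
  text = re.sub(r'"([^"]*)"', lambda m: '"' + m.group(1).lower() + '"', text)

  with open("Suite_2.lc.mem", "w") as outfile:   # made-up output name
      outfile.write(text)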

why those forms are all upper case, on the other hand, i could not
easily guess; and that /could/ be an [incr tsdb()] issue.  what case
do you see in the derivations (in the ‘result’ relation) used to train
the model?  and which parser did you use to construct those
derivations?  (i would advise against the LKB in this case :-)

best wishes, oe


On Fri, Dec 7, 2012 at 3:29 PM, Tore Bruland <torebrul at idi.ntnu.no> wrote:
> Hi.
>
> I am using the NorSource grammar and I have installed the LOGON tree on my
> 64-bit machine with Ubuntu 12.04.
> I have a problem with the scores on the inflection rules in the syntax trees
> from NorSource. If I use the ERG grammar and the redwood.mem file, the parse
> trees show scores on the inflection rules.
> I have tried two test suites, a small one with only 13 sentences and a bigger
> one with 1129 sentences. The scores on the inflection rules are always 0.
>
> I train the model with the load script, dot.tsdbrc, and train.lisp from the
> redwoods folder, with those files adjusted to my setup.
> I have created, parsed, and annotated a small database in [tsdb++]. I ran the
> same sentences through ACE and PET, but the scores on the inflection rules
> are 0.
>
> One sentence is "gutten solgte bussen til mannen" (‘the boy sold the bus to
> the man’), and here is one tree (the selected tree from the annotation)
> parsed with ACE:
>
> (2692 head-subject-rule 6.716046 0 5
>  (197 sg_def_m_final-full_irule 0.844204 0 1
>   (157 sg-masc-def-noun-lxm-lrule 0.422102 0 1
>    (1 gutt_masc-reganim 0.000000 0 1
>     ("gutten" 453 "token "))))
>  (2250 head-verb-comp-rule 5.560148 1 5
>   (192 pret-nonfstr_infl_rule 1.145396 1 2
>    (7 selge_tv 0.000000 1 2
>     ("solgte" 570 "token ")))
>   (2235 pp-mod-defbare-n-index-sit-rule 2.977362 2 5
>    (222 sg_def_m_final-full_irule 1.466776 2 3
>     (194 sg-masc-def-noun-lxm-lrule 0.902894 2 3
>      (8 buss_masc-dirnoun 0.000000 2 3
>       ("bussen" 687 "token "))))
>    (1772 head-prep-comp-rule 0.740289 3 5
>     (24 til_poss 0.000000 3 4
>      ("til" 91 "token "))
>     (223 sg_def_m_final-full_irule 0.266936 4 5
>      (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
>       (30 mann_masc-reganim 0.000000 4 5
>        ("mannen" 64 "token "))))))))
>
> And parsed with PET:
>
> (1883 head-subject-rule 1.103 0 5 [root]
>   (1874 sg_def_m_final-full_irule 9.041e-08 0 1
>     (1873 sg-masc-def-noun-lxm-lrule 8.934e-08 0 1
>       (6 gutt_masc-reganim/masc-reganim-noun-lxm 0 0 1
> [sg_def_m_final-full_irule]
>         (1 "gutten" 0 0 1 <0:1>))))
>   (1882 head-verb-comp-rule 0.9468 1 5
>     (1875 pret-nonfstr_infl_rule 0.2863 1 2
>       (18 selge_tv/v-tr 0 1 2 [pret-nonfstr_infl_rule]
>         (2 "solgte" 0 1 2 <1:2>)))
>     (1881 pp-mod-defbare-n-index-sit-rule 0.1813 2 5
>       (1877 sg_def_m_final-full_irule -5.485e-08 2 3
>         (1876 sg-masc-def-noun-lxm-lrule -5.593e-08 2 3
>           (31 buss_masc-dirnoun/masc-dir-noun-lxm 0 2 3
> [sg_def_m_final-full_irule]
>             (3 "bussen" 0 2 3 <2:3>))))
>       (1880 head-prep-comp-rule 0.03026 3 5
>         (36 til_poss/prep-word-poss 0 3 4 []
>           (4 "til" 0 3 4 <3:4>))
>         (1879 sg_def_m_final-full_irule 9.041e-08 4 5
>           (1878 sg-masc-def-noun-lxm-lrule 8.934e-08 4 5
>             (47 mann_masc-reganim/masc-reganim-noun-lxm 0 4 5
> [sg_def_m_final-full_irule]
>               (5 "mannen" 0 4 5 <4:5>))))))))
>
> One of the lexicon entries for the preposition "til":
>
> til_poss := prep-word-poss &
>   [ STEM < "til" >,
>     SYNSEM.LOCAL.CAT.HEAD [KEYS.KEY til-poss ] ].
>
> Suite_2.mem
> (30) [1 (3) head-verb-comp-rule pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] 0.206416 {2 2 2 2} [0 1]
> (31) [1 (2) pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> (32) [1 (1) head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> (33) [1 (0) prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
>
> I can’t see anything wrong here: "prep-word-poss" is relevant and it has a
> score.
> I need some help.
> My trained model is attached to this e-mail.
>
> Tore Bruland
> PhD Candidate
> Department of Computer and Information Science
> Norwegian University of Science and Technology (NTNU)
> Sem Sælands vei 7-9
> NO-7491 Trondheim, Norway
> tel. +47 73 59 36 72
> fax  +47 73 59 44 66
> cel. +47 47 90 49 79