[developers] [itsdb] Problem with scores on my inflection rules

Mon Dec 10 11:22:15 CET 2012

Thanks, Stephan.

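I replaced "TIL", "GUTT", and "SELGE" in the MEM file with lowercase variants, and now the preposition "til" gets a score (til_poss is no longer 0).
The remaining problem seems to be that the derivation trees contain the surface string (e.g. "gutten"), while the MEM file contains the STEM from the lexicon (e.g. "GUTT"), so downcasing alone does not help for inflected words such as "gutten" and "solgte".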

Mem-file:
(2) [1 (1) sg-masc-def-noun-lxm-lrule masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]
(3) [1 (0) masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1] 
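
Stephan's suggested test (downcasing the lexicalized strings by hand) can also be scripted over a whole MEM file. This is a minimal Python sketch, not part of [incr tsdb()]; it assumes each feature sits on one line as in the excerpt above, and that the only double-quoted substrings on a feature line are the lexicalized forms. The function names are made up for illustration. Note that it only fixes case: it cannot turn a stem such as "gutt" into the surface form "gutten".

```python
import re

# A double-quoted string without embedded quotes, e.g. "GUTT".
QUOTED = re.compile(r'"([^"]*)"')


def downcase_lexical_strings(line: str) -> str:
    """Return a MEM feature line with every quoted string downcased.

    Hypothetical helper; assumes the only quoted substrings on a feature
    line are lexicalized forms such as "TIL" or "GUTT".
    """
    return QUOTED.sub(lambda m: '"' + m.group(1).lower() + '"', line)


def downcase_mem_file(src_path: str, dst_path: str) -> None:
    """Copy a MEM file line by line, downcasing the lexicalized strings.

    The UTF-8 encoding is an assumption about the MEM file.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(downcase_lexical_strings(line))


if __name__ == "__main__":
    feature = '(3) [1 (0) masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]'
    print(downcase_lexical_strings(feature))
```

Writing to a separate output file keeps the original model intact, so the trained and the downcased variants can be compared side by side.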

Lexicon entry:
selge_tv := v-tr &
 [ INFLECTION nonfstr-strong,
   STEM < "selge" >,
   SYNSEM.LKEYS.KEYREL.PRED "_selge_v_rel" ].

Syntax trees:
ACE
(2692 head-subject-rule 6.847143 0 5 
 (197 sg_def_m_final-full_irule 0.844204 0 1 
  (157 sg-masc-def-noun-lxm-lrule 0.422102 0 1 
   (1 gutt_masc-reganim 0.000000 0 1 
    ("gutten" 520 "token ")))) 
 (2250 head-verb-comp-rule 5.691245 1 5 
  (192 pret-nonfstr_infl_rule 1.145396 1 2 
   (7 selge_tv 0.000000 1 2 
    ("solgte" 637 "token "))) 
  (2235 pp-mod-defbare-n-index-sit-rule 3.108459 2 5 
   (222 sg_def_m_final-full_irule 1.466776 2 3 
    (194 sg-masc-def-noun-lxm-lrule 0.902894 2 3 
     (8 buss_masc-dirnoun 0.000000 2 3 
      ("bussen" 41 "token ")))) 
   (1772 head-prep-comp-rule 0.871385 3 5 
    (24 til_poss 0.131096 3 4 
     ("til" 158 "token ")) 
    (223 sg_def_m_final-full_irule 0.266936 4 5 
     (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5 
      (30 mann_masc-reganim 0.000000 4 5 
       ("mannen" 131 "token "))))))))

PET
(1883 head-subject-rule 1.078 0 5 [root]
  (1874 sg_def_m_final-full_irule 9.041e-08 0 1
    (1873 sg-masc-def-noun-lxm-lrule 8.934e-08 0 1
      (6 gutt_masc-reganim/masc-reganim-noun-lxm 0 0 1 [sg_def_m_final-full_irule]
        (1 "gutten" 0 0 1 <0:1>))))
  (1882 head-verb-comp-rule 0.9217 1 5
    (1875 pret-nonfstr_infl_rule 0.2863 1 2
      (18 selge_tv/v-tr 0 1 2 [pret-nonfstr_infl_rule]
        (2 "solgte" 0 1 2 <1:2>)))
    (1881 pp-mod-defbare-n-index-sit-rule 0.1562 2 5
      (1877 sg_def_m_final-full_irule -5.485e-08 2 3
        (1876 sg-masc-def-noun-lxm-lrule -5.593e-08 2 3
          (31 buss_masc-dirnoun/masc-dir-noun-lxm 0 2 3 [sg_def_m_final-full_irule]
            (3 "bussen" 0 2 3 <2:3>))))
      (1880 head-prep-comp-rule 0.005154 3 5
        (36 til_poss/prep-word-poss -0.02511 3 4 []
          (4 "til" 0 3 4 <3:4>))
        (1879 sg_def_m_final-full_irule 9.041e-08 4 5
          (1878 sg-masc-def-noun-lxm-lrule 8.934e-08 4 5
            (47 mann_masc-reganim/masc-reganim-noun-lxm 0 4 5 [sg_def_m_final-full_irule]
              (5 "mannen" 0 4 5 <4:5>))))))))
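
To check which nodes actually carry a score (Stephan's observation is that the *_irule nodes do, while the lexical entries do not), the ACE derivations can be scanned for preterminals scored exactly 0. This is a minimal Python sketch, assuming the ACE layout shown above: each non-leaf node is (id label score start end child ...) and each leaf starts with a quoted form. It does not handle the escaped token strings found in result files, and all names are hypothetical.

```python
import re

# Tokens: "(", ")", a double-quoted string, or any other bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_sexp(text):
    """Parse one s-expression into nested lists of string atoms."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing ")"
        return items

    return parse()


def is_leaf(node):
    """A leaf such as ("til" 158 "token ") starts with a quoted form."""
    return isinstance(node, list) and node[0].startswith('"')


def zero_score_preterminals(tree):
    """Labels of lexical entries (nodes directly above a leaf) scored exactly 0."""
    found = []

    def walk(node):
        children = [c for c in node[5:] if isinstance(c, list)]
        if any(is_leaf(c) for c in children) and float(node[2]) == 0.0:
            found.append(node[1])
        for child in children:
            if not is_leaf(child):
                walk(child)

    walk(tree)
    return found


# The "til mannen" subtree of the ACE derivation above.
SAMPLE = """
(1772 head-prep-comp-rule 0.871385 3 5
 (24 til_poss 0.131096 3 4
  ("til" 158 "token "))
 (223 sg_def_m_final-full_irule 0.266936 4 5
  (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
   (30 mann_masc-reganim 0.000000 4 5
    ("mannen" 131 "token ")))))
"""

if __name__ == "__main__":
    print(zero_score_preterminals(read_sexp(SAMPLE)))  # ['mann_masc-reganim']
```

On this subtree only mann_masc-reganim is reported, since til_poss now has a non-zero score.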

The entry in my result file for this sentence:

1@0@380@-1@-1@-1@-1@604@-1@-1@(290 head-subject-rule 0.0 0 5 (17 sg_def_m_final-full_irule 0.0 0 1 (16 sg-masc-def-noun-lxm-lrule 0.0 0 1 (15 gutt_masc-reganim 0.0 0 1 ("GUTT" 0 1)))) (289 head-verb-comp-rule 0.0 1 5 (176 pret-nonfstr_infl_rule 0.0 1 2 (175 selge_tv 0.0 1 2 ("SELGE" 1 2))) (288 pp-mod-defbare-n-index-sit-rule 0.0 2 5 (182 sg_def_m_final-full_irule 0.0 2 3 (181 sg-masc-def-noun-lxm-lrule 0.0 2 3 (180 buss_masc-dirnoun 0.0 2 3 ("BUSS" 2 3)))) (276 head-prep-comp-rule 0.0 3 5 (263 til_poss 0.0 3 4 ("TIL" 3 4)) (274 sg_def_m_final-full_irule 0.0 4 5 (273 sg-masc-def-noun-lxm-lrule 0.0 4 5 (272 mann_masc-reganim 0.0 4 5 ("MANN" 4 5))))))))@@("S" ("N" ("N" ("N" ("gutten")))) ("VP" ("V" ("V" ("solgte"))) ("NP" ("N" ("N" ("N" ("bussen")))) ("PP" ("PREP" ("til")) ("N" ("N" ("N" ("mannen"))))))))@[ LTOP: h1 [ h ROLE: ROLE ] INDEX: e2 [ e E.TENSE: PRET E.MOOD: INDICATIVE E.ASPECT: SEMSORT E.DELIMITED: + PATH-TELIC: BOOL SIT-TYPE: SEMSORT DISC-MOVE: DISCMODE SF: IFORCE WH: BOOL ROLE: ROLE ] RELS: < [ "_gutt_n_rel"<0:6> LBL: h3 [ h ROLE: ROLE ] ARG0: x4 [ x WH: - ROLE: ROLE PNG.NG.GEN: M PNG.NG.NUM: SING PNG.PERS: THIRDPERS BOUNDED: + ] ] [ "_def_q_rel"<0:6> LBL: h5 [ h ROLE: ROLE ] ARG0: x4 RSTR: h6 [ h ROLE: ROLE ] BODY: h7 [ h ROLE: ROLE ] ] [ "_selge_v_rel"<7:13> LBL: h8 [ h ROLE: ROLE ] ARG0: e2 ARG1: x4 ARG2: x9 [ x ROLE: ROLE BOUNDED: + WH: - PNG.NG.NUM: SING PNG.NG.GEN: M PNG.PERS: THIRDPERS ] ] [ "_buss_n_rel"<14:20> LBL: h10 [ h ROLE: ROLE ] ARG0: x9 ] [ "_def_q_rel"<14:20> LBL: h11 [ h ROLE: ROLE ] ARG0: x9 RSTR: h12 [ h ROLE: ROLE ] BODY: h13 [ h ROLE: ROLE ] ] [ "_possessed_by_rel"<21:24> LBL: h10 ARG0: u14 [ u WH: BOOL ROLE: ROLE ] ARG1: x9 ARG2: x15 [ x WH: - ROLE: ROLE PNG.NG.NUM: SING PNG.NG.GEN: M PNG.PERS: THIRDPERS BOUNDED: + ] ] [ "_mann_n_rel"<25:31> LBL: h16 [ h ROLE: ROLE ] ARG0: x15 ] [ "_def_q_rel"<25:31> LBL: h17 [ h ROLE: ROLE ] ARG0: x15 RSTR: h18 [ h ROLE: ROLE ] BODY: h19 [ h ROLE: ROLE ] ] > HCONS: < h6 qeq h3 h12 qeq h10 h18 qeq h16 > ]@((:ASCORE))

And yes, I used the LKB to build those derivations.
After switching to ACE, the entry in my result file is:

1@0@-1@-1@-1@-1@-1@-1@-1@-1@(2707 head-subject-rule 3.723022 0 5 (197 sg_def_m_final-full_irule 0.798620 0 1 (157 sg-masc-def-noun-lxm-lrule 0.442091 0 1 (1 gutt_masc-reganim 0.000000 0 1 ("gutten" 495 "token [ +FORM \\"gutten\\" +FROM \\"0\\" +TO \\"6\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")))) (2248 head-actv-verb-icomp-rule 2.911072 1 5 (1812 head-verb-comp-rule 0.982212 1 3 (181 pret-nonfstr_infl_rule 0.193914 1 2 (5 selge_tr-obl-til 0.000000 1 2 ("solgte" 612 "token [ +FORM \\"solgte\\" +FROM \\"7\\" +TO \\"13\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]"))) (222 sg_def_m_final-full_irule 0.788297 2 3 (194 sg-masc-def-noun-lxm-lrule 0.797056 2 3 (8 buss_masc-dirnoun 0.000000 2 3 ("bussen" 16 "token [ +FORM \\"bussen\\" +FROM \\"14\\" +TO \\"20\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]"))))) (1745 head-prep-comp-rule 1.619551 3 5 (19 til_sel 0.000000 3 4 ("til" 133 "token [ +FORM \\"til\\" +FROM \\"21\\" +TO \\"24\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")) (223 sg_def_m_final-full_irule 1.209164 4 5 (196 sg-masc-def-noun-lxm-lrule 0.472618 4 5 (30 mann_masc-reganim 0.000000 4 5 ("mannen" 106 "token [ +FORM \\"mannen\\" +FROM \\"25\\" +TO \\"31\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")))))))@@@[ LTOP: h0 INDEX: e1 [ e SORT: verb-act-specification SF: prop E.TENSE: pret E.MOOD: indicative E.ASPECT: semsort E.DELIMITED: + ] RELS: < [ "_gutt_n_rel"<-1:-1> LBL: h3 ARG0: x2 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ] [ "_def_q_rel"<-1:-1> LBL: h4 ARG0: x2 RSTR: h5 BODY: h6 ] [ "_selge_v_rel"<-1:-1> LBL: h7 ARG0: e1 ARG1: x2 ARG2: x8 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ARGOBLQ: h9 ] [ "_buss_n_rel"<-1:-1> LBL: h10 ARG0: x8 ] [ "_def_q_rel"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] [ "_til_p_rel"<-1:-1> LBL: h9 ARG0: u14 ARG1: e1 ARG2: x15 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ] [ "_mann_n_rel"<-1:-1> LBL: h16 ARG0: x15 ] [ "_def_q_rel"<-1:-1> LBL: h17 ARG0: x15 RSTR: h18 BODY: h19 ] > HCONS: < h5 qeq h3 h12 qeq h10 h18 qeq h16 > ]@((:ASCORE .3.723022))
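
The two result entries differ in the way Stephan asked about: the LKB-built derivation has upper-case stems as leaves ("GUTT", "SELGE", "BUSS", "TIL", "MANN"), while the ACE derivation has the surface forms ("gutten", "solgte", ...). To check this across a whole result file, one can pull out the leaf strings. This is a minimal Python sketch; the regular expression is a heuristic that fits the entries above, not a documented [incr tsdb()] format, and the function names are hypothetical.

```python
import re

# Heuristic for a derivation leaf: "(" immediately followed by a quoted form,
# whitespace and an integer, e.g. ("GUTT" 0 1) or ("gutten" 495 "token ...").
# In the entries above, neither the labelled tree nor the MRS contains this
# pattern; that is an observation about these examples, not a format spec.
LEAF = re.compile(r'\("([^"\\]*)"\s+-?\d')


def derivation_leaves(record: str) -> list[str]:
    """Return the leaf forms of the derivation in a result record, in order."""
    return LEAF.findall(record)


def all_upper(leaves: list[str]) -> bool:
    """True if there is at least one leaf and every leaf is upper case."""
    return bool(leaves) and all(leaf.isupper() for leaf in leaves)


if __name__ == "__main__":
    lkb = '(15 gutt_masc-reganim 0.0 0 1 ("GUTT" 0 1))@@("N" ("gutten"))'
    ace = '(1 gutt_masc-reganim 0.000000 0 1 ("gutten" 495 "token "))'
    for name, record in (("LKB", lkb), ("ACE", ace)):
        leaves = derivation_leaves(record)
        kind = "upper-case stems" if all_upper(leaves) else "surface forms"
        print(name, leaves, kind)
```

Records whose leaves are all upper case are the ones that would feed "GUTT"-style strings into the lexicalized MEM features.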

So far, so good, but I cannot annotate with ACE.
I start the LOGON tree in Emacs (with no parser) and launch the ACE parser from the Emacs window with
(tsdb::tsdb :cpu :no_ace_p :file "tsdb_no_ace_p_log.txt"), where
no_ace_p is defined as:

     (make-cpu
      :host (short-site-name)
      :spawn "/home/tore/software/ace-0.9.10pre4/ace"
      :options '("-t" "-g" "/home/tore/software/ace-0.9.10pre4/norsource.dat")
      :class :no_ace_p :grammar "norsource" :name "norsource"
      :task '(:parse)
      :wait 600)

But when I start tree->annotate from the podium, I get an error for each sentence:

TSNLP(6): 
create-cache(): write-through mode for `suite_3'.
install-gc-strategy(): disabling tenure; global garbage collection ...
[10:51:20] gc-after-hook(): {L#27 N=3.0K O=179K E=96%} [S=2.2G R=52M].
 done.
[10:51:20] browse-tree(): `suite_3' --- item # 1
[10:51:20] browse-tree(): retrieved item # 1 (5 parses).
[10:51:20] browse-tree(): retrieved 0 tree records.
[10:51:20] browse-tree(): reconstructed 0 edges.
[10:51:20] browse-tree(): retrieved 0 decisions.
browse-tree(): failed to reconstruct item # 1 (parse # 1).

Is there another way to use ACE for annotation?
When I tried to annotate with the LKB, I did not get the correct parse trees.

Tore

> -----Original Message-----
> From: stephan.oepen at gmail.com [mailto:stephan.oepen at gmail.com] On Behalf Of Stephan Oepen
> Sent: 7. desember 2012 21:08
> To: Tore Bruland
> Cc: itsdb at delph-in.net; developers at delph-in.net
> Subject: Re: [itsdb] Problem with scores on my inflection rules
> 
> hi tore,
> 
> this does not necessarily look like an [incr tsdb()] problem to me, so
> i take the liberty of copying the larger ‘developers’ list.
> 
> seeing that, as i understand it, PET and ACE show the same results,
> there must be something systematic :-).
> 
> you say you miss scores on the inflectional rules, but from what you
> sent it appears the ‘*_irule’ nodes in those derivations have non-zero
> scores, but the lexical entries (preterminals in the derivation) do
> not.
> please correct me if i'm misunderstanding something.
> 
> the one surprising-looking thing in your MEM features is the upper case
> in the forms of lexicalized features, e.g. "TIL".  i would have to
> check the code, but one suspicion i have is that PET and ACE look up
> these features with the surface form either all downcased, or in
> exactly the variant used as the parser input (presumably not ‘TIL’ in
> all upper case, in your inputs, either way).
> 
> i imagine you could test my suspicion by manually downcasing the
> relevant strings in the MEM file.
> 
> why those forms are all upper case, on the other hand, i could not
> easily guess; and that /could/ be an [incr tsdb()] issue.  what case do
> you see in the derivations (in the ‘result’ relation) used for the
> training of the model?  and which parser did you use to construct those
> derivations (i would advise against the LKB in this case :-)?
> 
> best wishes, oe
> 
> 
> On Fri, Dec 7, 2012 at 3:29 PM, Tore Bruland <torebrul at idi.ntnu.no> wrote:
> > Hi.
> >
> > I am using the NorSource grammar, and I have installed the LOGON tree
> > on my 64-bit machine running Ubuntu 12.04.
> > I have a problem with the scores on the inflection rules in the
> > syntax trees from NorSource. If I use the ERG grammar and the
> > redwood.mem file, the parse trees show scores on the inflection rules.
> > I have tried two test suites: a small one with only 13 sentences and
> > a bigger one with 1129 sentences. The scores on the inflection rules
> > are always 0.
> >
> > I train the model with the load script, dot.tsdbrc, and train.lisp
> > from the redwoods folder, adapted to my own files.
> > I have created, parsed and annotated a small database in [incr tsdb()].
> > I run the same sentences with ACE and PET, but the scores on the
> > inflection rules are 0.
> >
> > One sentence is "gutten solgte bussen til mannen", and one tree (the
> > selected tree from the annotation) parsed with ACE:
> >
> > (2692 head-subject-rule 6.716046 0 5
> >  (197 sg_def_m_final-full_irule 0.844204 0 1
> >   (157 sg-masc-def-noun-lxm-lrule 0.422102 0 1
> >    (1 gutt_masc-reganim 0.000000 0 1
> >     ("gutten" 453 "token "))))
> >  (2250 head-verb-comp-rule 5.560148 1 5
> >   (192 pret-nonfstr_infl_rule 1.145396 1 2
> >    (7 selge_tv 0.000000 1 2
> >     ("solgte" 570 "token ")))
> >   (2235 pp-mod-defbare-n-index-sit-rule 2.977362 2 5
> >    (222 sg_def_m_final-full_irule 1.466776 2 3
> >     (194 sg-masc-def-noun-lxm-lrule 0.902894 2 3
> >      (8 buss_masc-dirnoun 0.000000 2 3
> >       ("bussen" 687 "token "))))
> >    (1772 head-prep-comp-rule 0.740289 3 5
> >     (24 til_poss 0.000000 3 4
> >      ("til" 91 "token "))
> >     (223 sg_def_m_final-full_irule 0.266936 4 5
> >      (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
> >       (30 mann_masc-reganim 0.000000 4 5
> >        ("mannen" 64 "token "))))))))
> >
> > And parsed with Pet:
> >
> > (1883 head-subject-rule 1.103 0 5 [root]
> >   (1874 sg_def_m_final-full_irule 9.041e-08 0 1
> >     (1873 sg-masc-def-noun-lxm-lrule 8.934e-08 0 1
> >       (6 gutt_masc-reganim/masc-reganim-noun-lxm 0 0 1 [sg_def_m_final-full_irule]
> >         (1 "gutten" 0 0 1 <0:1>))))
> >   (1882 head-verb-comp-rule 0.9468 1 5
> >     (1875 pret-nonfstr_infl_rule 0.2863 1 2
> >       (18 selge_tv/v-tr 0 1 2 [pret-nonfstr_infl_rule]
> >         (2 "solgte" 0 1 2 <1:2>)))
> >     (1881 pp-mod-defbare-n-index-sit-rule 0.1813 2 5
> >       (1877 sg_def_m_final-full_irule -5.485e-08 2 3
> >         (1876 sg-masc-def-noun-lxm-lrule -5.593e-08 2 3
> >           (31 buss_masc-dirnoun/masc-dir-noun-lxm 0 2 3 [sg_def_m_final-full_irule]
> >             (3 "bussen" 0 2 3 <2:3>))))
> >       (1880 head-prep-comp-rule 0.03026 3 5
> >         (36 til_poss/prep-word-poss 0 3 4 []
> >           (4 "til" 0 3 4 <3:4>))
> >         (1879 sg_def_m_final-full_irule 9.041e-08 4 5
> >           (1878 sg-masc-def-noun-lxm-lrule 8.934e-08 4 5
> >             (47 mann_masc-reganim/masc-reganim-noun-lxm 0 4 5 [sg_def_m_final-full_irule]
> >               (5 "mannen" 0 4 5 <4:5>))))))))
> >
> > One of the lexicon entries for the preposition "til":
> >
> > til_poss := prep-word-poss &
> >   [ STEM < "til" >,
> >     SYNSEM.LOCAL.CAT.HEAD [KEYS.KEY til-poss ] ].
> >
> > Suite_2.mem
> > (30) [1 (3) head-verb-comp-rule pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] 0.206416 {2 2 2 2} [0 1]
> > (31) [1 (2) pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> > (32) [1 (1) head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> > (33) [1 (0) prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> >
> > I cannot see anything wrong here: "prep-word-poss" is the relevant
> > lexical type, and it has a score.
> > I need some help.
> > My trained model is attached to this e-mail.
> >
> > Tore Bruland
> > PhD Candidate
> > Department of Computer and Information Science
> > Norwegian University of Science and Technology (NTNU)
> > Sem Sælands vei 7-9
> > NO-7491 Trondheim, Norway
> > tel. +47 73 59 36 72
> > fax  +47 73 59 44 66
> > cel. +47 47 90 49 79