[developers] [itsdb] Problem with scores on my inflection rules
Tore Bruland
torebrul at idi.ntnu.no
Mon Dec 10 11:22:15 CET 2012
Thanks, Stephan.
I replaced "TIL", "GUTT", and "SELGE" with a lowercase variant and now the preposition "TIL" works.
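That substitution can also be scripted. Below is a minimal sketch. It assumes that the only double-quoted token on a MEM feature line is the lexical string, as in the excerpts in this thread, and that the file is UTF-8 text. `downcase_mem_file` and its file paths are hypothetical. The script downcases every quoted string, so it is a quick way to test Stephan's suspicion. It does not explain why the strings were upper case in the first place.

```python
import re

# A double-quoted string such as "TIL" on a MEM feature line.
QUOTED = re.compile(r'"([^"]*)"')


def downcase_mem_line(line: str) -> str:
    """Return the line with the contents of every double-quoted string lower-cased."""
    return QUOTED.sub(lambda m: '"' + m.group(1).lower() + '"', line)


def downcase_mem_file(src: str, dst: str) -> None:
    """Write a copy of a MEM file with its quoted strings downcased (hypothetical paths)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(downcase_mem_line(line))


print(downcase_mem_line('(33) [1 (0) prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]'))
# -> (33) [1 (0) prep-word-poss "til"] -0.0251066 {5 5 5 5} [0 1]
```

For a whole file, a call such as `downcase_mem_file("suite_2.mem", "suite_2.lower.mem")` writes a downcased copy and leaves the original untouched. Both file names here are placeholders.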
The remaining problem seems to be that the syntax trees contain the surface string (e.g. "gutten"), while the mem-file contains the lexicon STEM (e.g. "GUTT").
Mem-file:
(2) [1 (1) sg-masc-def-noun-lxm-lrule masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]
(3) [1 (0) masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]
Lexicon entry:
selge_tv := v-tr &
[ INFLECTION nonfstr-strong,
STEM < "selge" >,
SYNSEM.LKEYS.KEYREL.PRED "_selge_v_rel" ].
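The suspected mismatch can be made concrete with a toy model. This is not how PET or ACE actually look up MEM features. The model only assumes, hypothetically, that a lexicalised feature is keyed on its rule/type path plus a string, compared case-insensitively. Under that assumption, the stem "GUTT" from the mem-file is found, but the surface form "gutten" seen in the derivations is not, even after downcasing. `parse_mem_line`, `build_table` and `lookup` are illustrative names. The reading of the MEM fields is also an assumption, based only on the lines quoted here.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical reading of a MEM feature line, e.g.
#   (3) [1 (0) masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]
# groups: feature index, two numeric prefixes, rule/type path, lexical string, weight.
MEM_LINE = re.compile(r'^\((\d+)\) \[(\d+) \((\d+)\) (.*?) "([^"]*)"\] (\S+)')

Key = Tuple[Tuple[str, ...], str]


def parse_mem_line(line: str) -> Optional[Tuple[Tuple[str, ...], str, float]]:
    """Return (path, lexical string, weight), or None if the line does not match."""
    m = MEM_LINE.match(line.strip())
    if m is None:
        return None
    return tuple(m.group(4).split()), m.group(5), float(m.group(6))


def build_table(lines: Iterable[str]) -> Dict[Key, float]:
    """Index weights by (path, downcased string): a hypothetical normalisation."""
    table: Dict[Key, float] = {}
    for line in lines:
        parsed = parse_mem_line(line)
        if parsed is not None:
            path, string, weight = parsed
            table[(path, string.lower())] = weight
    return table


def lookup(table: Dict[Key, float], path: List[str], string: str) -> Optional[float]:
    """Look up a lexicalised feature using whatever string the caller supplies."""
    return table.get((tuple(path), string.lower()))


# Two of the lines quoted from the mem-file above.
SAMPLE = [
    '(2) [1 (1) sg-masc-def-noun-lxm-lrule masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]',
    '(3) [1 (0) masc-reganim-noun-lxm "GUTT"] 1.32876e-9 {10 3 10 0} [0 1]',
]
TABLE = build_table(SAMPLE)

print(lookup(TABLE, ["masc-reganim-noun-lxm"], "GUTT"))    # keyed on the stem: weight found
print(lookup(TABLE, ["masc-reganim-noun-lxm"], "gutten"))  # keyed on the surface form: None
```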
Syntax trees:
ACE
(2692 head-subject-rule 6.847143 0 5
(197 sg_def_m_final-full_irule 0.844204 0 1
(157 sg-masc-def-noun-lxm-lrule 0.422102 0 1
(1 gutt_masc-reganim 0.000000 0 1
("gutten" 520 "token "))))
(2250 head-verb-comp-rule 5.691245 1 5
(192 pret-nonfstr_infl_rule 1.145396 1 2
(7 selge_tv 0.000000 1 2
("solgte" 637 "token ")))
(2235 pp-mod-defbare-n-index-sit-rule 3.108459 2 5
(222 sg_def_m_final-full_irule 1.466776 2 3
(194 sg-masc-def-noun-lxm-lrule 0.902894 2 3
(8 buss_masc-dirnoun 0.000000 2 3
("bussen" 41 "token "))))
(1772 head-prep-comp-rule 0.871385 3 5
(24 til_poss 0.131096 3 4
("til" 158 "token "))
(223 sg_def_m_final-full_irule 0.266936 4 5
(196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
(30 mann_masc-reganim 0.000000 4 5
("mannen" 131 "token "))))))))
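Stephan's reading further down is that the `*_irule` nodes score non-zero while the lexical entries score zero. That can be checked mechanically. Below is a minimal S-expression reader for the ACE derivation format shown above. It assumes every node has the shape `(id label score start end daughters...)` and every terminal starts with a quoted string, like `("til" 158 "token ")`. The function names are hypothetical. The sample is the PP subtree of the ACE tree above.

```python
import re
from typing import List, Tuple, Union

# One S-expression atom or list, as read from an ACE-style derivation string.
Sexp = Union[str, List["Sexp"]]

# Tokens: parentheses, double-quoted strings (with backslash escapes), bare atoms.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def read_sexp(text: str) -> Sexp:
    """Parse a single parenthesised expression into nested Python lists."""
    tokens = TOKEN.findall(text)

    def read(pos: int) -> Tuple[Sexp, int]:
        if tokens[pos] == "(":
            items: List[Sexp] = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree


def node_scores(node: Sexp) -> List[Tuple[str, float]]:
    """Collect (label, score) for every non-terminal node, depth first."""
    out: List[Tuple[str, float]] = []
    # Terminals look like ("til" 158 "token "): first element is a quoted string.
    if (isinstance(node, list) and len(node) >= 5
            and isinstance(node[0], str) and not node[0].startswith('"')):
        out.append((node[1], float(node[2])))
        for daughter in node[5:]:
            out.extend(node_scores(daughter))
    return out


PP = '''(1772 head-prep-comp-rule 0.871385 3 5
 (24 til_poss 0.131096 3 4
  ("til" 158 "token "))
 (223 sg_def_m_final-full_irule 0.266936 4 5
  (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
   (30 mann_masc-reganim 0.000000 4 5
    ("mannen" 131 "token ")))))'''

for label, score in node_scores(read_sexp(PP)):
    flag = "  <- zero" if score == 0.0 else ""
    print(f"{label:32} {score:9.6f}{flag}")
```

Here only `mann_masc-reganim` is flagged. In the earlier ACE tree in the quoted message, `til_poss` still has 0.000000, so the same loop would flag it too.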
PET
(1883 head-subject-rule 1.078 0 5 [root]
(1874 sg_def_m_final-full_irule 9.041e-08 0 1
(1873 sg-masc-def-noun-lxm-lrule 8.934e-08 0 1
(6 gutt_masc-reganim/masc-reganim-noun-lxm 0 0 1 [sg_def_m_final-full_irule]
(1 "gutten" 0 0 1 <0:1>))))
(1882 head-verb-comp-rule 0.9217 1 5
(1875 pret-nonfstr_infl_rule 0.2863 1 2
(18 selge_tv/v-tr 0 1 2 [pret-nonfstr_infl_rule]
(2 "solgte" 0 1 2 <1:2>)))
(1881 pp-mod-defbare-n-index-sit-rule 0.1562 2 5
(1877 sg_def_m_final-full_irule -5.485e-08 2 3
(1876 sg-masc-def-noun-lxm-lrule -5.593e-08 2 3
(31 buss_masc-dirnoun/masc-dir-noun-lxm 0 2 3 [sg_def_m_final-full_irule]
(3 "bussen" 0 2 3 <2:3>))))
(1880 head-prep-comp-rule 0.005154 3 5
(36 til_poss/prep-word-poss -0.02511 3 4 []
(4 "til" 0 3 4 <3:4>))
(1879 sg_def_m_final-full_irule 9.041e-08 4 5
(1878 sg-masc-def-noun-lxm-lrule 8.934e-08 4 5
(47 mann_masc-reganim/masc-reganim-noun-lxm 0 4 5 [sg_def_m_final-full_irule]
(5 "mannen" 0 4 5 <4:5>))))))))
The entry in my result file for our sentence:
1@0@380@-1@-1@-1@-1@604@-1@-1@(290 head-subject-rule 0.0 0 5 (17 sg_def_m_final-full_irule 0.0 0 1 (16 sg-masc-def-noun-lxm-lrule 0.0 0 1 (15 gutt_masc-reganim 0.0 0 1 ("GUTT" 0 1)))) (289 head-verb-comp-rule 0.0 1 5 (176 pret-nonfstr_infl_rule 0.0 1 2 (175 selge_tv 0.0 1 2 ("SELGE" 1 2))) (288 pp-mod-defbare-n-index-sit-rule 0.0 2 5 (182 sg_def_m_final-full_irule 0.0 2 3 (181 sg-masc-def-noun-lxm-lrule 0.0 2 3 (180 buss_masc-dirnoun 0.0 2 3 ("BUSS" 2 3)))) (276 head-prep-comp-rule 0.0 3 5 (263 til_poss 0.0 3 4 ("TIL" 3 4)) (274 sg_def_m_final-full_irule 0.0 4 5 (273 sg-masc-def-noun-lxm-lrule 0.0 4 5 (272 mann_masc-reganim 0.0 4 5 ("MANN" 4 5))))))))@@("S" ("N" ("N" ("N" ("gutten")))) ("VP" ("V" ("V" ("solgte"))) ("NP" ("N" ("N" ("N" ("bussen")))) ("PP" ("PREP" ("til")) ("N" ("N" ("N" ("mannen"))))))))@[ LTOP: h1 [ h ROLE: ROLE ] INDEX: e2 [ e E.TENSE: PRET E.MOOD: INDICATIVE E.ASPECT: SEMSORT E.DELIMITED: + PATH-TELIC: BOOL SIT-TYPE: SEMSORT DISC-MOVE: DISCMODE SF: IFORCE WH: BOOL ROLE: ROLE ] RELS: < [ "_gutt_n_rel"<0:6> LBL: h3 [ h ROLE: ROLE ] ARG0: x4 [ x WH: - ROLE: ROLE PNG.NG.GEN: M PNG.NG.NUM: SING PNG.PERS: THIRDPERS BOUNDED: + ] ] [ "_def_q_rel"<0:6> LBL: h5 [ h ROLE: ROLE ] ARG0: x4 RSTR: h6 [ h ROLE: ROLE ] BODY: h7 [ h ROLE: ROLE ] ] [ "_selge_v_rel"<7:13> LBL: h8 [ h ROLE: ROLE ] ARG0: e2 ARG1: x4 ARG2: x9 [ x ROLE: ROLE BOUNDED: + WH: - PNG.NG.NUM: SING PNG.NG.GEN: M PNG.PERS: THIRDPERS ] ] [ "_buss_n_rel"<14:20> LBL: h10 [ h ROLE: ROLE ] ARG0: x9 ] [ "_def_q_rel"<14:20> LBL: h11 [ h ROLE: ROLE ] ARG0: x9 RSTR: h12 [ h ROLE: ROLE ] BODY: h13 [ h ROLE: ROLE ] ] [ "_possessed_by_rel"<21:24> LBL: h10 ARG0: u14 [ u WH: BOOL ROLE: ROLE ] ARG1: x9 ARG2: x15 [ x WH: - ROLE: ROLE PNG.NG.NUM: SING PNG.NG.GEN: M PNG.PERS: THIRDPERS BOUNDED: + ] ] [ "_mann_n_rel"<25:31> LBL: h16 [ h ROLE: ROLE ] ARG0: x15 ] [ "_def_q_rel"<25:31> LBL: h17 [ h ROLE: ROLE ] ARG0: x15 RSTR: h18 [ h ROLE: ROLE ] BODY: h19 [ h ROLE: ROLE ] ] > HCONS: < h6 qeq h3 h12 qeq h10 h18 qeq h16 > ]@((:ASCORE))
And yes, I used LKB for this task.
I switched to ACE, and now the entry in my result file is:
1@0@-1@-1@-1@-1@-1@-1@-1@-1@(2707 head-subject-rule 3.723022 0 5 (197 sg_def_m_final-full_irule 0.798620 0 1 (157 sg-masc-def-noun-lxm-lrule 0.442091 0 1 (1 gutt_masc-reganim 0.000000 0 1 ("gutten" 495 "token [ +FORM \\"gutten\\" +FROM \\"0\\" +TO \\"6\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")))) (2248 head-actv-verb-icomp-rule 2.911072 1 5 (1812 head-verb-comp-rule 0.982212 1 3 (181 pret-nonfstr_infl_rule 0.193914 1 2 (5 selge_tr-obl-til 0.000000 1 2 ("solgte" 612 "token [ +FORM \\"solgte\\" +FROM \\"7\\" +TO \\"13\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]"))) (222 sg_def_m_final-full_irule 0.788297 2 3 (194 sg-masc-def-noun-lxm-lrule 0.797056 2 3 (8 buss_masc-dirnoun 0.000000 2 3 ("bussen" 16 "token [ +FORM \\"bussen\\" +FROM \\"14\\" +TO \\"20\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]"))))) (1745 head-prep-comp-rule 1.619551 3 5 (19 til_sel 0.000000 3 4 ("til" 133 "token [ +FORM \\"til\\" +FROM \\"21\\" +TO \\"24\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")) (223 sg_def_m_final-full_irule 1.209164 4 5 (196 sg-masc-def-noun-lxm-lrule 0.472618 4 5 (30 mann_masc-reganim 0.000000 4 5 ("mannen" 106 "token [ +FORM \\"mannen\\" +FROM \\"25\\" +TO \\"31\\" +ID diff-list [ LIST list LAST list ] +TNT tnt [ +TAGS null +PRBS null +MAIN tnt_main [ +TAG string +PRB string ] ] +CLASS token_class +TRAIT token_trait +PRED predsort +CARG string +STAG tnt [ +TAGS list +PRBS list +MAIN tnt_main [ +TAG string +PRB string ] ] ]")))))))@@@[ LTOP: h0 INDEX: e1 [ e SORT: verb-act-specification SF: prop E.TENSE: pret E.MOOD: indicative E.ASPECT: semsort E.DELIMITED: + ] RELS: < [ "_gutt_n_rel"<-1:-1> LBL: h3 ARG0: x2 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ] [ "_def_q_rel"<-1:-1> LBL: h4 ARG0: x2 RSTR: h5 BODY: h6 ] [ "_selge_v_rel"<-1:-1> LBL: h7 ARG0: e1 ARG1: x2 ARG2: x8 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ARGOBLQ: h9 ] [ "_buss_n_rel"<-1:-1> LBL: h10 ARG0: x8 ] [ "_def_q_rel"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] [ "_til_p_rel"<-1:-1> LBL: h9 ARG0: u14 ARG1: e1 ARG2: x15 [ x WH: - BOUNDED: + PNG.NG.NUM: sing PNG.NG.GEN: m PNG.PERS: thirdpers ] ] [ "_mann_n_rel"<-1:-1> LBL: h16 ARG0: x15 ] [ "_def_q_rel"<-1:-1> LBL: h17 ARG0: x15 RSTR: h18 BODY: h19 ] > HCONS: < h5 qeq h3 h12 qeq h10 h18 qeq h16 > ]@((:ASCORE .3.723022))
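A quick way to see which case ends up in the derivations of a profile is to pull the leaf strings out of each result record. The sketch below makes three assumptions: the records use the '@'-separated layout visible in the two entries above, no field contains a literal '@', and the derivation is the first field beginning with '('. The helper names are hypothetical. The sample records are heavily abbreviated from the two entries above.

```python
import re
from typing import List

# A double-quoted string directly after an opening parenthesis, i.e. the
# first element of a terminal such as ("GUTT" 0 1) or ("gutten" 495 "token ...").
TERMINAL = re.compile(r'\(\s*"((?:[^"\\]|\\.)*)"')


def derivation_field(record: str) -> str:
    """Return the first '@'-separated field that starts with '(' (heuristic)."""
    for field in record.split("@"):
        if field.startswith("("):
            return field
    raise ValueError("no derivation-like field in record")


def derivation_terminals(record: str) -> List[str]:
    """List the strings at the leaves of the derivation in a result record."""
    return TERMINAL.findall(derivation_field(record))


# Heavily abbreviated records assembled from fragments of the two entries above.
LKB_RECORD = ('1@0@380@-1@-1@-1@-1@604@-1@-1@'
              '(15 gutt_masc-reganim 0.0 0 1 ("GUTT" 0 1))'
              '@@("N" ("gutten"))@[ LTOP: h1 ]@((:ASCORE))')
ACE_RECORD = ('1@0@-1@-1@-1@-1@-1@-1@-1@-1@'
              r'(1 gutt_masc-reganim 0.000000 0 1 ("gutten" 495 "token [ +FORM \\"gutten\\" ]"))'
              '@@@[ LTOP: h0 ]@((:ASCORE .3.723022))')

for name, record in [("LKB", LKB_RECORD), ("ACE", ACE_RECORD)]:
    leaves = derivation_terminals(record)
    upper = [leaf for leaf in leaves if leaf != leaf.lower()]
    print(name, leaves, "upper-case leaves:", upper)
```

On these samples, the LKB record yields the upper-case stem ("GUTT") and the ACE record yields the surface form ("gutten"). That is the difference seen between the two result files.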
So far, so good, but I can't annotate with ACE.
I start the LOGON tree in Emacs (with no parser) and then start the ACE parser with
(tsdb::tsdb :cpu :no_ace_p :file "tsdb_no_ace_p_log.txt") from the Emacs window, where
the no_ace_p CPU is defined as:
(make-cpu
 :host (short-site-name)
 :spawn "/home/tore/software/ace-0.9.10pre4/ace"
 :options '("-t" "-g" "/home/tore/software/ace-0.9.10pre4/norsource.dat")
 :class :no_ace_p :grammar "norsource" :name "norsource"
 :task '(:parse)
 :wait 600)
But when I start tree->annotate from the podium, I get an error for each sentence:
TSNLP(6):
create-cache(): write-through mode for `suite_3'.
install-gc-strategy(): disabling tenure; global garbage collection ...
[10:51:20] gc-after-hook(): {L#27 N=3.0K O=179K E=96%} [S=2.2G R=52M].
done.
[10:51:20] browse-tree(): `suite_3' --- item # 1
[10:51:20] browse-tree(): retrieved item # 1 (5 parses).
[10:51:20] browse-tree(): retrieved 0 tree records.
[10:51:20] browse-tree(): reconstructed 0 edges.
[10:51:20] browse-tree(): retrieved 0 decisions.
browse-tree(): failed to reconstruct item # 1 (parse # 1).
Is there another way to use ACE?
When I tried to annotate with LKB, I didn't get the correct parse trees.
Tore
> -----Original Message-----
> From: stephan.oepen at gmail.com [mailto:stephan.oepen at gmail.com] On Behalf Of Stephan Oepen
> Sent: 7. desember 2012 21:08
> To: Tore Bruland
> Cc: itsdb at delph-in.net; developers at delph-in.net
> Subject: Re: [itsdb] Problem with scores on my inflection rules
>
> hi tore,
>
> this does not necessarily look like an [incr tsdb()] problem to me, so
> i take the liberty of copying the larger ‘developers’ list.
>
> seeing that, as i understand it, PET and ACE show the same results,
> there must be something systematic :-).
>
> you say you miss scores on the inflectional rules, but from what you
> sent it appears the ‘*_irule’ nodes in those derivations have non-zero
> scores, but the lexical entries (preterminals in the derivation) do
> not.
> please correct me if i'm misunderstanding something.
>
> the one surprising-looking thing in your MEM features is the upper case
> in the forms of lexicalized features, e.g. "TIL". i would have to
> check the code, but one suspicion i have is that PET and ACE look up
> these features with the surface form either all downcased, or in
> exactly the variant used as the parser input (presumably not ‘TIL’ in
> all upper case, in your inputs, either way).
>
> i imagine you could test my suspicion by manually downcasing the
> relevant strings in the MEM file.
>
> why those forms are all upper case, on the other hand, i could not
> easily guess; and that /could/ be an [incr tsdb()] issue. what case do
> you see in the derivations (in the ‘result’ relation) used for the
> training of the model? and which parser did you use to construct those
> derivations (i would advise against the LKB in this case :-)?
>
> best wishes, oe
>
>
> On Fri, Dec 7, 2012 at 3:29 PM, Tore Bruland <torebrul at idi.ntnu.no> wrote:
> > Hi.
> >
> > I am using the NorSource grammar and I have installed the LOGON tree
> > on my 64-bit machine with Ubuntu 12.04.
> > I have a problem with the scores on the inflection rules in the syntax
> > trees from NorSource. If I use the ERG grammar and the redwood.mem
> > file, the parse trees show scores on the inflection rules.
> > I have tried two test suites: a small one with only 13 sentences, and
> > a bigger one with 1129 sentences. The scores on the inflection rules
> > are always 0.
> >
> > I train the model with the load script, dot.tsdbrc, and train.lisp
> > from the redwoods folder, adapted to my own files.
> > I have created, parsed and annotated a small database in [tsdb++]. I
> > run the same sentences with ACE and Pet, but the scores on the
> > inflection rules are 0.
> >
> > One sentence is "gutten solgte bussen til mannen", and one tree (the
> > selected tree from the annotation) parsed with ACE:
> >
> > (2692 head-subject-rule 6.716046 0 5
> > (197 sg_def_m_final-full_irule 0.844204 0 1
> > (157 sg-masc-def-noun-lxm-lrule 0.422102 0 1
> > (1 gutt_masc-reganim 0.000000 0 1
> > ("gutten" 453 "token "))))
> > (2250 head-verb-comp-rule 5.560148 1 5
> > (192 pret-nonfstr_infl_rule 1.145396 1 2
> > (7 selge_tv 0.000000 1 2
> > ("solgte" 570 "token ")))
> > (2235 pp-mod-defbare-n-index-sit-rule 2.977362 2 5
> > (222 sg_def_m_final-full_irule 1.466776 2 3
> > (194 sg-masc-def-noun-lxm-lrule 0.902894 2 3
> > (8 buss_masc-dirnoun 0.000000 2 3
> > ("bussen" 687 "token "))))
> > (1772 head-prep-comp-rule 0.740289 3 5
> > (24 til_poss 0.000000 3 4
> > ("til" 91 "token "))
> > (223 sg_def_m_final-full_irule 0.266936 4 5
> > (196 sg-masc-def-noun-lxm-lrule 0.030260 4 5
> > (30 mann_masc-reganim 0.000000 4 5
> > ("mannen" 64 "token "))))))))
> >
> > And parsed with Pet:
> >
> > (1883 head-subject-rule 1.103 0 5 [root]
> > (1874 sg_def_m_final-full_irule 9.041e-08 0 1
> > (1873 sg-masc-def-noun-lxm-lrule 8.934e-08 0 1
> > (6 gutt_masc-reganim/masc-reganim-noun-lxm 0 0 1 [sg_def_m_final-full_irule]
> > (1 "gutten" 0 0 1 <0:1>))))
> > (1882 head-verb-comp-rule 0.9468 1 5
> > (1875 pret-nonfstr_infl_rule 0.2863 1 2
> > (18 selge_tv/v-tr 0 1 2 [pret-nonfstr_infl_rule]
> > (2 "solgte" 0 1 2 <1:2>)))
> > (1881 pp-mod-defbare-n-index-sit-rule 0.1813 2 5
> > (1877 sg_def_m_final-full_irule -5.485e-08 2 3
> > (1876 sg-masc-def-noun-lxm-lrule -5.593e-08 2 3
> > (31 buss_masc-dirnoun/masc-dir-noun-lxm 0 2 3 [sg_def_m_final-full_irule]
> > (3 "bussen" 0 2 3 <2:3>))))
> > (1880 head-prep-comp-rule 0.03026 3 5
> > (36 til_poss/prep-word-poss 0 3 4 []
> > (4 "til" 0 3 4 <3:4>))
> > (1879 sg_def_m_final-full_irule 9.041e-08 4 5
> > (1878 sg-masc-def-noun-lxm-lrule 8.934e-08 4 5
> > (47 mann_masc-reganim/masc-reganim-noun-lxm 0 4 5 [sg_def_m_final-full_irule]
> > (5 "mannen" 0 4 5 <4:5>))))))))
> >
> > One of the lexicon entries for the preposition "til":
> >
> > til_poss := prep-word-poss &
> > [ STEM < "til" >,
> > SYNSEM.LOCAL.CAT.HEAD [KEYS.KEY til-poss ] ].
> >
> > Suite_2.mem:
> > (30) [1 (3) head-verb-comp-rule pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] 0.206416 {2 2 2 2} [0 1]
> > (31) [1 (2) pp-mod-defbare-n-index-sit-rule head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> > (32) [1 (1) head-prep-comp-rule prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> > (33) [1 (0) prep-word-poss "TIL"] -0.0251066 {5 5 5 5} [0 1]
> >
> > I can’t see anything wrong here: "prep-word-poss" is the relevant
> > lexical type, and it has a score.
> > I need some help.
> > My trained model is attached in the e-mail.
> >
> > Tore Bruland
> > PhD Candidate
> > Department of Computer and Information Science
> > Norwegian University of Science and Technology (NTNU)
> > Sem Sælands vei 7-9
> > NO-7491 Trondheim, Norway
> > tel. +47 73 59 36 72
> > fax +47 73 59 44 66
> > cel. +47 47 90 49 79