[developers] Cheap inconsistency
Francis Bond
fcbond at gmail.com
Thu Dec 28 15:59:00 CET 2006
G'day,
I just noticed today that recent versions of cheap fail to parse
JACY input that (a) can be parsed by the LKB and (b) can be parsed
by older versions of cheap (confirmed with 0.99.7). Sorry I
didn't pick this up earlier.
I have filed a bug in the PET tracker <https://pet.opendfki.de/ticket/2>,
but thought I would also post it here to reach a wider audience, as (a)
PET/LKB inconsistencies may worry more people than just me and (b) I am
keen to get a fix.
I would be happy to give anyone a copy of a grammar that triggers
this, but it is about 7 MB, so it may be more polite to put it up
somewhere for download than to mail it. I believe the grammar bundled
with the hog will show the same problem.
Please ask me for more data if it would help.
Sample data that triggers the problem:
お 犬
助け 守る
Test case:
cheap japanese
reading `pet/japanese.set'...
loading `japanese.grm' (Jacy (2005-11-20a))
reading ME model `LXD-DEF-6.jp051120.mem'... [60508 features]
136872 types in 3.3 s
お 犬
(1) `お 犬' [0] --- 0 (-0.00|0.00s) <6:11> (93.7K) [0.0s]
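To check a batch of sentences for the same failure, the status line above can be picked apart mechanically. The following is a minimal Python sketch, not part of PET. It assumes that the number after `---` is the reading count, which fits the failed parse reported here but is inferred from this one sample line only. The names `STATUS_RE`, `parse_status` and `failed_inputs` are made up for illustration.

```python
import re

# Assumed layout, inferred from the single sample line in this report:
#   (id) `input' [n] --- readings (...) <...> (...) [...]
STATUS_RE = re.compile(r"^\((\d+)\) `(.*)' \[(-?\d+)\] --- (\d+) ")


def parse_status(line):
    """Return (item id, input string, assumed reading count), or None."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(4))


def failed_inputs(lines):
    """Collect the input strings whose assumed reading count is zero."""
    failed = []
    for line in lines:
        parsed = parse_status(line)
        if parsed is not None and parsed[2] == 0:
            failed.append(parsed[1])
    return failed
```

Feeding the captured output of a cheap run to `failed_inputs` gives the list of inputs to retry in the LKB or with an older cheap.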
In the LKB:
(lkb::do-parse-tty "お 犬 。")
Edge number 16
(FRG (NP (NP (PREFIX (お)) (N (犬)))))
お 犬 。
[ LTOP: h1
  INDEX: e2 [ e E.TENSE: TENSE E.ASPECT: ASPECT E.MOOD: MOOD E.PASS: BOOL SORT: SEMSORT ]
  RELS: <
    [ proposition_m_rel<-1:-1>
      LBL: h1
      ARG0: e2
      MARG: h3 ]
    [ unknown_rel<-1:-1>
      LBL: h4
      ARG0: e2
      ARG: x5 [ x SORT: SEMSORT PNG.PN: THREE PNG.GEN: GENDER ] ]
    [ "_inu_n_rel"<-1:-1>
      LBL: h6
      ARG0: x5 ]
    [ udef_rel<-1:-1>
      LBL: h7
      ARG0: x5
      RSTR: h8
      BODY: h9 ] >
  HCONS: < h3 qeq h4 h8 qeq h6 > ]
333
15
-1
233
3
LKB(21):
> chart dump:
0-1 [6] U-END => (お) [4]
0-1 [7] U_1_2 => (お) [4]
0-1 [8] O-HON-NPREF => (お) [3]
0-1 [9] O-HON-VNPREF => (お) [3]
0-1 [10] O-HON-NAPREF => (お) [3]
0-2 [14] PREFIX-ATTACH-RULE => (お 犬) [8 11]
0-2 [15] QUANTIFY-N-INFL-RULE => (お 犬) [14]
0-2 [16] NP-FRAG => (お 犬) [15]
1-2 [11] INU-NOUN => (犬) [5]
1-2 [12] QUANTIFY-N-INFL-RULE => (犬) [11]
1-2 [13] NP-FRAG => (犬) [12]
I wish you all a happy and bug-free new year.
--
Francis Bond <http://www2.nict.go.jp/x/x161/en/member/bond/>
NICT Computational Linguistics Group