[developers] Training a parse ranking model for Jacy

Lea frermann at coli.uni-saarland.de
Thu Jan 12 06:12:32 CET 2012


Hello,

I am currently training two kinds of parse ranking models for Jacy and 
the ERG:
(a) on gold-annotated profiles of the (Japanese-English, parallel) 
Tanaka corpus
(b) on Tanaka profiles that were treebanked automatically (using MRS 
alignment)

In both cases the results for Japanese are worse than expected. I train 
the same kind of models on the same data in English with the ERG, and 
there everything seems to work fine. I use the 'load' script and the 
'train.lisp' script, which do both feature-caching and context-caching.

setting (a)
When training on the gold-annotated Tanaka profile 006, only one event 
per sentence is extracted for Japanese during feature caching, while for 
English the numbers look reasonable (a quick way to tally this across a 
whole log is sketched after the English output below). The model 
returned for Japanese is tiny compared to the English one and, as 
expected, performs very poorly.

Japanese:
[11:44:48] operate-on-profiles(): running `pet' [30009000 - 30009200|.
[11:44:48] open-fc(): new BDB `fc.bdb'.
[11:44:48] cache-features(): item # 30009000: 1 event;
[11:44:48] cache-features(): item # 30009004: 1 event;
[11:44:48] cache-features(): item # 30009006: 1 event;
[11:44:48] cache-features(): item # 30009007: 1 event;
[11:44:48] cache-features(): item # 30009008: 1 event;
...
Events in  = /tmp/.model.lfrermann.19628.events
Params out = /tmp/.model.lfrermann.19628.weights
Marginal   = pseudo-likelihood
Smoothing  = none
Procs      = 1
Classes    = 749
Contexts   = 715
Features   = 7 / 7
Non-zeros  = 2197


English:
[11:45:59] operate-on-profiles(): running `pet' [30009000 - 30009200|.
[11:46:00] open-fc(): new BDB `fc.bdb'.
[11:46:00] cache-features(): item # 30009000: 11 events;
[11:46:00] cache-features(): item # 30009001: 6 events;
[11:46:01] cache-features(): item # 30009002: 11 events;
[11:46:01] cache-features(): item # 30009003: 11 events;
[11:46:01] cache-features(): item # 30009004: 2 events;
...
Events in  = /tmp/.model.lfrermann.19803.events
Params out = /tmp/.model.lfrermann.19803.weights
Marginal   = pseudo-likelihood
Smoothing  = none
Procs      = 1
Classes    = 8739
Contexts   = 1147
Features   = 6595 / 8650
Non-zeros  = 667364
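
In case it helps anyone reproduce this: the discrepancy already shows up 
in the feature-caching log itself, so a small script over that log can 
tally events per item across the whole profile rather than just the 
first few lines. This is only a diagnostic sketch; the log path is 
whatever the load/train output was captured to, and the line format is 
taken from the excerpts above.

import re
import sys
from collections import Counter

# Matches log lines like:
#   [11:44:48] cache-features(): item # 30009000: 1 event;
#   [11:46:00] cache-features(): item # 30009001: 6 events;
EVENT_RE = re.compile(r"cache-features\(\): item # (\d+): (\d+) events?;")

def event_counts(log_path):
    """Map item id -> number of events recorded during feature caching."""
    counts = {}
    with open(log_path) as log:
        for line in log:
            match = EVENT_RE.search(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
    return counts

if __name__ == "__main__":
    counts = event_counts(sys.argv[1])   # path to the captured load/train output
    distribution = Counter(counts.values())
    print("items seen:           ", len(counts))
    print("events per item:      ", dict(sorted(distribution.items())))
    print("items with one event: ", sum(1 for n in counts.values() if n == 1))

Since the ranker is trained discriminatively, an item that yields only a 
single event contributes no contrast between preferred and dispreferred 
analyses, which would be consistent with the tiny model (7 features) 
above.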


setting (b)
When I train ranking models on an automatically treebanked profile, a 
reasonable number of parses is extracted for both languages (the 
feature-caching output looks similar to the English output above), and 
the model sizes are comparable:

Japanese:
Events in  = /tmp/.model.lfrermann.20201.events
Params out = /tmp/.model.lfrermann.20201.weights
Marginal   = pseudo-likelihood
Smoothing  = none
Procs      = 1
Classes    = 4620
Contexts   = 495
Features   = 3432 / 4314
Non-zeros  = 421357

English:
Events in  = /tmp/.model.lfrermann.20140.events
Params out = /tmp/.model.lfrermann.20140.weights
Marginal   = pseudo-likelihood
Smoothing  = none
Procs      = 1
Classes    = 5067
Contexts   = 617
Features   = 4177 / 5234
Non-zeros  = 342117

When I parse a test profile for English and Japanese with the 
respective model and compare the resulting ranks to the gold 
annotations, I get 50% accuracy for English but only 39% for Japanese. 
The difference might partly be due to language-specific differences in 
training and evaluation, but it still seems too big to me.
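
For concreteness, this is roughly the accuracy computation I have in 
mind, as a minimal sketch: each test item pairs the parse the model 
ranks first with the set of analyses marked preferred in the gold 
annotation (the data layout and names are placeholders, not the actual 
profile format):

def top1_accuracy(items):
    """items: iterable of (top_ranked_parse, gold_parses) pairs.

    An item counts as correct if the analysis the model ranks first is
    among the analyses marked as preferred in the gold treebank.
    """
    scored = [(top, gold) for top, gold in items if gold]  # ignore items with no gold decision
    if not scored:
        return 0.0
    correct = sum(1 for top, gold in scored if top in gold)
    return correct / len(scored)

# Hypothetical example with three test items:
example = [
    ("p4", {"p4"}),          # top-ranked parse matches the single gold tree
    ("p1", {"p2", "p7"}),    # top-ranked parse not among the preferred analyses
    ("p9", {"p9", "p11"}),   # matches one of several acceptable analyses
]
print(top1_accuracy(example))  # 2/3, roughly 0.667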

I'd be very grateful for any suggestions, especially given the 
approaching ACL deadline (15.1.).
Thank you very much in advance for your help!
Lea.





