[developers] Tanaka corpus logon parse ranker not creating output profile files

W.P. McNeill (UW) billmcn at u.washington.edu
Fri Jun 26 20:30:42 CEST 2009


I am running the logon parse ranking tools on the Japanese Tanaka corpus.  I
can run a parse ranking experiment on individual profiles in the corpus, but
when I run the same experiment on a virtual profile that includes the entire
corpus, it fails because the files in the output profile are not written.
The contents of the output profile look like this when the experiment is
complete (note the zero-length fold and score files):

$ lrt \[tanaka-train\]\ GP\[0\]\ +PT\ -LEX\ CW\[\]\ -AE\ NS\[0\]\ NT\[\]\ -NB\ LM\[0\]\ FT\[\:\:\:1\]\ RS\[\]\ MM\[tao_lmvm\]\ MI\[5000\]\ RT\[1.0e-6\]\ AT\[1.0e-20\]\ VA\[1.0e+4\]\ PC\[100\]/
total 289556
-rw-r--r-- 1 billmcn billmcn       146 Jun 25 17:06 virtual
-rw-r--r-- 1 billmcn billmcn 296203985 Jun 25 17:07 fc.mlm
-rw-r--r-- 1 billmcn billmcn         0 Jun 25 17:07 fold
-rw-r--r-- 1 billmcn billmcn         0 Jun 25 17:46 score
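
A quick way to spot this symptom across several output profiles is to list
the zero-byte files in each one.  The following is only a minimal sketch: the
helper name empty_files is hypothetical and is not part of the logon tree.

```python
from pathlib import Path


def empty_files(profile_dir):
    """Return the names of zero-byte regular files directly inside profile_dir.

    Hypothetical helper for illustration; it only inspects file sizes and
    knows nothing about the profile format itself.
    """
    return sorted(
        p.name
        for p in Path(profile_dir).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

For the listing above, this would be expected to report the fold and score
files.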

I stepped through this in the debugger, and it appears that the cache
parameter passed down to write-score() and write-fold() is incorrect.  This
is what it looks like when it is passed into write-score:

(WRITE-SCORE
 "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]"
 ((:SCORE . "0.0000421995000000") (:RANK . 1) (:RESULT-ID . 7)
  (:PARSE-ID . 9000))
 :CACHE ((:PROTOCOL . :RAW) (:COUNT . 0)
         (:DATABASE
          . "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]")))

Note that there are no open file handles as part of the cache data
structure.  As a result the fold and score information is written to STDOUT
instead of the output files.
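
To make the suspected failure mode concrete, here is a small Python model of
a writer that falls back to a default stream when the cache holds no open
stream for a relation.  This is not the actual ranker code: the function
write_tuple, the "streams" key, and the "@" field separator are all
assumptions made for illustration.

```python
import io
import sys


def write_tuple(cache, relation, fields, default=sys.stdout):
    """Write one record for `relation`, preferring a stream held in the cache.

    Hypothetical model of the behavior described above: if the cache has no
    open stream for the relation, the record goes to `default` (STDOUT unless
    overridden) and the on-disk file stays empty.
    """
    stream = cache.get("streams", {}).get(relation, default)
    stream.write("@".join(str(f) for f in fields) + "\n")
    cache["count"] = cache.get("count", 0) + 1
    return stream


# A cache shaped like the one in the trace: protocol, count, database,
# but no open streams.
broken_cache = {"protocol": "raw", "count": 0, "database": "tanaka-train"}
captured = io.StringIO()
write_tuple(broken_cache, "score", [9000, 7, 1, "0.0000421995000000"],
            default=captured)
```

In this model the record ends up in the fallback stream rather than in a
score file, which matches the zero-length fold and score files in the
listing.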

The data structure shown above is the one that gets passed back from
create-cache.

If I run with a single Tanaka profile instead of the virtual profile, I do
see open file handles for all the files in the ranker output profile.

Presumably there is something wrong with my setup of the Tanaka corpus, but
I can't figure out what it is.  The problematic virtual profile directory
looks like this:

$ lrt tanaka-train/
total 289556
-rw-r--r-- 1 billmcn billmcn       146 Feb 17 17:17 virtual
-rw-r--r-- 1 billmcn billmcn 296203985 Jun 18 18:56 fc.mlm

I'm still debugging, but I thought I would ask in case there's something
obvious I'm missing.

I don't have this problem with the English ranker experiments in the logon tree.
-- 
W.P. McNeill
http://staff.washington.edu/billmcn/index.shtml