[developers] Tanaka corpus logon parse ranker not creating output profile files
Francis Bond
fcbond at gmail.com
Sat Jun 27 02:33:50 CEST 2009
G'day,
2009/6/27 W.P. McNeill (UW) <billmcn at u.washington.edu>:
> I am running the logon parse ranking tools on the Japanese Tanaka corpus. I
> am able to successfully run a parse ranking experiment on individual
> profiles in the corpus, but when I try to run the same experiment on a
> virtual profile that includes the entire corpus it does not work because
> files are not created in the output profile. The contents of the profile
> look like this when the experiment is complete:
> $ lrt \[tanaka-train\]\ GP\[0\]\ +PT\ -LEX\ CW\[\]\ -AE\ NS\[0\]\ NT\[\]\
> -NB\ LM\[0\]\ FT\[\:\:\:1\]\ RS\[\]\ MM\[tao_lmvm\]\ MI\[5000\]\
> RT\[1.0e-6\]\ AT\[1.0e-20\]\ VA\[1.0e+4\]\ PC\[100\]/
> total 289556
> -rw-r--r-- 1 billmcn billmcn 146 Jun 25 17:06 virtual
> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 25 17:07 fc.mlm
> -rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:07 fold
> -rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:46 score
> I debugged into this and it appears that the cache parameter passed down to
> write-score() and write-fold() is incorrect. It looks like this when it is
> passed into write-score:
> (WRITE-SCORE
> "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[]
> MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]"
> ((:SCORE . "0.0000421995000000") (:RANK . 1) (:RESULT-ID . 7)
> (:PARSE-ID . 9000))
> :CACHE ((:PROTOCOL . :RAW) (:COUNT . 0)
> (:DATABASE
> . "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[]
> MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]")))
> Note that there are no open file handles as part of the cache data
> structure. As a result the fold and score information is written to STDOUT
> instead of the output files.
> The data structure shown above is the one that gets passed back from
> create-cache.
> If I run with the single Tanaka profile instead of the virtual profile I do
> see open file handles for all the files in the ranker output profile.
> Presumably there is something wrong with my setup of the Tanaka corpus, but
> I can't figure out what it is. The problematic virtual profile directory
> looks like this:
> $ lrt tanaka-train/
> total 289556
> -rw-r--r-- 1 billmcn billmcn 146 Feb 17 17:17 virtual
> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 18 18:56 fc.mlm
> I'm still debugging, but I thought I would ask in case there's something
> obvious I'm missing.
> I don't have this problem with the English ranker experiments in the logon tree.
I am not the expert here, but perhaps you should check that you have a
virtual skeleton profile, as well as a virtual treebank. My
understanding is that you need both.
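
A quick way to check this is a small script along the following lines. It is
only a sketch under assumptions that are not confirmed in this thread: that the
`virtual` file lists one component profile name per line, optionally in double
quotes, and that the treebank profiles and the skeletons live under two separate
root directories. The function names and the root paths are hypothetical.

```python
import os


def read_virtual(profile_dir):
    """Return the component names listed in a profile's `virtual` file.

    Assumes one name per line, optionally wrapped in double quotes.
    """
    names = []
    with open(os.path.join(profile_dir, "virtual"), encoding="utf-8") as handle:
        for line in handle:
            name = line.strip().strip('"')
            if name:
                names.append(name)
    return names


def missing_components(root, virtual_name):
    """List components of the virtual profile with no directory under `root`.

    Returns None when the virtual profile itself is absent under `root`.
    """
    virtual_dir = os.path.join(root, virtual_name)
    if not os.path.isfile(os.path.join(virtual_dir, "virtual")):
        return None
    return [
        name
        for name in read_virtual(virtual_dir)
        if not os.path.isdir(os.path.join(root, name))
    ]


def check_both(treebank_root, skeleton_root, virtual_name):
    """Run the same check against the treebank root and the skeleton root."""
    return {
        "treebank": missing_components(treebank_root, virtual_name),
        "skeleton": missing_components(skeleton_root, virtual_name),
    }
```

Calling `check_both(treebank_root, skeleton_root, "tanaka-train")` with your own
two directories would give None for a root that has no virtual profile of that
name, and otherwise the list of components that are missing there.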
Yours,
--
Francis Bond <bond at ieee.org>
Division of Linguistics and Multilingual Studies
Nanyang Technological University