I am running the LOGON parse ranking tools on the Japanese Tanaka corpus. I can successfully run a parse ranking experiment on individual profiles in the corpus, but when I run the same experiment on a virtual profile that covers the entire corpus, it fails: the fold and score files in the output profile are created but left empty. The contents of the output profile look like this when the experiment is complete:<div>
<br></div><div><div>$ lrt \[tanaka-train\]\ GP\[0\]\ +PT\ -LEX\ CW\[\]\ -AE\ NS\[0\]\ NT\[\]\ -NB\ LM\[0\]\ FT\[\:\:\:1\]\ RS\[\]\ MM\[tao_lmvm\]\ MI\[5000\]\ RT\[1.0e-6\]\ AT\[1.0e-20\]\ VA\[1.0e+4\]\ PC\[100\]/</div><div>
total 289556</div><div>-rw-r--r-- 1 billmcn billmcn 146 Jun 25 17:06 virtual</div><div>-rw-r--r-- 1 billmcn billmcn 296203985 Jun 25 17:07 fc.mlm</div><div>-rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:07 fold</div>
<div>-rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:46 score</div><div><br></div><div>I debugged this, and it appears that the cache parameter passed down to write-score() and write-fold() is incorrect. This is what it looks like when it is passed into write-score:</div>
<div><br></div><div><div>(WRITE-SCORE</div><div>"[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]"</div><div>((:SCORE . "0.0000421995000000") (:RANK . 1) (:RESULT-ID . 7)</div>
<div>(:PARSE-ID . 9000))</div><div>:CACHE ((:PROTOCOL . :RAW) (:COUNT . 0)</div><div>(:DATABASE</div><div>. "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]")))</div>
<div><br></div></div>Note that the cache data structure contains no open file handles. As a result, the fold and score information is written to STDOUT instead of to the output files.</div><div><br></div><div>The data structure shown above is the one that is returned by create-cache.</div>
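<div><br></div><div>To make the symptom concrete, here is a minimal sketch of the fallback behavior I am describing. It is written in Python purely for illustration and is not the actual Lisp code. The names write_record and missing_handles, the dictionary layout, and the tab delimiter are all hypothetical; the only thing taken from the trace above is that the cache carries :PROTOCOL, :COUNT and :DATABASE entries but no stream entries.</div>

```python
import io
import sys

# Hypothetical stand-ins for the cache: a dict that may or may not carry
# an open stream per output relation ("score", "fold").
healthy_cache = {
    "protocol": "raw", "count": 0, "database": "out",
    "score": io.StringIO(), "fold": io.StringIO(),
}
# Mirrors the trace above: protocol, count and database only, no streams.
broken_cache = {"protocol": "raw", "count": 0, "database": "out"}


def write_record(cache, relation, fields, fallback=None):
    """Write one record for `relation`; fall back to stdout without a stream.

    Returns True if the record went to the cache's own stream, False if it
    went to the fallback. The tab delimiter is an arbitrary choice here.
    """
    fallback = sys.stdout if fallback is None else fallback
    stream = cache.get(relation)
    target = stream if stream is not None else fallback
    target.write("\t".join(str(f) for f in fields) + "\n")
    return target is stream


def missing_handles(cache, relations=("score", "fold")):
    """List the relations for which the cache holds no open stream."""
    return [r for r in relations if cache.get(r) is None]
```

<div>Under these assumptions, missing_handles(broken_cache) reports both "score" and "fold", which matches the two zero-length files in the listing above.</div>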
<div><br></div><div>If I run with a single Tanaka profile instead of the virtual profile, I do see open file handles for all the files in the ranker output profile.</div><div><br></div><div>Presumably there is something wrong with my setup of the Tanaka corpus, but I can't figure out what it is. The problematic virtual profile directory looks like this:</div>
<div><br></div><div><div>$ lrt tanaka-train/</div><div>total 289556</div><div>-rw-r--r-- 1 billmcn billmcn 146 Feb 17 17:17 virtual</div><div>-rw-r--r-- 1 billmcn billmcn 296203985 Jun 18 18:56 fc.mlm</div><div><br>
</div><div>I'm still debugging, but I thought I would ask in case there's something obvious I'm missing.</div></div><div><br></div><div>I don't have this problem with the English ranker experiments in the logon tree.<br>
-- <br>W.P. McNeill<br><a href="http://staff.washington.edu/billmcn/index.shtml">http://staff.washington.edu/billmcn/index.shtml</a><br>
</div>