[developers] Tanaka corpus logon parse ranker not creating output profile files

Sat Jun 27 09:39:23 CEST 2009

strictly speaking, there is nothing virtual about the skeleton: during  
the grid search, a new profile is created for each experiment. these  
profiles (with the funny-looking long names) need to contain all  
items, and they get created the ‘standard’ way, i.e. by  
instantiating a skeleton (which should contain files ‘item’ and  
‘relations’ and must be listed in the skeleton index).
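
a quick way to check those three conditions is sketched below. this is only an illustration, not part of the logon tree: the function name, the default index file name (Index.lisp), and the simple substring test for an index entry are all assumptions that should be adapted to the local setup.

```python
from pathlib import Path

# Files the skeleton should contain, per the explanation above.
REQUIRED_FILES = ("item", "relations")


def check_skeleton(skeleton_root, name, index_name="Index.lisp"):
    """Return a list of problems for skeleton `name`; an empty list means it looks usable.

    Assumptions (hypothetical, adjust as needed): skeletons live as
    sub-directories of `skeleton_root`, and the index is a text file called
    `index_name` in that same directory which mentions the skeleton name in
    double quotes.
    """
    root = Path(skeleton_root)
    problems = []

    skeleton = root / name
    if not skeleton.is_dir():
        problems.append(f"missing skeleton directory: {skeleton}")
    else:
        for filename in REQUIRED_FILES:
            if not (skeleton / filename).is_file():
                problems.append(f"missing file in skeleton: {filename}")

    index = root / index_name
    if not index.is_file():
        problems.append(f"missing skeleton index: {index}")
    elif f'"{name}"' not in index.read_text(encoding="utf-8", errors="replace"):
        # Crude textual test only; the index is not actually parsed.
        problems.append(f"skeleton not listed in index: {name}")

    return problems
```

calling check_skeleton(root, "tanaka-train"), with root pointing at the directory that holds the skeletons, returns an empty list when the directory, both files, and an index entry are all present.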

best, oe

On Jun 27, 2009, at 2:33, Francis Bond <fcbond at gmail.com> wrote:

> G'day,
>
> 2009/6/27 W.P. McNeill (UW) <billmcn at u.washington.edu>:
>> I am running the logon parse ranking tools on the Japanese Tanaka corpus.
>> I am able to successfully run a parse ranking experiment on individual
>> profiles in the corpus, but when I try to run the same experiment on a
>> virtual profile that includes the entire corpus it does not work because
>> files are not created in the output profile.  The contents of the profile
>> look like this when the experiment is complete.
>> $ lrt \[tanaka-train\]\ GP\[0\]\ +PT\ -LEX\ CW\[\]\ -AE\ NS\[0\]\ NT\[\]\ -NB\ LM\[0\]\ FT\[\:\:\:1\]\ RS\[\]\ MM\[tao_lmvm\]\ MI\[5000\]\ RT\[1.0e-6\]\ AT\[1.0e-20\]\ VA\[1.0e+4\]\ PC\[100\]/
>> total 289556
>> -rw-r--r-- 1 billmcn billmcn       146 Jun 25 17:06 virtual
>> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 25 17:07 fc.mlm
>> -rw-r--r-- 1 billmcn billmcn         0 Jun 25 17:07 fold
>> -rw-r--r-- 1 billmcn billmcn         0 Jun 25 17:46 score
>> I debugged into this and it appears that the cache parameter passed down to
>> write-score() and write-fold() is incorrect.  It looks like this when it is
>> passed into write-score:
>> (WRITE-SCORE
>>  "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]"
>>  ((:SCORE . "0.0000421995000000") (:RANK . 1) (:RESULT-ID . 7)
>>   (:PARSE-ID . 9000))
>>  :CACHE ((:PROTOCOL . :RAW) (:COUNT . 0)
>>          (:DATABASE
>>           . "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[] MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]")))
>> Note that there are no open file handles as part of the cache data
>> structure.  As a result the fold and score information is written to STDOUT
>> instead of the output files.
>> The data structure shown above is the one that gets passed back from
>> create-cache.
>> If I run with the single Tanaka profile instead of the virtual profile I do
>> see open file handles for all the files in the ranker output profile.
>> Presumably there is something wrong with my setup of the Tanaka corpus, but
>> I can't figure out what it is.  The problematic virtual profile directory
>> looks like this:
>> $ lrt tanaka-train/
>> total 289556
>> -rw-r--r-- 1 billmcn billmcn       146 Feb 17 17:17 virtual
>> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 18 18:56 fc.mlm
>> I'm still debugging, but I thought I would ask in case there's something
>> obvious I'm missing.
>> I don't have this problem with the English ranker experiments in the logon
>> tree.
>
> I am not the expert here, but perhaps you should check that you have a
> virtual skeleton profile, as well as a virtual treebank.  My
> understanding is that you need both.
>
> Yours,
>
> -- 
> Francis Bond <bond at ieee.org>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University
>