[developers] Tanaka corpus logon parse ranker not creating output profile files
Stephan Oepen
stephan.oepen at gmail.com
Sat Jun 27 09:39:23 CEST 2009
strictly speaking, there is nothing virtual about the skeleton: during
the grid search, a new profile is created for each experiment. these
profiles (with the funny-looking long names) need to contain all
items, and they get created the ‘standard’ way, i.e. by
instantiating a skeleton (which should contain files ‘item’ and
‘relations’ and must be listed in the skeleton index).
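
a minimal sketch of the file check implied above, in Python rather than in the Lisp code of the tools themselves. the function name and the idea of passing the skeleton directory as a path are assumptions made for illustration only; the skeleton index is not checked, since its format is not described in this message.

```python
from pathlib import Path

# The two files the message says a skeleton should contain.
REQUIRED_FILES = ("item", "relations")


def missing_skeleton_files(skeleton_dir):
    """Return the required file names that are absent from skeleton_dir.

    Hypothetical helper: it only tests for the presence of the files and
    does not inspect their contents or the skeleton index.
    """
    root = Path(skeleton_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

an empty result means both files are present; whether the skeleton is also listed in the skeleton index still has to be verified separately.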
best, oe
On Jun 27, 2009, at 2:33, Francis Bond <fcbond at gmail.com> wrote:
> G'day,
>
> 2009/6/27 W.P. McNeill (UW) <billmcn at u.washington.edu>:
>> I am running the logon parse ranking tools on the Japanese Tanaka corpus. I
>> am able to successfully run a parse ranking experiment on individual
>> profiles in the corpus, but when I try to run the same experiment on a
>> virtual profile that includes the entire corpus it does not work because
>> files are not created in the output profile. The contents of the profile
>> look like this when the experiment is complete.
>> $ lrt \[tanaka-train\]\ GP\[0\]\ +PT\ -LEX\ CW\[\]\ -AE\ NS\[0\]\ NT\[\]\
>> -NB\ LM\[0\]\ FT\[\:\:\:1\]\ RS\[\]\ MM\[tao_lmvm\]\ MI\[5000\]\
>> RT\[1.0e-6\]\ AT\[1.0e-20\]\ VA\[1.0e+4\]\ PC\[100\]/
>> total 289556
>> -rw-r--r-- 1 billmcn billmcn 146 Jun 25 17:06 virtual
>> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 25 17:07 fc.mlm
>> -rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:07 fold
>> -rw-r--r-- 1 billmcn billmcn 0 Jun 25 17:46 score
>> I debugged into this and it appears that the cache parameter passed down to
>> write-score() and write-fold() is incorrect. It looks like this when it is
>> passed into write-score:
>> (WRITE-SCORE
>> "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[]
>> MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]"
>> ((:SCORE . "0.0000421995000000") (:RANK . 1) (:RESULT-ID . 7)
>> (:PARSE-ID . 9000))
>> :CACHE ((:PROTOCOL . :RAW) (:COUNT . 0)
>> (:DATABASE
>> . "[tanaka-train] GP[0] +PT -LEX CW[] -AE NS[0] NT[] -NB LM[0] FT[:::1] RS[]
>> MM[tao_lmvm] MI[5000] RT[1.0e-6] AT[1.0e-20] VA[1.0e+4] PC[100]")))
>> Note that there are no open file handles as part of the cache data
>> structure. As a result the fold and score information is written to STDOUT
>> instead of the output files.
>> The data structure shown above is the one that gets passed back from
>> create-cache.
>> If I run with the single Tanaka profile instead of the virtual profile I do
>> see open file handles for all the files in the ranker output profile.
>> Presumably there is something wrong with my setup of the Tanaka corpus, but
>> I can't figure out what it is. The problematic virtual profile directory
>> looks like this:
>> $ lrt tanaka-train/
>> total 289556
>> -rw-r--r-- 1 billmcn billmcn 146 Feb 17 17:17 virtual
>> -rw-r--r-- 1 billmcn billmcn 296203985 Jun 18 18:56 fc.mlm
>> I'm still debugging, but I thought I would ask in case there's something
>> obvious I'm missing.
>> I don't have this problem with the English ranker experiments in
>> the logon tree.
>
> I am not the expert here, but perhaps you should check that you have a
> virtual skeleton profile, as well as a virtual treebank. My
> understanding is that you need both.
>
> Yours,
>
> --
> Francis Bond <bond at ieee.org>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University
>