[developers] Do I need to rebuild the feature cache in logon if I work with a new virtual corpus?
Stephan Oepen
oe at ifi.uio.no
Tue Mar 3 09:45:56 CET 2009
> I've been running logon parse ranking experiments with the jhpstg virtual profile.
> Now I want to run experiments with a smaller corpus, so I'm going to create
> a virtual profile with only some of the contents of jhpstg. Do I need to
> rerun the feature caching step (load --binary fc.lisp)?
no, at least in principle. the feature cache comprises the BDB files
in each of the profiles (`fc.bdb') and the symbol table in the master
profile (`fc.mlm', in the virtual profile). i have at times used the
same feature cache with multiple virtual profile configurations, where
of course the virtual profile used for creating the cache needs to be
a super-set of all such configurations (such that all features are in
the cache). thus, when creating a sub-set virtual profile, it should
work to copy the original virtual profile (including its `fc.mlm') and
then trim down the entries in the `virtual' file.
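
The copy-and-trim step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the `virtual' file is plain text with one constituent profile per line, optionally in double quotes, and the function name `make_subset_profile' and the profile names in the usage note are hypothetical, not part of [incr tsdb()] or LOGON.

```python
import shutil
from pathlib import Path


def make_subset_profile(source, target, keep):
    """Copy a virtual profile directory and trim its `virtual' file.

    source -- directory of the original (super-set) virtual profile
    target -- directory to create for the sub-set profile (must not exist)
    keep   -- names of the constituent profiles to retain
    """
    source, target = Path(source), Path(target)
    # Copy the whole directory so that the symbol table (`fc.mlm') comes along.
    shutil.copytree(source, target)

    virtual = target / "virtual"
    lines = virtual.read_text().splitlines()

    # Assumption: one profile name per line, possibly wrapped in double quotes.
    def name(line):
        return line.strip().strip('"')

    kept = [line for line in lines if name(line) in keep]

    # Refuse to continue if a requested profile is not in the original: the
    # cache only covers features from the profiles it was built on.
    missing = set(keep) - {name(line) for line in kept}
    if missing:
        raise ValueError(f"not in the original virtual profile: {sorted(missing)}")

    virtual.write_text("\n".join(kept) + "\n")
    return kept
```

For example, `make_subset_profile("jhpstg", "jhpstg-small", ["gold/jh0", "gold/ps"])` would leave the original profile untouched and create a copy whose `virtual' file lists only the two named entries (both names are made up for illustration). The per-profile `fc.bdb' files are not copied or changed, since they live in the constituent profiles themselves.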
> Do I need to rerun the training step (load --binary train.lisp)?
yes, certainly. training reflects one specific feature configuration
and choice of hyper-parameters. though, i wonder, why do you train in
the first place? the grid search is the most effective procedure for
training and evaluating a series of configurations. `train.lisp', on
the other hand, is there to serialize a model (e.g. `jhpstg.mem') for
use with PET or the LKB---once the grid search has identified the best
performer.
all best - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++