[developers] Do I need to rebuild the feature cache in logon if I work with a new virtual corpus?

Tue Mar 3 09:45:56 CET 2009

> I've been running logon parse ranking experiments with the jhpstg virtual
> profile.  Now I want to run experiments with a smaller corpus, so I'm going
> to create a virtual profile with only some of the contents of jhpstg.  Do I
> need to rerun the feature caching step (load --binary fc.lisp)?

no, at least in principle.  the feature cache comprises the BDB files
in each of the profiles (`fc.bdb') and the symbol table in the master
profile (`fc.mlm', in the virtual profile).  i have at times used the
same feature cache with multiple virtual profile configurations, where
of course the virtual profile used for creating the cache needs to be
a super-set of all such configurations (such that all features are in
the cache).  thus, when creating a sub-set virtual profile, it should
work to copy the original virtual profile (including its `fc.mlm') and
then trim down the entries in the `virtual' file.
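
the copy-and-trim step could be scripted roughly as follows.  this is only a
minimal sketch, not part of the logon tree: the function name
`make_subset_profile' and its arguments are hypothetical, and it assumes that
the `virtual' file lists one (possibly double-quoted) profile name per line.
please check that against your own `virtual' file before relying on it.

```python
import shutil
from pathlib import Path


def make_subset_profile(source, target, keep):
    """Copy a virtual profile directory and trim its `virtual' file.

    source -- directory of the original (super-set) virtual profile
    target -- directory to create for the sub-set virtual profile
    keep   -- names of the component profiles to retain

    Assumption: `virtual' holds one profile name per line, optionally
    wrapped in double quotes.
    """
    source, target = Path(source), Path(target)
    keep = set(keep)

    # Copy everything, so the symbol table (fc.mlm) comes along unchanged.
    shutil.copytree(source, target)

    # Keep only the lines naming the wanted component profiles.
    virtual = target / "virtual"
    lines = virtual.read_text().splitlines()
    kept = [line for line in lines if line.strip().strip('"') in keep]
    virtual.write_text("".join(line + "\n" for line in kept))
    return kept
```

the per-profile `fc.bdb' files are not touched, since they live in the
component profiles rather than in the virtual profile directory.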

> Do I need to rerun the training step (load --binary train.lisp)?

yes, certainly.  training reflects one specific feature configuration
and choice of hyper parameters.  though, i wonder, why do you train in
the first place?  the grid search is the most effective procedure for
training and evaluating a series of configurations.  `train.lisp', on
the other hand, is there to serialize a model (e.g. `jhpstg.mem') for
use with PET or the LKB---once the grid search has identified the best
performer.

                                                     all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++