[developers] [itsdb] Howto train a model on more than one profile

Fri Nov 10 12:06:24 CET 2006

hi berthold,

> it is really a large number of experiments. I started my first
> experiment about a week ago, and I am not even into grandparenting yet.
> Is there a way to speed things up, e.g. by dropping some less
> interesting variation in parameters? Or is there any support for
> multiprocessing?
> 
> Some parameters are not really self-explanatory. Can you provide some
> comments on grid.lisp? Which parameters are now supported in Pet? 

looking at your log file, all seems to proceed as it should :-).  most
of the time goes into parameter estimation, and there is little we can
do about that (short of parallelizing experiments, which i would like
to implement one day).  for each experiment, you get 18 grids:

 :variance '(nil 1e4 1e2 1e-2 1e-4 1e-6)
 :relative-tolerance '(1e-6 1e-8 1e-10))

one grid takes between five minutes and one hour (for your 15,000 Eiche
items), and by default each grid comprises two folds.  those hour-long
runs appear to be ones with either no prior (`variance' nil) or a very
low relative tolerance (1e-10); they often diverge.  maybe you could
trim down the TADM parameter variation, e.g.

 :variance '(1e4 1e2 1e-2 1e-4 1e-6)
 :relative-tolerance '(1e-6 1e-8))
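
the grid count is just the product of the list lengths.  a minimal
python sketch of that arithmetic (not part of [incr tsdb()] or LOGON;
the dictionary names and `count_grids' are made up for illustration,
and None stands in for lisp nil):

```python
from math import prod

# TADM parameter variation as quoted above; None plays the role of
# lisp nil, i.e. no prior.  these dictionaries are illustrative only.
full = {
    "variance": [None, 1e4, 1e2, 1e-2, 1e-4, 1e-6],
    "relative_tolerance": [1e-6, 1e-8, 1e-10],
}
trimmed = {
    "variance": [1e4, 1e2, 1e-2, 1e-4, 1e-6],
    "relative_tolerance": [1e-6, 1e-8],
}


def count_grids(params):
    """Number of grids per experiment: one per combination of values."""
    return prod(len(values) for values in params.values())


print(count_grids(full))     # 18
print(count_grids(trimmed))  # 10
```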

assuming the default LOGON `grid.lisp', you should get 192 experiments:

 :grandparenting '(0 2 3 4)
 :active-edges-p '(nil t)
 :lexicalization-p nil
 :constituent-weight '(1 2 0)
 :ngram-size '(0 2 3 4) :ngram-back-off-p '(nil t)

such that you are more than ten per cent done already :-).  so maybe my
defaults are overly generous with cpu days!  if your main interest is a 
model to use with PET, you can cut out all variation but grandparenting
and active edges (aka partial configurations).  so maybe the following:

 :grandparenting '(0 2 3 4)
 :active-edges-p '(nil t)
 :lexicalization-p nil
 :constituent-weight 0
 :ngram-size 0 :ngram-back-off-p nil

this would bring down the total to eight experiments, each of ten grids
... you will be done in no time!
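
to put rough numbers on this, here is a small python sketch (again
only arithmetic, not LOGON code; all names are hypothetical).  it
counts experiments as the product of the feature settings, where a
single value counts as one setting, and bounds the total time using
the five-minutes-to-one-hour range per grid mentioned above.  the
upper bound is probably pessimistic for the trimmed setup, since the
hour-long grids seem to be mostly the ones being dropped.

```python
from math import prod


def n_settings(value):
    """A list contributes one setting per element, a single value one."""
    return len(value) if isinstance(value, (list, tuple)) else 1


def count_experiments(features):
    return prod(n_settings(v) for v in features.values())


# feature variation as quoted above; False stands in for lisp nil
default_features = {
    "grandparenting": [0, 2, 3, 4],
    "active_edges_p": [False, True],
    "lexicalization_p": False,
    "constituent_weight": [1, 2, 0],
    "ngram_size": [0, 2, 3, 4],
    "ngram_back_off_p": [False, True],
}
pet_features = {
    "grandparenting": [0, 2, 3, 4],
    "active_edges_p": [False, True],
    "lexicalization_p": False,
    "constituent_weight": 0,
    "ngram_size": 0,
    "ngram_back_off_p": False,
}


def hours_range(experiments, grids_per_experiment,
                min_minutes=5, max_minutes=60):
    """Lower and upper bound on total hours, run one grid at a time."""
    grids = experiments * grids_per_experiment
    return grids * min_minutes / 60, grids * max_minutes / 60


print(count_experiments(default_features))  # 192
print(count_experiments(pet_features))      # 8

# default setup: 18 grids per experiment; trimmed setup: 10
print(hours_range(count_experiments(default_features), 18))
print(hours_range(count_experiments(pet_features), 10))
```

under these assumptions the default setup comes to 3456 grids and the
trimmed one to 80.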

when talking to zhang yi recently, we (think we) worked out what would
be needed for PET to also support those n-gram features (with selective
unpacking that is; i personally believe it is not really worth adapting
the non-selective universe for additional features).  but before making
the time to implement such extensions, we should know how much we gain
on top of the basic configurational features plus grandparenting.  from
past experience, that could be relatively little.  to know for sure, we
would have to complete more of the experiments listed above ...  but
it might still make sense to narrow down estimation parameters first.

                                      i hope this helps!  cheers  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++