[developers] [itsdb] Howto train a model on more than one profile
Berthold Crysmann
crysmann at dfki.de
Thu Nov 9 15:02:51 CET 2006
On Thu, 2006-10-19 at 17:53 +0200, Stephan Oepen wrote:
> hei!
>
> > I would like to train and evaluate a parse selection model on our
> > current German treebank. The treebank is, however, distributed
> > across different profiles. Is there
> > a way to select more than one profile as the source for parameter
> > estimation? Or do I have to combine the profiles into one large
> > profile?
>
> well, by popular request, here is my secret about virtual profiles:
> as of late, it is possible to create `virtual profiles', which can then
> serve as the target profile for _some_ [incr tsdb()] operations.
>
> a virtual profile, like any other profile, is a directory somewhere in
> the [incr tsdb()] profile database `home' directory. the only file one
> needs to put into a virtual profile directory is one called `virtual'.
> the `virtual' file, in turn, lists the names of the sub-profiles, one
> per line, e.g.
>
> "jh0"
> "jh1"
> "jh2"
> "jh3"
> "jh4"
> "jh5"
> "ps"
> "tg"
>
> here, `jh0' et al. must be valid profile names (visible in the podium),
> and the double quotes are mandatory.
>
> a few restrictions: virtual profiles are read-only and currently do not
> show in the [incr tsdb()] podium. yet, they can be useful in training
> and evaluating parse selection models.
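For concreteness, here is a minimal sketch of setting up such a virtual
profile from Lisp (the database home `/home/berthold/tsdb/home/' and the
sub-profile names `eiche0' et al. are made-up placeholders; the `virtual'
file can of course just as well be written with a text editor):

  ;;; sketch only: create a virtual profile `eiche-all' whose `virtual'
  ;;; file lists three (hypothetical) sub-profiles, one double-quoted
  ;;; name per line, exactly as in the example above.
  (let* ((home "/home/berthold/tsdb/home/")
         (virtual (merge-pathnames "eiche-all/virtual" home)))
    (ensure-directories-exist virtual)
    (with-open-file (stream virtual
                     :direction :output
                     :if-exists :supersede :if-does-not-exist :create)
      (dolist (profile '("eiche0" "eiche1" "eiche2"))
        (format stream "~s~%" profile))))

The sub-profile names must, as noted above, be valid profiles under the
same database home.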
>
> berthold, if you have the latest LOGON build (or CVS), it now includes
> a sub-directory `lingo/redwoods/', which provides the current versions
> of the LOGON ERG treebanks (called JHPSTG). also, there is a script by
> the name `load' (essentially setting up the environment for a variety
> of experimental tasks) and input files `fc.lisp' (creating the feature
> cache, a one-time operation); `grid.lisp' (executing a large number of
> experiments, with varying feature sets and estimation parameters);
Hi Stephan,
it is really a large number of experiments. I started my first
experiment about a week ago, and I am not even into grandparenting yet.
Is there a way to speed things up, e.g. by dropping some of the less
interesting parameter variation? Or is there any support for
multiprocessing?
Some parameters are not really self-explanatory. Can you provide some
comments on grid.lisp? Which parameters are now supported in PET?
Thanks,
Berthold
> and
> finally `train.lisp' (training and serializing a model, using a default
> set of parameters). you should be able to adapt all of this for your
> Eiche treebank data. note that, since virtual profiles are read-only,
> you will still need a skeleton for the full data set, as each iteration
> in `grid.lisp' needs to write scores et al. generally, i would suggest
> always using the LOGON tree for parse selection experiments. it also
> includes suitable TADM (and SVM) binaries.
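As a rough illustration of the skeleton point (a sketch under my own
assumptions, not taken from the LOGON scripts): if each sub-skeleton is
a directory containing `item' and `relations' files, and the item
identifiers are already unique across the sub-skeletons, a skeleton for
the full data set could be assembled along these lines (all paths and
names are made up):

  ;;; sketch only: concatenate the `item' files of several
  ;;; (hypothetical) sub-skeletons into one combined skeleton.
  (let* ((source "/home/berthold/tsdb/skeletons/")
         (target (merge-pathnames "eiche-all/" source)))
    (ensure-directories-exist target)
    (with-open-file (out (merge-pathnames "item" target)
                     :direction :output
                     :if-exists :supersede :if-does-not-exist :create)
      (dolist (skeleton '("eiche0" "eiche1" "eiche2"))
        (with-open-file (in (merge-pathnames
                             (format nil "~a/item" skeleton) source))
          (loop for line = (read-line in nil nil)
                while line do (write-line line out)))))
    ;; assuming the `relations' file is the same in all sub-skeletons,
    ;; copy it over from the first one.
    (with-open-file (in (merge-pathnames "eiche0/relations" source))
      (with-open-file (out (merge-pathnames "relations" target)
                       :direction :output
                       :if-exists :supersede :if-does-not-exist :create)
        (loop for line = (read-line in nil nil)
              while line do (write-line line out)))))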
>
> emily and francis, i hope you might find this useful too :-).
>
> best - oe
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
> +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
> +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>