<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.10.3">
</HEAD>
<BODY>
On Thu, 2006-10-19 at 17:53 +0200, Stephan Oepen wrote:
<BLOCKQUOTE TYPE=CITE>
<PRE>
<FONT COLOR="#000000">hei!</FONT>
<FONT COLOR="#000000">> I would like to train and evaluate our current German treebank. The</FONT>
<FONT COLOR="#000000">> treebank is, however, distributed across different profiles. Is there</FONT>
<FONT COLOR="#000000">> a way to select more than one profile as the source for parameter</FONT>
<FONT COLOR="#000000">> estimation? Or do I have to combine the profiles into one large</FONT>
<FONT COLOR="#000000">> profile?</FONT>
<FONT COLOR="#000000">well, due to popular request, here is my secret about virtual profiles:</FONT>
<FONT COLOR="#000000">as of late, it is possible to create `virtual profiles', which can then</FONT>
<FONT COLOR="#000000">serve as the target profile for _some_ [incr tsdb()] operations.</FONT>
<FONT COLOR="#000000">a virtual profile, like any other profile, is a directory somewhere in</FONT>
<FONT COLOR="#000000">the [incr tsdb()] profile database `home' directory. the only file one</FONT>
<FONT COLOR="#000000">needs to put into a virtual profile directory is one called `virtual'.</FONT>
<FONT COLOR="#000000">the virtual file, in turn, contains the profile names of sub-profiles,</FONT>
<FONT COLOR="#000000">e.g.</FONT>
<FONT COLOR="#000000"> "jh0"</FONT>
<FONT COLOR="#000000"> "jh1"</FONT>
<FONT COLOR="#000000"> "jh2"</FONT>
<FONT COLOR="#000000"> "jh3"</FONT>
<FONT COLOR="#000000"> "jh4"</FONT>
<FONT COLOR="#000000"> "jh5"</FONT>
<FONT COLOR="#000000"> "ps"</FONT>
<FONT COLOR="#000000"> "tg"</FONT>
<FONT COLOR="#000000">here, `jh0' et al. must be valid profile names (visible in the podium),</FONT>
<FONT COLOR="#000000">and the double quotes are mandatory.</FONT>
<FONT COLOR="#000000">a few restrictions: virtual profiles are read-only and currently do not</FONT>
<FONT COLOR="#000000">show in the [incr tsdb()] podium. yet, they can be useful in training</FONT>
<FONT COLOR="#000000">and evaluating parse selection models.</FONT>
<FONT COLOR="#000000">berthold, if you got the latest LOGON build (or CVS), that now includes</FONT>
<FONT COLOR="#000000">a sub-directory `lingo/redwoods/', which provides the current versions</FONT>
<FONT COLOR="#000000">of the LOGON ERG treebanks (called JHPSTG). also, there is a script by</FONT>
<FONT COLOR="#000000">the name `load' (essentially setting up the environment for a variety</FONT>
<FONT COLOR="#000000">of experimental tasks) and input files `fc.lisp' (creating the feature</FONT>
<FONT COLOR="#000000">cache, a one-time operation); `grid.lisp' (executing a large number of</FONT>
<FONT COLOR="#000000">experiments, with varying feature sets and estimation parameters); </FONT>
</PRE>
</BLOCKQUOTE>
<BR>
Hi Stephan, <BR>
<BR>
It really is a large number of experiments. I started my first experiment about a week ago, and I have not even reached grandparenting yet. Is there a way to speed things up, e.g. by dropping some of the less interesting parameter variations? Or is there any support for multiprocessing?<BR>
<BR>
Some parameters are not really self-explanatory. Could you provide some comments on grid.lisp? Which parameters are currently supported in PET?<BR>
<BR>
Thanks, <BR>
<BR>
Berthold
<BLOCKQUOTE TYPE=CITE>
<PRE>
<FONT COLOR="#000000">and</FONT>
<FONT COLOR="#000000">finally `train.lisp' (training and serializing a model, using a default</FONT>
<FONT COLOR="#000000">set of parameters). you should be able to adapt all of this for your</FONT>
<FONT COLOR="#000000">Eiche treebank data. note that, since virtual profiles are read-only,</FONT>
<FONT COLOR="#000000">you will still need a skeleton for the full data set, as each iteration</FONT>
<FONT COLOR="#000000">in `grid.lisp' needs to write scores et al. generally, i would suggest</FONT>
<FONT COLOR="#000000">to always use the LOGON tree for parse selection experiments. it also</FONT>
<FONT COLOR="#000000">includes suitable TADM (and SVM) binaries.</FONT>
<FONT COLOR="#000000">emily and francis, i hope you might find this useful too :-).</FONT>
<FONT COLOR="#000000"> best - oe</FONT>
<FONT COLOR="#000000">+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++</FONT>
<FONT COLOR="#000000">+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125</FONT>
<FONT COLOR="#000000">+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515</FONT>
<FONT COLOR="#000000">+++ --- <A HREF="mailto:oe@csli.stanford.edu">oe@csli.stanford.edu</A>; <A HREF="mailto:oe@ifi.uio.no">oe@ifi.uio.no</A>; <A HREF="mailto:stephan@oepen.net">stephan@oepen.net</A> ---</FONT>
<FONT COLOR="#000000">+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++</FONT>
</PRE>
</BLOCKQUOTE>
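<BR>
For reference, the layout of a virtual profile described in the quoted message (a directory in the profile database home that holds a single file called `virtual', listing one double-quoted sub-profile name per line) could be set up with a small script along the following lines. This is only a sketch and not part of [incr tsdb()]: the function name write_virtual_profile, the target name "jhpstg", and the check that each sub-profile exists as a directory under the home directory are all assumptions made for illustration.<BR>
<PRE>
```python
import tempfile
from pathlib import Path


def write_virtual_profile(home, name, sub_profiles):
    """Create directory `name` under `home` holding a single `virtual` file.

    Hypothetical helper: it only mirrors the file layout described in the
    quoted message and does not call into [incr tsdb()] itself.
    """
    home = Path(home)
    # Assumption: a valid sub-profile is an existing directory under `home`.
    missing = [p for p in sub_profiles if not (home / p).is_dir()]
    if missing:
        raise FileNotFoundError("unknown sub-profiles: " + ", ".join(missing))
    target = home / name
    target.mkdir(parents=True, exist_ok=True)
    # One sub-profile name per line; the double quotes are mandatory.
    lines = ['"{}"'.format(p) for p in sub_profiles]
    virtual = target / "virtual"
    virtual.write_text("\n".join(lines) + "\n")
    return virtual


if __name__ == "__main__":
    # Demonstration in a throwaway directory standing in for the tsdb home.
    with tempfile.TemporaryDirectory() as tmp:
        names = ["jh0", "jh1", "ps", "tg"]
        for p in names:
            (Path(tmp) / p).mkdir()
        print(write_virtual_profile(tmp, "jhpstg", names).read_text())
```
</PRE>
Since virtual profiles are read-only, a script like this would only cover the input side; a regular skeleton for the full data set is still needed for anything that writes scores.<BR>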
</BODY>
</HTML>