[developers] training parse selection models using the fine system

Stephan Oepen oe at csli.Stanford.EDU
Fri Apr 21 11:29:52 CEST 2006


hi again, and apologies for the long turn-around time (easter week is a
very serious holiday in norway :-).

> I started with the fine system from
> http://lingo.stanford.edu/ftp/builds/2006-03-05/ , following the
> guide on the wiki. But when I click the Trees/Train, I got error
> saying:
> podium-loop(): Symbol TRAIN does not have a function definition.

the [incr tsdb()] MaxEnt (and SVM) experimentation support is in flux
still, and erik and i actively develop against the LOGON tree.  once in
a while, we merge back changes into the DELPH-IN tree, but if you want
to beta-test these days, you need to stick to the LOGON snapshots.  the
wiki information may be out-of-date too, for the same reasons (francis
is the primary author of [incr tsdb()] documentation :-).

> I also tried to follow the file `load' and `fc.lisp' in $HINOKI, using the 
> LOGON tree. The feature caching seems to work, so that I got big `fc.abt' 
> files afterwards. But when I try Trees/Train:
> 
> [11:07:26] operate-on-profiles(): reading `jan-06/jh0'
> operate-on-profiles(): caching `jan-06/jh0' [11 - 511|.
> [11:07:26] open-fc(): new BTree `fc.bdb'.
> podium-loop(): Attempt to call #("db_open" 11440496 0 2 11440496) for
>                 which the definition has not yet been (or is no longer)
>                 loaded.

as erik pointed out, there is code for either AllegroCache or BDB for
the low-level feature cache.  when compiling from source, the features
:acache and :bdb control which one gets used.  however, when using the
LOGON run-time binaries, only one of the two is compiled in, so i cannot
explain why you would get the error above (unless you had switched
to the DELPH-IN tree mid-stream).  i had confirmed that i can run both
stages (`fc.lisp' and `skew.lisp') on `elf'.  could you try reproducing
just that?
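
as a quick sanity check, here is a small sketch for the lisp prompt that
reports which back-end a given feature list selects.  the function name
feature-cache-backend() is made up for illustration (it is not part of
[incr tsdb()]), and i am assuming that :acache and :bdb are still visible
on *features* in the running image, which may not hold for the run-time
binaries.

```lisp
;;; hypothetical helper, not part of [incr tsdb()]: report which low-level
;;; feature cache back-end the given feature list would select.
;;; assumption: the choice is signalled by :bdb or :acache on *features*.
(defun feature-cache-backend (&optional (features *features*))
  (cond ((member :bdb features) :bdb)        ; Berkeley DB back-end
        ((member :acache features) :acache)  ; AllegroCache back-end
        (t nil)))                            ; neither feature present

;;; print the result for the current image
(format t "~&feature cache back-end: ~a~%"
        (or (feature-cache-backend) "none found on *features*"))
```

if this were to report :acache while the log shows open-fc() creating
`fc.bdb', that would point to a mix of the two trees.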

> 2) can I only cache some basic features for a model used by pet? I tried 
> to set:
> (setf tsdb::*feature-grandparenting* 0)
> (setf tsdb::*feature-use-preterminal-types-p* nil)
> (setf tsdb::*feature-lexicalization-p* 0)
> (setf tsdb::*feature-active-edges-p* nil)
> (setf tsdb::*feature-ngram-size* 0)
> (setf tsdb::*feature-ngram-tag* :type)
> (setf tsdb::*feature-ngram-back-off-p* nil)
> but the fc.abt is still huge, and it still takes a long time.

given the settings above, the feature cache should be tiny.  at most a
few gigabytes for Hinoki, i should think.  here is what i see with all
features enabled:

  4.4G 2006-04-03 22:54 ./Def-6/f6-n-s1-1/jp051120.n/fc.bdb
  3.4G 2006-04-03 23:11 ./Def-6/f6-n-s1-2/jp051120.n/fc.bdb
  4.1G 2006-04-03 23:28 ./Def-6/f6-n-s1-3/jp051120.n/fc.bdb
  3.0G 2006-04-03 23:43 ./Def-6/f6-s-s1-1/jp051120.n/fc.bdb
  1.9G 2006-04-03 23:51 ./Def-6/f6-v-s1-1/jp051120.n/fc.bdb
  2.2G 2006-04-03 23:56 ./Def-6/f6-x-s1-1/jp051120.n/fc.bdb
  1.5G 2006-04-04 00:02 ./Def-s2/f6-n-s2-1/jp051120.n/fc.bdb
  1.8G 2006-04-04 00:08 ./Def-s2/f6-n-s2-2/jp051120.n/fc.bdb
  1.5G 2006-04-04 00:13 ./Def-s2/f6-n-s2-3/jp051120.n/fc.bdb
  1.5G 2006-04-04 00:17 ./Def-s2/f6-n-s2-4/jp051120.n/fc.bdb
  1.8G 2006-04-04 00:22 ./Def-s2/f6-n-s2-5/jp051120.n/fc.bdb
  1.2G 2006-04-04 00:26 ./Def-s2/f6-n-s2-6/jp051120.n/fc.bdb
  1.6G 2006-04-04 00:31 ./Def-s2/f6-n-s2-7/jp051120.n/fc.bdb
  2.1G 2006-04-04 00:37 ./Def-s2/f6-n-s2-8/jp051120.n/fc.bdb
  1.3G 2006-04-04 00:43 ./Def-s2/f6-n-s2-9/jp051120.n/fc.bdb
  328M 2006-04-04 00:44 ./Def-s2/f6-s-s2-1/jp051120.n/fc.bdb
  591M 2006-04-04 00:45 ./Def-s2/f6-s-s2-2/jp051120.n/fc.bdb
  447M 2006-04-04 00:47 ./Def-s2/f6-v-s2-1/jp051120.n/fc.bdb
  559M 2006-04-04 00:48 ./Def-s2/f6-v-s2-2/jp051120.n/fc.bdb
  414M 2006-04-04 00:49 ./Def-s2/f6-x-s2-1/jp051120.n/fc.bdb
  574M 2006-04-04 00:51 ./Def-s2/f6-x-s2-2/jp051120.n/fc.bdb

i believe the LOGON version you have uses AllegroCache by default (so
you got `.abt' files instead), but the size should be comparable.  with
grandparenting, n-grams, and lexicalization disabled, the files should
be a lot smaller.  --- this is probably obvious to you, but the feature
selection settings need to be active when the cache is built, i.e. you
would have to add the setf() forms above (which look good) to `load'.
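
for illustration, a minimal sketch of what that could look like in `load'.
the values are simply the ones from the quoted message; where exactly
they go in `load' is an assumption on my part (anywhere before the
feature cache gets built), and a cache built earlier with other settings
would presumably have to be rebuilt.

```lisp
;;; in `load', before the feature cache is built (placement is an assumption)
;;; values copied from the quoted message, not independently verified
(setf tsdb::*feature-grandparenting* 0)            ; no grandparenting levels
(setf tsdb::*feature-use-preterminal-types-p* nil)
(setf tsdb::*feature-lexicalization-p* 0)
(setf tsdb::*feature-active-edges-p* nil)
(setf tsdb::*feature-ngram-size* 0)                ; no n-gram features
(setf tsdb::*feature-ngram-tag* :type)
(setf tsdb::*feature-ngram-back-off-p* nil)
```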

more high-level, erik and i have now both migrated to using BDB, and i
am inclined to deprecate AllegroCache use.  i could probably produce a
new LOGON build with BDB on by default, and then you would have to run
`fc.lisp' and (your variant of) `skew.lisp' (or similar) again.  would
you like to try that?

                                                      all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++