[developers] training parse selection models using the fine system

Erik Velldal e.velldal at gmail.com
Tue Apr 11 15:42:49 CEST 2006


Hi Yi.
It looks like the code you were using was a little confused about
whether to use Berkeley DB or AllegroCache for the feature caching
(fc.bdb vs fc.abt). Are you using the latest LOGON code out of CVS?
If so, whether BDB or AC is used depends on whether the :bdb or
:acache keyword is on *features* when the system is loaded (this can
be changed in $LOGONROOT/lingo/lkb/src/systems/tsdb.system). By the
way, the caching routines seem to be faster when using BDB.
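
For instance, something like the following should do it (just a
sketch; where exactly you put this depends on how you load things,
but it needs to happen before the tsdb system gets loaded):

;; use Berkeley DB for the feature cache; push :acache here instead
;; if you prefer AllegroCache
(pushnew :bdb *features*)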

Now, in order to extract features and train a model using the
resulting feature cache, it should be sufficient to do

(train "your-gold-profile" "your-model" :fcp t)
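
If you already have a feature cache from an earlier run, the same
call with :fcp set to nil should simply reuse it rather than extract
the features again:

(train "your-gold-profile" "your-model" :fcp nil)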

Either way, remember to load the grammar first whenever you're
actually creating a feature cache. Setting the feature parameters as
you did below should give you a cache containing the "basic" features
only (see the sketch further down, after the :fc example). To apply
the resulting model to another profile you can do something like this:

(tsdb :create "your-test-profile" :skeleton "your-test-profile-skeleton")
(operate-on-profiles
     (list "your-gold-profile")
     :model (read-model "your-model")
     :target "your-test-profile"
     :task :rank)
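
(With :task :rank, the rankings produced by the model should end up
in "your-test-profile", which you can then inspect from the podium as
usual.)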

Again, this requires the grammar to be loaded. To create only a feature cache, without ranking anything:

(operate-on-profiles
     (list "your-gold-profile")
     :task :fc)
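
To tie this to your second question: restricting the cache to the
basic features could then look roughly like this, reusing your own
settings from below (note that the *...-p* switches are booleans, so
nil rather than 0 is probably what you want for lexicalization):

(setf tsdb::*feature-grandparenting* 0)
(setf tsdb::*feature-use-preterminal-types-p* nil)
(setf tsdb::*feature-lexicalization-p* nil)
(setf tsdb::*feature-active-edges-p* nil)
(setf tsdb::*feature-ngram-size* 0)
(setf tsdb::*feature-ngram-tag* :type)
(setf tsdb::*feature-ngram-back-off-p* nil)
(operate-on-profiles
     (list "your-gold-profile")
     :task :fc)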

If you want to train and test using n-fold cross-validation, you can
do the following (which would require that you have a feature cache
already):

(tsdb :create "your-test-profile" :skeleton "your-gold-profile-skeleton")
(rank-profile
           "your-gold-profile"
           "your-test-profile"
           :nfold n
           :recache t)
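
(Here n would typically be something like 10, i.e. ten-fold
cross-validation over the gold profile.)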

I hope this helps, but be sure to shout if you get stuck. :)
Stephan might also have some more clarifying remarks, especially with
respect to using the DELPHIN tree?
Cheers,

-erik




On 4/11/06, Yi Zhang <yzhang at coli.uni-sb.de> wrote:
> Hi Stephan,
>
> Tim, Valia and I are still trying some experiments with the fine system
> and ERG. We have treebanked with different versions of the grammar and now
> I am trying to train the disambiguation models. I started with the fine
> system from http://lingo.stanford.edu/ftp/builds/2006-03-05/ , following
> the guide on the wiki. But when I click Trees/Train, I got an error
> saying:
> podium-loop(): Symbol TRAIN does not have a function definition.
>
>
> I also tried to follow the files `load' and `fc.lisp' in $HINOKI, using the
> LOGON tree. The feature caching seems to work, in that I get big `fc.abt'
> files afterwards. But when I try Trees/Train:
>
> [11:07:26] operate-on-profiles(): reading `jan-06/jh0'
> operate-on-profiles(): caching `jan-06/jh0' [11 - 511|.
> [11:07:26] open-fc(): new BTree `fc.bdb'.
> podium-loop(): Attempt to call #("db_open" 11440496 0 2 11440496) for
>                 which the definition has not yet been (or is no longer)
>                 loaded.
>
> So my questions are: 1) how do I train the parse selection model with
> either the DELPHIN tree or the LOGON tree?
> 2) can I cache only some basic features for a model used by pet? I tried
> to set:
> (setf tsdb::*feature-grandparenting* 0)
> (setf tsdb::*feature-use-preterminal-types-p* nil)
> (setf tsdb::*feature-lexicalization-p* 0)
> (setf tsdb::*feature-active-edges-p* nil)
> (setf tsdb::*feature-ngram-size* 0)
> (setf tsdb::*feature-ngram-tag* :type)
> (setf tsdb::*feature-ngram-back-off-p* nil)
> but the fc.abt is still huge, and it still takes a long time.
>
> Best,
> Yi
>



