[developers] Need help with the feature caching step of the parse ranking code in the logon tree

Stephan Oepen oe at ifi.uio.no
Tue Jan 6 00:35:16 CET 2009


hi bill, my apologies for the late reply!

> Error: open-fc(): error -1 for
>   `/home/billmcn/logon/lingo/redwoods/tsdb/home/jh3/fc.bdb'.

  [...]

> I tried running it again and saw a similar error, except this time in
> jh4/fc.bdb.

i am afraid this appears to be a local, filesystem-related error.  i get
the same behavior when running on `patas' at UW.  but running the exact
same code on two other systems (one of which mounts the directory for
the BDB files using NFS, the other using GPFS), my feature cache builds
successfully.

as a short-term solution, you might work around this problem by using
smaller data sets, but mid-term you will need to work with your system
administrators to look into NFS mount(8) options and their interactions
with BDB, or maybe see whether you can use a local, non-NFS directory.
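
to see which file system actually backs the directory holding the `.bdb'
files, something along the following lines might help.  this is only a
minimal sketch, assuming a Linux host that exposes its mount table in
/proc/mounts; the function name `filesystem_type' is made up for this
illustration and is not part of the LOGON tree.

```python
import os


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format: one mount per line,
    with device, mount point, and file system type as the first fields.
    Mount points containing escaped characters (e.g. \\040 for a space)
    are not decoded in this sketch.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # the longest matching mount point is the most specific one
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


# typical use on a Linux machine (not executed here):
#   with open("/proc/mounts") as stream:
#       print(filesystem_type("/home/billmcn/logon", stream.read()))
```

a type starting with `nfs' in the result would suggest that the BDB files
live on an NFS mount.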

>    2. Can I just rerun load --binary fc.lisp, or should I do some
>    cleanup first?

yes, just re-running will delete old `.bdb' files and re-create them.

>    3. How do I run database recovery as directed by the error message
>    in STDERR?

i am afraid i have no idea.  this is a BDB-internal message, and since
there is no way of resuming a partial feature cache operation anyway, i
think there would be no value in recovering that specific file.

>    4. Is there some way I can specify a very small treebank for the
>    system to work with?  Something that could finish this step in a
>    few minutes rather than a few hours.  It would be okay if it was
>    too small to get reliable statistics out of, since for starters
>    I'm just verifying if the setup works.

yes, this one is easy.  all the `load' script actually does is pipe a
sequence of Lisp commands into the stdin of the LOGON run-time binary.
you can see the actual commands by running it as

  ./load --cat --binary fc.lisp

the profiles that get used (or created in the later cross-validation
steps) are specified in the `.lisp' files given as input to `load'.
for example, you can replace `jhpstg' in `fc.lisp' (and `grid.lisp')
with `jh1' (or `jh0', if you want a really small data set).
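
if you would rather keep the original files untouched, the replacement
can be done on copies.  the following is a rough sketch only: the helper
`substitute_profile' is hypothetical, and the sample line in the comment
is invented, since the real contents of `fc.lisp' are not shown here.

```python
import re


def substitute_profile(text, old, new):
    """Replace the profile name `old` with `new` wherever it stands alone.

    A match that is directly preceded or followed by a letter, digit,
    underscore, or hyphen is left untouched, so that longer names which
    merely contain `old` are not rewritten.
    """
    pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
    return re.sub(pattern, new, text)


# invented example input, for illustration only:
#   substitute_profile('(some-command "jhpstg")', "jhpstg", "jh1")
# typical use (not executed here):
#   with open("fc.lisp") as stream:
#       small = substitute_profile(stream.read(), "jhpstg", "jh1")
#   with open("fc.small.lisp", "w") as stream:
#       stream.write(small)
```

the resulting copy (here called `fc.small.lisp', an arbitrary name) could
then be passed to `load' in place of `fc.lisp'.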

>    5. Is there some way I can manually run the individual jh1, jh2, jh3,
>    etc. steps?  (It seems like they shouldn't depend on each other.)

no, i am afraid not.  they do depend on each other, as during caching
a single symbol table (mapping from symbolic feature representations to
integers) is created, so creating the feature cache incrementally would
require serializing intermediate, partial symbol tables, plus restoring
the previous state for each incremental re-start.  all of course doable
in principle, but not supported in the code currently ...
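
to illustrate the dependency, here is a toy symbol table.  it is a
sketch of the general idea only, not the actual [incr tsdb()] code: the
class `SymbolTable' and the feature names are made up, and the sketch
assumes that integer codes are handed out in order of first encounter.

```python
class SymbolTable:
    """Map symbolic feature representations to consecutive integers."""

    def __init__(self):
        self._codes = {}

    def intern(self, feature):
        # a known feature keeps its code; a new one gets the next integer
        return self._codes.setdefault(feature, len(self._codes))


# one shared table, filled while caching two profiles in sequence
shared = SymbolTable()
first = [shared.intern(f) for f in ["f1", "f2"]]   # e.g. while caching jh1
second = [shared.intern(f) for f in ["f2", "f3"]]  # e.g. while caching jh2

# caching the second profile on its own starts from an empty table
fresh = SymbolTable()
alone = [fresh.intern(f) for f in ["f2", "f3"]]

print(second, alone)
```

the codes for the second profile differ between the two runs, so caches
built separately would not agree on what each integer means unless the
table were saved and restored between runs.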

the relevant [incr tsdb()] parts are `redwoods.lisp', `features.lisp',
and `learner.lisp' (all in `lingo/lkb/src/tsdb/lisp/').  the scripts in
`lingo/redwoods/' are merely for convenience; if you plan on extending
experiments with additional features, you will need to navigate larger
parts of the above files and augment the code.  once you start thinking
more about that, please email again, and i can try to give pointers to
specific functions that will need adaptation ...

                                                     all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
