[developers] Do I need to rebuild the feature cache in logon if I work with a new virtual corpus?

Mon Mar 23 23:39:17 CET 2009

I have seen empty fold and score files accompanied by the following
error in the log file:

batch-experiment(): error: `learner-rank-items(): mysterious score deficit'.

Best,
Faisal

On Fri, Feb 27, 2009 at 7:08 PM, Bill McNeill (UW) <billmcn at
u.washington.edu> wrote:

>
> Are there any error messages in particular I should look out for in the
> log files?  (I didn't see anything obvious like "Out of memory!")
>
> On Fri, Feb 27, 2009 at 8:57 AM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
> > hi again, bill,
> >
> > > I am running many different grid* files in parallel using the Condor
> > > distributed computing system.  On my latest round of jobs, all of my
> > > Condor jobs completed, but when I look at the directories created
> > > under ~/logon/lingo/redwoods/tsdb/home, I find that most of them have
> > > empty score files.  I take the empty score files to be a sign that
> > > something didn't work.
> >
> > yes, i would say they indicate a failed experiment.  there are quite a
> > few ways in which individual experiments can fail, without the complete
> > job necessarily failing.  Lisp may run out of memory at some point, but
> > `recover' from that and carry on; for example, when reading the profile
> > data to start an experiment, a fresh Lisp process will almost certainly
> > need to grow substantially.  with limited RAM and swap space, that may
> > fail, but an `out of memory' error may be caught by the caller, where i
> > can say for sure that [incr tsdb()] frequently catches errors, but i am
> > less confident (off the top of my head) about how these will be handled
> > in the context of feature caching and ME grid searches.  our `approach'
> > has typically been lazy: avoid errors of this kind, hence it is quite
> > likely that they are not handled in a very meaningful way.  experiments
> > might end up being skipped, or even executed with incomplete data ...
> >
> > in a similar spirit, the parameter searches call tadm and evaluate many
> > times, and either one could crash (insufficient memory or disk space in
> > `/tmp'), and again i cannot really say how that would be handled.  i am
> > afraid, my best recommendation is to (a) inspect the log files created
> > by the `load' script and (b) try to create an environment for such jobs
> > where you are pretty confident you have some remaining headroom.  from
> > my experience, i would think that means a minimum of 16 gbytes in RAM,
> > generous swap space (on top of RAM), and at least several gigabytes of
> > disk space in `/tmp'.
> >
> > as regards your earlier (related) question about resource usage:
> >
> > >  What was your memory high water mark during test (as opposed to
> > >  training)?
> >
> > memory consumption will depend on two parameters: the total number of
> > results (i.e. distinct trees: `zcat result.gz | wc -l'), and how many
> > feature templates are active (e.g. levels of grandparenting, n-grams,
> > active edges, constituent weight).  i have started to run experiments
> > again myself, and i notice that we have become sloppy with memory use
> > (the process holds on to data longer than it should need to; and the
> > specifics of Lisp-internal memory management may be sub-optimal too).
> > i am currently making changes liberally to the LOGON `trunk', where i
> > would suggest you stick to the HandOn release version until everything
> > has stabilized again (hopefully sometime next week, or so).
> >
> >                                                      all best  -  oe
> >
>
>
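
For anyone following this thread, the checks suggested above (look for
empty score files, inspect the logs, count the results) can be roughly
automated. The sketch below is not part of [incr tsdb()] or the LOGON
tree. It assumes that score and fold files are literally named `score'
and `fold' inside each profile directory, that the logs are plain text,
and that the function names are hypothetical helpers made up for this
illustration.

```python
import gzip
import os
import re

# Assumption: score and fold files carry exactly these names inside each
# profile directory; adjust to whatever your setup actually writes.
CANDIDATE_NAMES = ("score", "fold")

# Matches lines such as "batch-experiment(): error: ..." quoted in this
# thread, plus a generic out-of-memory message.
ERROR_PATTERN = re.compile(r"\berror\b|out of memory", re.IGNORECASE)


def find_empty_files(root, names=CANDIDATE_NAMES):
    """Return sorted paths under root whose basename is in names and size is 0."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in names:
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)


def find_error_lines(log_path, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for log lines matching pattern."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def count_results(result_gz_path):
    """Count lines in a gzipped result file, roughly `zcat result.gz | wc -l`."""
    with gzip.open(result_gz_path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)
```

A typical use would be to call find_empty_files() on the expanded path
of ~/logon/lingo/redwoods/tsdb/home after a batch of jobs, and then run
find_error_lines() on the log of each experiment that turned up empty.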