[developers] What errors can cause a grid parse reranking process to return an empty scores file?
Bill McNeill (UW)
billmcn at u.washington.edu
Fri Feb 27 19:08:39 CET 2009
Are there any error messages in particular I should look out for in the log
files? (I didn't see anything obvious like "Out of memory!")
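A minimal sketch for a first pass over the log files. It assumes the logs end in `.log` and uses a handful of generic failure signatures ("out of memory", "No space left on device", "Segmentation fault", "error"). These patterns are guesses and NOT a documented list of [incr tsdb()], tadm, or Lisp messages, so treat the output as a starting point for reading the logs by hand.

```python
import re
from pathlib import Path

# Generic failure signatures (an assumption, NOT a documented list of
# [incr tsdb()], tadm, or Lisp messages); extend once real logs are inspected.
PATTERNS = [
    re.compile(r"out of memory", re.IGNORECASE),
    re.compile(r"no space left on device", re.IGNORECASE),  # e.g. a full /tmp
    re.compile(r"segmentation fault|core dumped", re.IGNORECASE),
    re.compile(r"\berror\b", re.IGNORECASE),
]


def scan_log(text):
    """Return (line number, stripped line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_logs(directory, glob="*.log"):
    """Scan all files below `directory` whose names match `glob` (assumed naming)."""
    report = {}
    for path in sorted(Path(directory).rglob(glob)):
        if not path.is_file():
            continue
        hits = scan_log(path.read_text(errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_logs("/path/to/condor/logs")` (a hypothetical path) returns a dict from file name to matching lines; an empty dict only means none of these particular patterns occurred, not that every experiment succeeded.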
On Fri, Feb 27, 2009 at 8:57 AM, Stephan Oepen <oe at ifi.uio.no> wrote:
> hi again, bill,
>
> > I am running many different grid* files in parallel using the Condor
> > distributed computing system. On my latest round of jobs, all of my
> > Condor jobs completed, but when I look at the directories created
> > under ~/logon/lingo/redwoods/tsdb/home, I find that most of them have
> > empty score files. I take the empty score files to be a sign that
> > something didn't work.
>
> yes, i would say they indicate a failed experiment. there are quite a
> few ways in which individual experiments can fail, without the complete
> job necessarily failing. Lisp may run out of memory at some point, but
> `recover' from that and carry on; for example, when reading the profile
> data to start an experiment, a fresh Lisp process will almost certainly
> need to grow substantially. with limited RAM and swap space, that may
> fail, but an `out of memory' error may be caught by the caller. i can
> say for sure that [incr tsdb()] frequently catches errors, but i am less
> confident (off the top of my head) about how these will be handled in
> the context of feature caching and ME grid searches. our `approach'
> has typically been lazy: avoid errors of this kind, hence it is quite
> likely that they are not handled in a very meaningful way. experiments
> might end up being skipped, or even executed with incomplete data ...
>
> in a similar spirit, the parameter searches call tadm and evaluate many
> times, and either one could crash (insufficient memory or disk space in
> `/tmp'), and again i cannot really say how that would be handled. i am
> afraid my best recommendation is to (a) inspect the log files created
> by the `load' script and (b) try to create an environment for such jobs
> where you are pretty confident you have some remaining headroom. from
> my experience, i would think that means a minimum of 16 gbytes in RAM,
> generous swap space (on top of RAM), and at least several gigabytes of
> disk space in `/tmp'.
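The headroom advice above could be turned into a pre-flight check that a job runs before starting. This is a sketch for Linux only (it reads `/proc/meminfo`). The 16 GiB RAM threshold comes from the advice; the 16 GiB swap and 4 GiB free-`/tmp` figures are assumed stand-ins for "generous" and "at least several gigabytes", and the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3

# 16 GiB of RAM comes from the advice above; the swap and /tmp figures are
# assumed stand-ins for "generous" and "at least several gigabytes".
MIN_RAM = 16 * GIB
MIN_SWAP = 16 * GIB  # assumption
MIN_TMP = 4 * GIB    # assumption


def read_meminfo(path="/proc/meminfo"):
    """Parse a Linux /proc/meminfo file into {field: bytes}."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            fields = rest.split()
            if fields and fields[0].isdigit():
                # Most fields are reported in kB; a few (e.g. HugePages_*) are bare counts.
                scale = 1024 if fields[1:2] == ["kB"] else 1
                info[key.strip()] = int(fields[0]) * scale
    return info


def check_headroom(ram, swap, tmp_free):
    """Return a list of shortfalls; an empty list means the machine looks adequate."""
    problems = []
    if ram < MIN_RAM:
        problems.append(f"RAM {ram / GIB:.1f} GiB < {MIN_RAM / GIB:.0f} GiB")
    if swap < MIN_SWAP:
        problems.append(f"swap {swap / GIB:.1f} GiB < {MIN_SWAP / GIB:.0f} GiB")
    if tmp_free < MIN_TMP:
        problems.append(f"/tmp free {tmp_free / GIB:.1f} GiB < {MIN_TMP / GIB:.0f} GiB")
    return problems


def probe(tmp="/tmp"):
    """Measure the current (Linux) machine and check it against the thresholds."""
    mem = read_meminfo()
    return check_headroom(mem.get("MemTotal", 0), mem.get("SwapTotal", 0),
                          shutil.disk_usage(tmp).free)
```

A job wrapper could call `probe()` and exit early when the list is non-empty, so an under-provisioned node fails loudly instead of silently producing an empty scores file.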
>
> as regards your earlier (related) question about resource usage:
>
> > What was your memory high water mark during test (as opposed to
> > training)?
>
> memory consumption will depend on two parameters: the total number of
> results (i.e. distinct trees: `zcat result.gz | wc -l'), and how many
> feature templates are active (e.g. levels of grandparenting, n-grams,
> active edges, constituent weight). i have started to run experiments
> again myself, and i notice that we have become sloppy with memory use
> (the process holds on to data longer than it should need to; and the
> specifics of Lisp-internal memory management may be sub-optimal too).
> i am currently making changes liberally to the LOGON `trunk', so i
> would suggest you stick to the HandOn release version until everything
> has stabilized again (hopefully sometime next week, or so).
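To compare memory needs across profiles, the `zcat result.gz | wc -l` count mentioned above can be collected for every profile at once. A minimal sketch, assuming each profile stores its results in a compressed `result.gz` (uncompressed `result` files are ignored); it counts newline bytes exactly as `wc -l` does, so a final line without a trailing newline is not counted.

```python
import gzip
from pathlib import Path


def count_results(path, chunk_size=1 << 20):
    """Count newline characters in a gzipped file, as `zcat FILE | wc -l` does."""
    count = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            count += chunk.count(b"\n")
    return count


def results_per_profile(home):
    """Map each directory below `home` containing a result.gz to its line count.

    Assumes compressed `result.gz` files; uncompressed `result` files are ignored.
    """
    root = Path(home).expanduser()
    return {str(p.parent): count_results(p) for p in sorted(root.rglob("result.gz"))}
```

Reading in fixed-size chunks keeps the checker's own memory use flat even for very large profiles.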
>
> all best - oe
>
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
> +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
> +++ --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
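To see how many grid runs were affected, the empty score files can be listed in one go. A minimal sketch: the default directory is the one named in the original question, while the `score*` file-name pattern is an assumption about how the score files are named. Only zero-byte files are reported; a file holding only whitespace would slip through.

```python
from pathlib import Path

# Default location taken from the question above; the "score*" file-name
# pattern is an assumption about how the score files are named.
DEFAULT_HOME = "~/logon/lingo/redwoods/tsdb/home"


def empty_score_files(home=DEFAULT_HOME, pattern="score*"):
    """Return zero-byte files matching `pattern` anywhere below `home`."""
    root = Path(home).expanduser()
    return sorted(str(p) for p in root.rglob(pattern)
                  if p.is_file() and p.stat().st_size == 0)
```

The parent directory of each returned path identifies the experiment whose log files are worth reading first.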
--
Bill McNeill
http://staff.washington.edu/billmcn/index.shtml
Sent from: Seattle Washington United States.