Are there any error messages in particular I should look out for in the log files? (I didn't see anything obvious like "Out of memory!")

On Fri, Feb 27, 2009 at 8:57 AM, Stephan Oepen <oe@ifi.uio.no> wrote:

hi again, bill,

> I am running many different grid* files in parallel using the Condor
> distributed computing system. On my latest round of jobs, all of my
> Condor jobs completed, but when I look at the directories created
> under ~/logon/lingo/redwoods/tsdb/home, I find that most of them have
> empty score files. I take the empty score files to be a sign that
> something didn't work.

yes, i would say they indicate a failed experiment. there are quite a
few ways in which individual experiments can fail, without the complete
job necessarily failing. Lisp may run out of memory at some point, but
`recover' from that and carry on; for example, when reading the profile
data to start an experiment, a fresh Lisp process will almost certainly
need to grow substantially. with limited RAM and swap space, that may
fail, but an `out of memory' error may be caught by the caller. i can
say for sure that [incr tsdb()] frequently catches errors, but i am
less confident (off the top of my head) about how those errors will be
handled in the context of feature caching and ME grid searches. our
`approach' has typically been lazy: avoid errors of this kind in the
first place; hence it is quite likely that they are not handled in a
very meaningful way. experiments might end up being skipped, or even
executed with incomplete data ...
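
a quick way to spot failed experiments, for what it is worth, is to
scan the tsdb `home' directory for empty score files; the sketch below
(in Python, with the path and the `score' file name taken from your
description, so adjust both to your setup) is one way to do that:

  # walk the tsdb `home' directory and flag profiles whose `score'
  # file is empty (the directory and file names are assumptions taken
  # from the discussion above; adapt them to your installation).
  import os

  HOME = os.path.expanduser("~/logon/lingo/redwoods/tsdb/home")

  for root, dirs, files in os.walk(HOME):
      if "score" in files:
          path = os.path.join(root, "score")
          if os.path.getsize(path) == 0:
              print("empty score file: %s" % path)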

in a similar spirit, the parameter searches call tadm and evaluate many
times, and either one could crash (insufficient memory or disk space in
`/tmp'), and again i cannot really say how that would be handled. i am
afraid my best recommendation is to (a) inspect the log files created
by the `load' script and (b) try to create an environment for such jobs
where you are pretty confident you have some remaining headroom. from
my experience, i would think that means a minimum of 16 gbytes of RAM,
generous swap space (on top of RAM), and at least several gigabytes of
disk space in `/tmp'.
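
for (b), a crude pre-flight check along the lines of the sketch below
could be run on a node before submitting jobs; the thresholds simply
mirror the numbers above, and the /proc/meminfo parsing assumes a
Linux host:

  # rough headroom check: total RAM, swap, and free space in `/tmp'.
  import os

  info = {}
  with open("/proc/meminfo") as stream:
      for line in stream:
          key, value = line.split(":", 1)
          info[key] = int(value.split()[0])          # kbytes

  ram = info["MemTotal"] / (1024.0 ** 2)             # gbytes
  swap = info["SwapTotal"] / (1024.0 ** 2)
  stat = os.statvfs("/tmp")
  tmp = stat.f_bavail * stat.f_frsize / (1024.0 ** 3)

  print("RAM %.1f g; swap %.1f g; /tmp %.1f g free" % (ram, swap, tmp))
  if ram < 16 or swap < ram or tmp < 4:
      print("warning: this node may be short on headroom")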

as regards your earlier (related) question about resource usage:

> What was your memory high water mark during test (as opposed to
> training)?

memory consumption will depend on two parameters: the total number of
results (i.e. distinct trees: `zcat result.gz | wc -l'), and how many
feature templates are active (e.g. levels of grandparenting, n-grams,
active edges, constituent weight). i have started to run experiments
again myself, and i notice that we have become sloppy with memory use
(the process holds on to data longer than it should need to, and the
specifics of Lisp-internal memory management may be sub-optimal too).
i am currently making liberal changes to the LOGON `trunk', so i would
suggest you stick to the HandOn release version until everything has
stabilized again (hopefully sometime next week, or so).
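
to get a handle on the first of those two parameters, the result counts
for a whole directory of profiles can be tallied with a small script
like the one below (just the `zcat result.gz | wc -l' one-liner from
above, applied recursively; the top-level path is again taken from
your message):

  # count results (distinct trees) per profile, i.e. the number of
  # lines in each `result.gz'.
  import gzip
  import os

  HOME = os.path.expanduser("~/logon/lingo/redwoods/tsdb/home")

  for root, dirs, files in os.walk(HOME):
      if "result.gz" in files:
          path = os.path.join(root, "result.gz")
          with gzip.open(path, "rb") as stream:
              n = sum(1 for line in stream)
          print("%s: %d results" % (root, n))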

all best - oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe@ifi.uio.no; oe@csli.stanford.edu; stephan@oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--
Bill McNeill
http://staff.washington.edu/billmcn/index.shtml
Sent from: Seattle Washington United States.