I have experienced empty fold and score files with the following error occurring inside the log file.

batch-experiment(): error: `learner-rank-items(): mysterious score deficit'.

Best,
Faisal

<div class="gmail_quote">On Mon, Feb 27, 2009 at 19:08 PM, Bill McNeill (UW) <<a href="mailto:developers%40delph-in.net?Subject=%5Bdevelopers%5D%20What%20errors%20can%20cause%20a%20grid%20parse%20reranking%0A%09process%20to%20return%20an%20empty%20scores%20file%3F&In-Reply-To=200902271657.n1RGvDrs025375%40mv.emmtee.net" title="[developers] What errors can cause a grid parse reranking process to return an empty scores file?">billmcn at u.washington.edu</a>> wrote:
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><pre>
Any error messages in particular I should look out for in the log files.
(I didn't see anything obvious like "Out of memory!")

On Fri, Feb 27, 2009 at 8:57 AM, Stephan Oepen <<a href="http://lists.delph-in.net/mailman/listinfo/developers" target="_blank">oe at ifi.uio.no</a>> wrote:
> hi again, bill,
>
> > I am running many different grid* files in parallel using the Condor
> > distributed computing system. On my latest round of jobs, all of my
> > Condor jobs completed, but when I look at the directories created
> > under ~/logon/lingo/redwoods/tsdb/home, I find that most of them have
> > empty score files. I take the empty score files to be a sign that
> > something didn't work.
>
> yes, i would say they indicate a failed experiment. there are quite a
> few ways in which individual experiments can fail, without the complete
> job necessarily failing. Lisp may run out of memory at some point, but
> `recover' from that and carry on; for example, when reading the profile
> data to start an experiment, a fresh Lisp process will almost certainly
> need to grow substantially. with limited RAM and swap space, that may
> fail, but an `out of memory' error may be caught by the caller, where i
> can say for sure that [incr tsdb()] frequently catches errors, but i am
> less confident (off the top of my head) about how these will be handled
> in the context of feature caching and ME grid searches. our `approach'
> has typically been lazy: avoid errors of this kind, hence it is quite
> likely that they are not handled in a very meaningful way. experiments
> might end up being skipped, or even executed with incomplete data ...
>
> in a similar spirit, the parameter searches call tadm and evaluate many
> times, and either one could crash (insufficient memory or disk space in
> `/tmp'), and again i cannot really say how that would be handled. i am
> afraid, my best recommendation is to (a) inspect the log files created
> by the `load' script and (b) try to create an environment for such jobs
> where you are pretty confident you have some remaining headroom. from
> my experience, i would think that means a minimum of 16 gbytes in RAM,
> generous swap space (on top of RAM), and at least several gigabytes of
> disk space in `/tmp'.
>
> as regards your earlier (related) question about resource usage:
>
> > What was your memory high water mark during test (as opposed to
> > training)?
>
> memory consumption will depend on two parameters: the total number of
> results (i.e. distinct trees: `zcat result.gz | wc -l'), and how many
> feature templates are active (e.g. levels of grandparenting, n-grams,
> active edges, constituent weight). i have started to run experiments
> again myself, and i notice that we have become sloppy with memory use
> (the process holds on to data longer than it should need to; and the
> specifics of Lisp-internal memory management may be sub-optimal too).
> i am currently making changes liberally to the LOGON `trunk', where i
> would suggest you stick to the HandOn release version until everything
> has stabilized again (hopefully sometime next week, or so).
>
> all best - oe
</pre> </blockquote></div>
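In case it helps anyone triaging a large batch of such runs, here is a rough Python sketch of the checks discussed in this thread: find zero-byte fold and score files, scan a log for suspicious lines, and report free space under `/tmp`. It is not part of [incr tsdb()] or LOGON. The function names, the assumption that the result files are literally called `fold` and `score` (uncompressed), and the list of error patterns are all my own guesses based on the messages above, so adjust them to your setup.

```python
import os
import re
import shutil

# Assumption: result files are named "fold" and "score", as mentioned in the thread.
RESULT_FILES = ("fold", "score")

# Assumption: patterns drawn from the error messages quoted in this thread.
ERROR_PATTERN = re.compile(
    r"error:|out of memory|mysterious score deficit", re.IGNORECASE
)


def find_empty_results(tsdb_home):
    """Return sorted paths of zero-byte fold/score files below tsdb_home."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(tsdb_home):
        for name in filenames:
            if name in RESULT_FILES:
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    empty.append(path)
    return sorted(empty)


def scan_log(log_path):
    """Return (line_number, text) pairs for log lines matching ERROR_PATTERN."""
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if ERROR_PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def free_gigabytes(path="/tmp"):
    """Return free disk space at path in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9
```

For example, `find_empty_results(os.path.expanduser("~/logon/lingo/redwoods/tsdb/home"))` would list the experiments worth a closer look, and `scan_log` can then be pointed at whichever log file the `load` script wrote for that job. Checking available RAM and swap is platform-specific, so it is left out here.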