[developers] [incr() tsdb]/LKB memory allocation error

Ann Copestake aac10 at cl.cam.ac.uk
Thu Mar 17 09:33:46 CET 2016

So the process is running out of memory before it hits the limit on the 
number of chart edges, which would stop processing a little more 
gracefully. The LKB batch parse process catches some errors in a way that 
allows the rest of the batch to continue. It may simply be that the chart 
edge limit was set too high relative to the available memory, although it 
is possible that memory is being used in a way that isn't reflected by the 
edge limit, which is why I suggested also looking at the token chart. You 
could increase the amount of memory available to the process and see 
whether you can get your test set through, but unless that's the final 
test set and you don't intend to work on any more complex examples than 
the ones you have, that's only going to be a temporary measure.

I don't think it will matter whether you look at an example that can 
eventually be parsed, one that fails after a huge number of edges, or one 
that causes the memory crash: in each case your task is to find out 
whether there is something you can do to cut down the number of rule 
applications. The good news is that you won't need to find many cases of 
over-application to make a dramatic improvement. I think you will see the 
issues with the grammar when you look at a chart, even with a small edge 
limit.
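
To illustrate why a few over-applying rules matter so much, here is a rough
Python sketch. It uses a toy model that is an assumption for illustration
only, not the LKB's actual chart algorithm: every position class is
optional, and each of its lexical rules applies to every edge built so far,
so the edge count for one stem is the product of (1 + rules per class). The
function name and the rule counts are hypothetical.

```python
from math import prod


def edge_count(rules_per_class):
    """Edges built for one stem under the toy model.

    Assumption: each position class may be skipped, and each of its
    rules applies to every edge that exists when the class is reached.
    """
    return prod(1 + r for r in rules_per_class)


# Hypothetical grammar: four position classes, 12 applicable rules each.
over_applying = [12, 12, 12, 12]
# Same grammar after constraining two of the classes to 3 rules each.
tightened = [12, 3, 3, 12]

print(edge_count(over_applying))  # 13 * 13 * 13 * 13 = 28561
print(edge_count(tightened))      # 13 * 4 * 4 * 13 = 2704
```

In this toy model, tightening just two of the four classes cuts the edge
count by roughly a factor of ten.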


On 17/03/2016 01:55, Olga Zamaraeva wrote:
> Thank you Ann!
> I suppose I should try to pin down an input that can be successfully 
> parsed, but does produce a huge chart. Of course my most pressing 
> problem is not that some inputs are parsed with huge charts but that 
> some inputs can never be parsed and break the system. But perhaps this 
> is caused by the same problem (or feature) in the grammar.
> The LKB does give an error message, the same memory allocation error 
> that comes through itsdb when that breaks (attached in the original 
> email).
> Olga
> On Tue, Mar 15, 2016 at 2:19 PM Ann Copestake <aac10 at cl.cam.ac.uk> wrote:
>     I would say that you should attempt to debug in the LKB.  I don't know
>     exactly why [incr() tsdb] crashes while the LKB batch fails more
>     gracefully (does the LKB give an error message?) but you should try and
>     understand what's going on to give you such a huge chart. That's not to
>     say that it wouldn't be a good idea to know what the [incr() tsdb] issue
>     is, but it probably won't help you much ...
>
>     If you're using the LKB's morphophonology, you might want to look at the
>     token chart as well as the parse chart.  This is more recent than the
>     book, so isn't documented, but if you have an expanded menu, I think it
>     shows up under Debug.  You want the `print token chart' item, which will
>     output to the emacs window.  Similarly, if you're trying to debug what's
>     going on and have an enormous parse chart, don't try and look at the
>     chart in a window, but use the `print chart' option.  You would want to
>     reduce the maximum number of items to something a lot smaller than 20k
>     before you try that, though.
>
>     We should have a FAQ that says `ignore all the GC messages'.  It's
>     really just a symptom of the underlying system running out of space -
>     nothing to do with the LKB or [incr() tsdb] as such.  So there's not a
>     lot of enlightenment to be gained by understanding terms like
>     tenuring ...
>
>     Best,
>     Ann
>     On 15/03/2016 19:55, Olga Zamaraeva wrote:
>     > Dear developers!
>     >
>     > I am trying to use the LKB and [incr() tsdb] to parse a list of verbs
>     > by a grammar of Chintang [ctn]. The language is polysynthetic, plus
>     > the grammar was created automatically using k-means clustering for the
>     > morphology section, so some of the position classes have lots and lots
>     > of inputs and lots and lots of lexical rule types and instances.
>     >
>     > I am running into a problem where [incr() tsdb] crashes because of a
>     > memory allocation error. If I don't use itsdb and just go with LKB
>     > batch parsing, it is more robust as it can catch the error and
>     > continue parsing, having reported a failure on the problematic item,
>     > but the problem is still there and the parses still fail.
>     >
>     > I am a fairly inexperienced user of both systems, so right now I am
>     > trying to understand what is the best way for me to:
>     >
>     > 1) debug the grammar with respect to the problem, i.e. what is it
>     >    about the grammar exactly that causes the issues;
>     > 2) do something with itsdb so that perhaps this does not happen?
>     >    Limit it somehow so that it doesn't try as much?
>     >
>     > Currently I am mostly just trying to filter out the problematic
>     > items... I also tried limiting the chart size to 30K, and that seems
>     > to have helped a little, but the crashes still happen on some items.
>     > If I limit the chart size to 20K, then it seems like maybe I can go
>     > through the test suite, but then my coverage suffers when I think it
>     > shouldn't: I think there are items which I can parse with 30K limit
>     > but not 20K... Is this the route I should be going in any case? Just
>     > optimizing for the chart size?.. Maybe 25K is my number :). The chart
>     > is the parse chart, is that correct? I need to understand what exactly
>     > makes the chart so huge in my case; how should I approach debugging
>     > that?..
>     >
>     > One specific question: what does "tenuring" mean with respect to
>     > garbage collection? Google doesn't know (nor does the manual, I think).
>     >
>     > Does anyone have any comment on any of these issues? The (very
>     > helpful) chapter on errors and debugging in the Copestake (2002) book
>     > mostly talks about other types of issues such as type loading problems
>     > etc. I also looked at what I found in ItsdbTop
>     > (http://moin.delph-in.net/ItsdbTop), and it does mention that on
>     > 32-bit systems memory problems are possible, but I think that note has
>     > to do with treebanking, and it doesn't really tell me much about what
>     > I should try in my case... I also looked through the itsdb manual
>     > (http://www.delph-in.net/itsdb/publications/manual.pdf) -- but it
>     > looks like some of the sections, specifically about debugging and
>     > options and parameters, are empty?
>     >
>     > Anyway, I would greatly appreciate any advice! I attach a picture of a
>     > running testsuite processing, to give an idea about the memory usage
>     > and the chart size, and of the error. It is possible that the grammar
>     > that I have is just not a usage scenario as far as itsdb is concerned,
>     > but I don't yet have a clear understanding of whether that's the case.
>     >
>     > Thanks!
>     > Olga
