[developers] [incr tsdb()]/LKB memory allocation error

Ann Copestake aac10 at cl.cam.ac.uk
Sat Apr 9 16:31:43 CEST 2016

Dear Olga,

Sorry for not replying - did you manage to find a workaround?


On 24/03/2016 21:55, Olga Zamaraeva wrote:
> Looking at the token chart, I see that I in fact have many lexical 
> rules for the same orthography, and that results in too many parses (a 
> snippet of the output is attached). For example, for this "a-tis-e" 
> input that I am trying, there are dozens of lexical rules for a- and 
> dozens for -a, so the combinations are too many. The reason for 
> this is that the rules were inferred automatically by a clustering 
> algorithm, and I asked for many clusters (there is also a reason for 
> asking for many clusters: I am trying to compare these results with 
> another algorithm which happened to infer many position classes, so I 
> want my clustering to come up with the same number and then compare 
> the two grammars).
> Is there a way to have the LKB stop after it finds a parse, and not 
> try other possibilities? I tried doing that in itsdb (by turning off 
> exhaustive search and limiting the maximum number of analyses), but it 
> still cannot handle this large grammar for some reason...
> Thank you!
> Olga
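The blow-up described above is multiplicative: if each affix slot can be licensed by many competing lexical rules, the number of candidate analyses is the product of the alternatives per slot, and the parse chart grows accordingly. A minimal Python sketch of this arithmetic (the rule counts and rule names are hypothetical, not taken from the actual Chintang grammar):

```python
# Toy illustration: competing lexical rules per affix slot multiply.
# The counts below are invented for illustration, not measured.
from itertools import product

# e.g. for an input like "a-tis-e": rules that could license each slot
rules_per_slot = {
    "a-":  [f"prefix-rule-{i}" for i in range(30)],   # "dozens" of prefix rules
    "tis": ["stem"],
    "-e":  [f"suffix-rule-{i}" for i in range(30)],   # "dozens" of suffix rules
}

# Every combination of one rule per slot is a candidate analysis.
analyses = list(product(*rules_per_slot.values()))
print(len(analyses))  # 30 * 1 * 30 = 900 candidate combinations
```

With three or four such slots the product reaches tens of thousands, which is roughly the scale of the chart-edge counts discussed below.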
> On Thu, Mar 17, 2016 at 1:34 AM Ann Copestake <aac10 at cl.cam.ac.uk 
> <mailto:aac10 at cl.cam.ac.uk>> wrote:
>     So the process is running out of memory before hitting the limit
>     on the number of chart edges, which stops processing a little more
>     gracefully.  The LKB batch parse process catches some errors in a
>     way that allows the rest of the batch to continue.   It may be
>     that all that's happening is that the chart edge limit was set too
>     high relative to the available memory, although it is possible
>     that memory is being used in a way that isn't reflected by the
>     edge limit, which is why I suggested also looking at the token
>     chart.   You could increase the amount of memory available to the
>     process and see whether you can get your test set through, but
>     unless that's the final test set and you don't intend to work on
>     any more complex examples than the ones you have, that's only
>     going to be a temporary measure.
>     I don't think it will matter whether you look at examples that can
>     eventually be parsed, something that fails after a huge number of
>     edges or something that causes the memory crash - your task is to
>     find out whether there is something you can do to cut down the
>     number of rule applications.  The good news is that you won't need
>     to find many cases of over-application to make a dramatic
>     improvement. I think you will see the issues with the grammar when
>     you look at a chart, even with a small edge limit.
>     Ann
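Ann's distinction between the two failure modes can be pictured as a guard in the parser's main loop: once the chart holds more edges than the configured limit, parsing of that one item stops with a recoverable error, whereas without the guard the process keeps allocating until the Lisp image runs out of memory. A toy Python sketch of the idea (this is not LKB code; all names are invented for illustration):

```python
# Toy sketch: an edge limit lets a batch fail one item gracefully
# instead of exhausting memory for the whole process.

class EdgeLimitExceeded(Exception):
    """Recoverable per-item failure, analogous to hitting the chart edge limit."""

def parse_with_limit(candidate_edges, max_edges):
    """Add edges to a chart, aborting gracefully at the limit."""
    chart = []
    for edge in candidate_edges:
        if len(chart) >= max_edges:
            # The batch can record this failure and move on to the next item.
            raise EdgeLimitExceeded(f"chart edge limit {max_edges} reached")
        chart.append(edge)
    return chart

try:
    parse_with_limit(range(30000), max_edges=20000)
except EdgeLimitExceeded as e:
    result = str(e)

print(result)  # chart edge limit 20000 reached
```

The point of Ann's advice is that if the limit is set higher than memory allows, the out-of-memory crash wins the race before this guard ever fires.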
>     On 17/03/2016 01:55, Olga Zamaraeva wrote:
>>     Thank you Ann!
>>     I suppose I should try to pin down an input that can be
>>     successfully parsed, but does produce a huge chart. Of course my
>>     most pressing problem is not that some inputs are parsed with
>>     huge charts but that some inputs can never be parsed and break
>>     the system. But perhaps this is caused by the same problem (or
>>     feature) in the grammar.
>>     The LKB does give an error message, the same memory allocation
>>     error that comes through itsdb when that breaks (attached in the
>>     original email).
>>     Olga
>>     On Tue, Mar 15, 2016 at 2:19 PM Ann Copestake <aac10 at cl.cam.ac.uk
>>     <mailto:aac10 at cl.cam.ac.uk>> wrote:
>>         I would say that you should attempt to debug in the LKB.  I
>>         don't know
>>         exactly why [incr tsdb()] crashes while the LKB batch fails more
>>         gracefully (does the LKB give an error message?) but you
>>         should try and
>>         understand what's going on to give you such a huge chart.
>>         That's not to
>>         say that it wouldn't be a good idea to know what the
>>         [incr tsdb()] issue
>>         is, but it probably won't help you much ...
>>         If you're using the LKB's morphophonology, you might want to
>>         look at the
>>         token chart as well as the parse chart.  This is more recent
>>         than the
>>         book, so isn't documented, but if you have an expanded menu,
>>         I think it
>>         shows up under Debug.  You want the `print token chart' item,
>>         which will
>>         output to the emacs window.  Similarly, if you're trying to
>>         debug what's
>>         going on and have an enormous parse chart, don't try and look
>>         at the
>>         chart in a window, but use the `print chart' option.  You
>>         would want to
>>         reduce the maximum number of items to something a lot smaller
>>         than 20k
>>         before you try that, though.
>>         We should have a FAQ that says `ignore all the GC messages'. 
>>         It's
>>         really just a symptom of the underlying system running out of
>>         space -
>>         nothing to do with the LKB or [incr tsdb()] as such.  So
>>         there's not a
>>         lot of enlightenment to be gained by understanding terms like
>>         tenuring ...
>>         Best,
>>         Ann
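On the "tenuring" question below: it is standard generational-GC vocabulary, not anything LKB-specific. Objects that survive a collection of a young generation are promoted ("tenured") into an older generation that is collected less often. Python's collector is also generational, so the idea can be sketched outside Lisp (`gc.get_objects(generation=...)` needs Python 3.8+):

```python
import gc

# CPython keeps three generations; objects that survive a collection of
# generation n are promoted ("tenured") into generation n+1.
thresholds = gc.get_threshold()
print(len(thresholds))  # 3 generations

x = []           # a fresh, gc-tracked object starts in the youngest generation
gc.collect(0)    # collect generation 0; survivors are tenured into gen 1
gc.collect(1)    # collect gens 0-1; survivors are tenured into gen 2
survived = any(obj is x for obj in gc.get_objects(generation=2))
print(survived)  # x has been tenured into the oldest generation
```

As Ann says, the tenuring messages are only a symptom of the underlying Lisp running out of space, not something one can act on directly.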
>>         On 15/03/2016 19:55, Olga Zamaraeva wrote:
>>         > Dear developers!
>>         >
>>         > I am trying to use the LKB and [incr tsdb()] to parse a
>>         list of verbs
>>         > by a grammar of Chintang [ctn]. The language is
>>         polysynthetic, plus
>>         > the grammar was created automatically using k-means
>>         clustering for the
>>         > morphology section, so some of the position classes have
>>         lots and lots
>>         > of inputs and lots and lots of lexical rule types and
>>         instances.
>>         >
>>         > I am running into a problem when [incr tsdb()] crashes
>>         because of a
>>         > memory allocation error. If I don't use itsdb and just go
>>         with LKB
>>         > batch parsing, it is more robust as it can catch the error and
>>         > continue parsing, having reported a failure on the
>>         problematic item,
>>         > but the problem is still there and the parses still fail.
>>         >
>>         > I am a fairly inexperienced user of both systems, so right
>>         now I am
>>         > trying to understand what is the best way for me to:
>>         >
>>         >  1) debug the grammar with respect to the problem, i.e.
>>         what is it
>>         > about the grammar exactly that causes the issues;
>>         > 2) do something with itsdb so that perhaps this does not
>>         happen? Limit
>>         > it somehow so that it doesn't try as much?
>>         >
>>         > Currently I am mostly just trying to filter out the problematic
>>         > items... I also tried limiting the chart size to 30K, and
>>         that seems
>>         > to have helped a little, but the crashes still happen on
>>         some items.
>>         > If I limit the chart size to 20K, then it seems like maybe
>>         I can go
>>         > through the test suite, but then my coverage suffers when I
>>         think it
>>         > shouldn't: I think there are items which I can parse with
>>         30K limit
>>         > but not 20K... Is this the route I should be going in any
>>         case? Just
>>         > optimizing for the chart size?.. Maybe 25K is my number :).
>>         The chart
>>         > is the parse chart, is that correct? I need to understand
>>         what exactly
>>         > makes the chart so huge in my case; how should I approach
>>         debugging
>>         > that?..
>>         >
>>         > One specific question: what does "tenuring" mean with
>>         respect to
>>         > garbage collection? Google doesn't know (nor does the
>>         manual, I think).
>>         >
>>         > Does anyone have any comment on any of these issues? The (very
>>         > helpful) chapter on errors and debugging in the Copestake
>>         (2002) book
>>         > mostly talks about other types of issues such as type
>>         loading problems
>>         > etc.. I also looked at what I found in ItsdbTop
>>         > (http://moin.delph-in.net/ItsdbTop), and it does mention
>>         that on
>>         > 32-bit systems memory problems are possible, but I think
>>         that note has
>>         > to do with treebanking, and it doesn't really tell me much
>>         about what
>>         > I should try in my case... I also looked through the itsdb
>>         manual
>>         > (http://www.delph-in.net/itsdb/publications/manual.pdf) --
>>         but it
>>         > looks like some of the sections, specifically about
>>         debugging and
>>         > options and parameters, are empty?
>>         >
>>         > Anyway, I would greatly appreciate any advice! I attach a
>>         picture of a
>>         > running testsuite processing, to give an idea about the
>>         memory usage
>>         > and the chart size, and of the error. It is possible that
>>         the grammar
>>         > that I have is just not a usage scenario as far as itsdb is
>>         concerned,
>>         > but I don't yet have a clear understanding of whether
>>         that's the case.
>>         >
>>         > Thanks!
>>         > Olga
