[developers] [incr() tsdb]/LKB memory allocation error

Wed Mar 16 01:51:31 CET 2016

I'm not sure if you're using the UbuntuLKB VirtualBox appliance or not; if
you are, you may want to try increasing the amount of memory available to
the VM.  I think the default is only 1 GB for my recent builds, but most
modern computers have enough RAM to dedicate a lot more than that to the VM
and still get reasonable performance for the rest of the OS.
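
If you would rather change the setting from the host's command line than
through the VirtualBox GUI, `VBoxManage modifyvm` with `--memory` sets the
VM's base memory in MB. Below is a minimal Python sketch that only builds
and prints that command. The VM name "UbuntuLKB" and the 4096 MB figure are
placeholders I am assuming for illustration; check `VBoxManage list vms` for
the name your appliance is actually registered under, and pick a size that
suits your host.

```python
import shlex


def modifyvm_memory_command(vm_name, memory_mb):
    """Build the VBoxManage argument list that sets a VM's base memory (MB)."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    return ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)]


# Placeholder values (assumptions, not taken from the appliance itself).
cmd = modifyvm_memory_command("UbuntuLKB", 4096)

# Print the command for review instead of running it.
print(shlex.join(cmd))
```

The VM has to be powered off for `modifyvm` to apply. Once the printed
command looks right, it can be run as-is in a shell or passed to
`subprocess.run(cmd, check=True)`.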

On Tue, Mar 15, 2016 at 12:55 PM, Olga Zamaraeva <olzama at uw.edu> wrote:

> Dear developers!
>
> I am trying to use the LKB and [incr() tsdb] to parse a list of verbs by a
> grammar of Chintang [ctn]. The language is polysynthetic, plus the grammar
> was created automatically using k-means clustering for the morphology
> section, so some of the position classes have lots and lots of inputs and
> lots and lots of lexical rule types and instances.
>
> I am running into a problem where [incr() tsdb] crashes because of a
> memory allocation error. If I don't use itsdb and just go with LKB batch
> parsing, it is more robust as it can catch the error and continue parsing,
> having reported a failure on the problematic item, but the problem is still
> there and the parses still fail.
>
> I am a fairly inexperienced user of both systems, so right now I am trying
> to understand what the best way is for me to:
>
> 1) debug the grammar with respect to the problem, i.e. what it is about
> the grammar exactly that causes the issues;
> 2) do something with itsdb so that perhaps this does not happen? Limit it
> somehow so that it doesn't try as much?
>
> Currently I am mostly just trying to filter out the problematic items... I
> also tried limiting the chart size to 30K, and that seems to have helped a
> little, but the crashes still happen on some items. If I limit the chart
> size to 20K, then it seems like maybe I can go through the test suite, but
> then my coverage suffers when I think it shouldn't: I think there are items
> which I can parse with 30K limit but not 20K... Is this the route I should
> be going in any case? Just optimizing for the chart size?.. Maybe 25K is my
> number :). The chart is the parse chart, is that correct? I need to
> understand what exactly makes the chart so huge in my case; how should I
> approach debugging that?..
>
> One specific question: what does "tenuring" mean with respect to garbage
> collection? Google doesn't know (nor does the manual, I think).
>
> Does anyone have any comment on any of these issues? The (very helpful)
> chapter on errors and debugging in the Copestake (2002) book mostly talks about
> other types of issues such as type loading problems etc. I also looked at
> what I found in ItsdbTop (http://moin.delph-in.net/ItsdbTop), and it does
> mention that on 32-bit systems memory problems are possible, but I think
> that note has to do with treebanking, and it doesn't really tell me much
> about what I should try in my case... I also looked through the itsdb
> manual (http://www.delph-in.net/itsdb/publications/manual.pdf) -- but it
> looks like some of the sections, specifically about debugging and options
> and parameters, are empty?
>
> Anyway, I would greatly appreciate any advice! I attach a picture of a
> running testsuite processing, to give an idea about the memory usage and
> the chart size, and of the error. It is possible that the grammar that I
> have is just not a usage scenario as far as itsdb is concerned, but I don't
> yet have a clear understanding of whether that's the case.
>
> Thanks!
> Olga
>

-- 
D. Brodbeck
System Administrator, Linguistics
University of Washington
GPG key fingerprint: 0DB7 4B50 8910 DBC5 B510 79C4 3970 2BC3 2078 D875