<div dir="ltr">I'm not sure whether you're using the UbuntuLKB VirtualBox appliance; if you are, you may want to try increasing the amount of memory available to the VM. I think the default is only 1 GB for my recent builds, but most modern computers have enough RAM to dedicate a lot more than that to the VM and still get reasonable performance for the host OS.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 15, 2016 at 12:55 PM, Olga Zamaraeva <span dir="ltr">&lt;<a href="mailto:olzama@uw.edu" target="_blank">olzama@uw.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear developers!<div><br></div><div>I am trying to use the LKB and [incr tsdb()] to parse a list of verbs with a grammar of Chintang [ctn]. The language is polysynthetic, plus the grammar was created automatically using k-means clustering for the morphology section, so some of the position classes have lots and lots of inputs and lots and lots of lexical rule types and instances.</div><div><br></div><div>I am running into a problem where [incr tsdb()] crashes because of a memory allocation error. If I don't use itsdb and just go with LKB batch parsing, it is more robust, as it can catch the error and continue parsing after reporting a failure on the problematic item, but the problem is still there and the parses still fail.</div><div><br></div><div>I am a fairly inexperienced user of both systems, so right now I am trying to understand what the best way is for me to:</div><div><br></div><div> 1) debug the grammar with respect to the problem, i.e. find out what exactly about the grammar causes the issues; </div><div>2) do something with itsdb so that perhaps this does not happen? Limit it somehow so that it doesn't try as much?</div><div><br></div><div>Currently I am mostly just trying to filter out the problematic items... 
I also tried limiting the chart size to 30K, and that seems to have helped a little, but the crashes still happen on some items. If I limit the chart size to 20K, then it seems like maybe I can get through the test suite, but then my coverage suffers when I think it shouldn't: I think there are items which I can parse with a 30K limit but not with 20K... Is this the route I should be taking in any case, just optimizing for the chart size? Maybe 25K is my number :). The chart is the parse chart, is that correct? I need to understand what exactly makes the chart so huge in my case; how should I approach debugging that?</div><div><br></div><div>One specific question: what does "tenuring" mean with respect to garbage collection? Google doesn't know (nor does the manual, I think).</div><div><br></div><div>Does anyone have any comment on any of these issues? The (very helpful) chapter on errors and debugging in Copestake's (2002) book mostly talks about other types of issues, such as type loading problems etc. I also looked at what I found in ItsdbTop (<a href="http://moin.delph-in.net/ItsdbTop" target="_blank">http://moin.delph-in.net/ItsdbTop</a>), and it does mention that on 32-bit systems memory problems are possible, but I think that note has to do with treebanking, and it doesn't really tell me much about what I should try in my case... I also looked through the itsdb manual (<a href="http://www.delph-in.net/itsdb/publications/manual.pdf" target="_blank">http://www.delph-in.net/itsdb/publications/manual.pdf</a>) -- but it looks like some of the sections, specifically about debugging and options and parameters, are empty?</div><div><br></div><div>Anyway, I would greatly appreciate any advice! I attach a picture of a test suite run in progress, to give an idea of the memory usage and the chart size, and a picture of the error. 
It is possible that a grammar like mine is simply not an intended usage scenario as far as itsdb is concerned, but I don't yet have a clear understanding of whether that's the case.</div><div><br></div><div>Thanks!</div><span class="HOEnZb"><font color="#888888"><div>Olga</div></font></span></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><span style="border-collapse:separate;text-indent:0px"><span style="border-collapse:separate;text-indent:0px"><div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-transform:none;white-space:normal;word-spacing:0px">D. Brodbeck</div><div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-transform:none;white-space:normal;word-spacing:0px">System Administrator, Linguistics</div><div style="color:rgb(0,0,0);font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-transform:none;white-space:normal;word-spacing:0px">University of Washington</div>GPG key fingerprint: 0DB7 4B50 8910 DBC5 B510 79C4 3970 2BC3 2078 D875</span></span><div><span style="border-collapse:separate;text-indent:0px"><span style="border-collapse:separate;text-indent:0px"><br></span></span></div></div></div>
</div>