<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    OK, I was in fact going to ask whether there was a way of combining
    the rules so you didn't have so many of them.  It's not the type of
    thing the system is designed to cope with, to be honest.  <br>
    <br>
    All best,<br>
    <br>
    Ann<br>
    <br>
    <div class="moz-cite-prefix">On 09/04/2016 17:45, Olga Zamaraeva
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAKFA0BUf9ohj4oae5Z1G5fkwDyZK+xrubWoDEz3Q0JEsRCCNWw@mail.gmail.com"
      type="cite">
      <div dir="ltr">Dear Ann,
        <div><br>
        </div>
        <div>You know, no, I have not found a workaround yet, but I
          decided to leave it alone for a little while. It only happens
          with two of my grammars, and they are just unrealistically
          large; there really should not be so many lexical rule
          instances with the same orthography. Not dozens of them, I
          don't think. So I decided for now that it is not very
          important that I cannot use those two grammars for parsing. I
          sort of evaluated them on a smaller test set, and I think this
          will do for now.</div>
        <div><br>
        </div>
        <div>Thank you!</div>
        <div>Olga</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Sat, Apr 9, 2016 at 7:32 AM Ann Copestake &lt;<a
            moz-do-not-send="true" href="mailto:aac10@cl.cam.ac.uk">aac10@cl.cam.ac.uk</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="#FFFFFF" text="#000000"> Dear Olga,<br>
            <br>
            Sorry for not replying - did you manage to find a
            workaround?  <br>
          </div>
          <div bgcolor="#FFFFFF" text="#000000"> <br>
            Ann</div>
          <div bgcolor="#FFFFFF" text="#000000"><br>
            <br>
            <br>
            <div>On 24/03/2016 21:55, Olga Zamaraeva wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">Looking at the token chart, I see that I in
                fact have many lexical rules for the same orthography,
                and that results in too many parses (a snippet of the
                output is attached). For example, for the "a-tis-e"
                input that I am trying, there are dozens of lexical rules
                for a- and dozens for -a, so the number of combinations
                is too large. The reason is that the rules were
                inferred automatically by a clustering algorithm, and I
                asked for many clusters. (There is a reason for asking
                for many clusters, too: I am trying to compare these
                results with another algorithm which happened to infer
                many position classes, so I want my clustering to come
                up with the same number, and then to compare the two
                grammars.)
                <div><br>
                </div>
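                <div>To put rough numbers on this, here is a minimal
                  sketch. It assumes that each affix slot can be filled
                  independently by any of its same-orthography rules,
                  and it uses 24 as a stand-in for "dozens" (the real
                  counts are not given here).
                  <code>count_analyses</code> is a hypothetical helper,
                  not part of the LKB.</div>
<pre>
```python
from math import prod


def count_analyses(rules_per_slot):
    """Count rule combinations when every slot is filled independently.

    rules_per_slot holds, for each affix slot, the number of lexical
    rules that share the same orthography in that slot.
    """
    return prod(rules_per_slot)


# Hypothetical counts: one prefix slot and one suffix slot, 24 rules each.
print(count_analyses([24, 24]))
# A third ambiguous slot multiplies the total again.
print(count_analyses([24, 24, 24]))
```
</pre>
                <div>With these assumed counts the two calls give 576
                  and 13824, so each extra ambiguous slot multiplies
                  the work instead of adding to it.</div>
                <div><br>
                </div>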
                <div>Is there a way to have the LKB stop after it has found
                  a parse, and not try other possibilities? I tried
                  doing that in itsdb (by turning off exhaustive search
                  and limiting the maximum number of analyses), but it
                  still cannot handle this large grammar for some
                  reason...</div>
                <div><br>
                </div>
                <div>Thank you!</div>
                <div>Olga</div>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr">On Thu, Mar 17, 2016 at 1:34 AM Ann
                  Copestake &lt;<a moz-do-not-send="true"
                    href="mailto:aac10@cl.cam.ac.uk" target="_blank">aac10@cl.cam.ac.uk</a>&gt;

                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <div bgcolor="#FFFFFF" text="#000000"> So the process
                    is running out of memory before hitting the limit on
                    the number of chart edges, which stops processing a
                    little more gracefully.  The LKB batch parse process
                    catches some errors in a way that allows the rest of
                    the batch to continue.   It may be that all that's
                    happening is that the chart edge limit was set too
                    high relative to the available memory, although it
                    is possible that memory is being used in a way that
                    isn't reflected by the edge limit, which is why I
                    suggested also looking at the token chart.   You
                    could increase the amount of memory available to the
                    process and see whether you can get your test set
                    through, but unless that's the final test set and
                    you don't intend to work on any more complex
                    examples than the ones you have, that's only going
                    to be a temporary measure.<br>
                    <br>
                    I don't think it will matter whether you look at
                    examples that can eventually be parsed, something
                    that fails after a huge number of edges or something
                    that causes the memory crash - your task is to find
                    out whether there is something you can do to cut
                    down the number of rule applications.  The good news
                    is that you won't need to find many cases of
                    over-application to make a dramatic improvement. I
                    think you will see the issues with the grammar when
                    you look at a chart, even with a small edge limit.</div>
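                  <div>The difference between hitting the edge limit
                    and running out of memory can be sketched with a
                    toy batch loop. This is not the LKB's code:
                    <code>toy_parse</code>, <code>batch_parse</code>
                    and <code>EdgeLimitExceeded</code> are hypothetical
                    names, and the edge counts are made up. It only
                    shows how a cap on edges turns a runaway item into
                    a failure that the batch can record and move past.</div>
<pre>
```python
class EdgeLimitExceeded(Exception):
    """Raised when one item needs more edges than the cap allows."""


def toy_parse(item, edges_needed, max_edges):
    # Stand-in for a chart parser: add edges one at a time and give up
    # as soon as the count passes the cap.
    edges = 0
    for _ in range(edges_needed):
        edges += 1
        if edges == max_edges + 1:
            raise EdgeLimitExceeded(item)
    return edges


def batch_parse(items, max_edges):
    results = {}
    for item, edges_needed in items:
        try:
            results[item] = toy_parse(item, edges_needed, max_edges)
        except EdgeLimitExceeded:
            # Record the failure and carry on with the rest of the batch.
            results[item] = None
    return results


# Hypothetical items, each paired with the number of edges it would need.
items = [("short", 50), ("a-tis-e", 5000), ("medium", 900)]
print(batch_parse(items, max_edges=1000))
```
</pre>
                  <div>In this sketch the oversized item is reported as
                    a failure and the other two still go through. If
                    the cap were set higher than memory allows, the
                    process would die before the exception could be
                    raised, which is the situation described above.</div>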
                  <div bgcolor="#FFFFFF" text="#000000"><br>
                    <br>
                    Ann</div>
                  <div bgcolor="#FFFFFF" text="#000000"><br>
                    <br>
                    <div>On 17/03/2016 01:55, Olga Zamaraeva wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr">Thank you Ann!
                        <div><br>
                        </div>
                        <div>I suppose I should try to pin down an input
                          that can be successfully parsed, but does
                          produce a huge chart. Of course my most
                          pressing problem is not that some inputs are
                          parsed with huge charts but that some inputs
                          can never be parsed and break the system. But
                          perhaps this is caused by the same problem (or
                          feature) in the grammar.</div>
                        <div><br>
                        </div>
                        <div>The LKB does give an error message, the
                          same memory allocation error that comes
                          through itsdb when that breaks (attached in
                          the original email).<br>
                          <br>
                          Olga<br>
                          <div class="gmail_quote">
                            <div dir="ltr">On Tue, Mar 15, 2016 at 2:19
                              PM Ann Copestake &lt;<a
                                moz-do-not-send="true"
                                href="mailto:aac10@cl.cam.ac.uk"
                                target="_blank">aac10@cl.cam.ac.uk</a>&gt;


                              wrote:<br>
                            </div>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex">I would say
                              that you should attempt to debug in the
                              LKB.  I don't know<br>
                              exactly why [incr() tsdb] crashes while
                              the LKB batch fails more<br>
                              gracefully (does the LKB give an error
                              message?) but you should try and<br>
                              understand what's going on to give you
                              such a huge chart. That's not to<br>
                              say that it wouldn't be a good idea to
                              know what the [incr() tsdb] issue<br>
                              is, but it probably won't help you much
                              ...<br>
                              <br>
                              If you're using the LKB's morphophonology,
                              you might want to look at the<br>
                              token chart as well as the parse chart. 
                              This is more recent than the<br>
                              book, so isn't documented, but if you have
                              an expanded menu, I think it<br>
                              shows up under Debug.  You want the `print
                              token chart' item, which will<br>
                              output to the emacs window.  Similarly, if
                              you're trying to debug what's<br>
                              going on and have an enormous parse chart,
                              don't try and look at the<br>
                              chart in a window, but use the `print
                              chart' option.  You would want to<br>
                              reduce the maximum number of items to
                              something a lot smaller than 20k<br>
                              before you try that, though.<br>
                              <br>
                              We should have a FAQ that says `ignore all
                              the GC messages'.  It's<br>
                              really just a symptom of the underlying
                              system running out of space -<br>
                              nothing to do with the LKB or [incr()
                              tsdb] as such.  So there's not a<br>
                              lot of enlightenment to be gained by
                              understanding terms like tenuring ...<br>
                              <br>
                              Best,<br>
                              <br>
                              Ann<br>
                              <br>
                              On 15/03/2016 19:55, Olga Zamaraeva wrote:<br>
                              &gt; Dear developers!<br>
                              &gt;<br>
                              &gt; I am trying to use the LKB and
                              [incr() tsdb] to parse a list of verbs<br>
                              &gt; by a grammar of Chintang [ctn]. The
                              language is polysynthetic, plus<br>
                              &gt; the grammar was created automatically
                              using k-means clustering for the<br>
                              &gt; morphology section, so some of the
                              position classes have lots and lots<br>
                              &gt; of inputs and lots and lots of
                              lexical rule types and instances.<br>
                              &gt;<br>
                              &gt; I am running into a problem when 
                              [incr() tsdb] crashes because of a<br>
                              &gt; memory allocation error. If I don't
                              use itsdb and just go with LKB<br>
                              &gt; batch parsing, it is more robust as
                              it can catch the error and<br>
                              &gt; continue parsing, having reported a
                              failure on the problematic item,<br>
                              &gt; but the problem is still there and
                              the parses still fail.<br>
                              &gt;<br>
                              &gt; I am a fairly inexperienced user of
                              both systems, so right now I am<br>
                              &gt; trying to understand what is the best
                              way for me to:<br>
                              &gt;<br>
                              &gt;  1) debug the grammar with respect to
                              the problem, i.e. what is it<br>
                              &gt; about the grammar exactly that causes
                              the issues;<br>
                              &gt; 2) do something with itsdb so that
                              perhaps this does not happen? Limit<br>
                              &gt; it somehow so that it doesn't try as
                              much?<br>
                              &gt;<br>
                              &gt; Currently I am mostly just trying to
                              filter out the problematic<br>
                              &gt; items... I also tried limiting the
                              chart size to 30K, and that seems<br>
                              &gt; to have helped a little, but the
                              crashes still happen on some items.<br>
                              &gt; If I limit the chart size to 20K,
                              then it seems like maybe I can go<br>
                              &gt; through the test suite, but then my
                              coverage suffers when I think it<br>
                              &gt; shouldn't: I think there are items
                              which I can parse with 30K limit<br>
                              &gt; but not 20K... Is this the route I
                              should be going in any case? Just<br>
                              &gt; optimizing for the chart size?..
                              Maybe 25K is my number :). The chart<br>
                              &gt; is the parse chart, is that correct?
                              I need to understand what exactly<br>
                              &gt; makes the chart so huge in my case;
                              how should I approach debugging<br>
                              &gt; that?..<br>
                              &gt;<br>
                              &gt; One specific question: what does
                              "tenuring" mean with respect to<br>
                              &gt; garbage collection? Google doesn't
                              know (nor does the manual, I think).<br>
                              &gt;<br>
                              &gt; Does anyone have any comment on any
                              of these issues? The (very<br>
                              &gt; helpful) chapter on errors and
                              debugging in the Copestake (2002) book<br>
                              &gt; mostly talks about other types of
                              issues such as type loading problems<br>
                              &gt; etc. I also looked at what I found
                              in ItsdbTop<br>
                              &gt; (<a moz-do-not-send="true"
                                href="http://moin.delph-in.net/ItsdbTop"
                                rel="noreferrer" target="_blank">http://moin.delph-in.net/ItsdbTop</a>),


                              and it does mention that on<br>
                              &gt; 32-bit systems memory problems are
                              possible, but I think that note has<br>
                              &gt; to do with treebanking, and it
                              doesn't really tell me much about what<br>
                              &gt; I should try in my case... I also
                              looked through the itsdb manual<br>
                              &gt; (<a moz-do-not-send="true"
                                href="http://www.delph-in.net/itsdb/publications/manual.pdf"
                                rel="noreferrer" target="_blank">http://www.delph-in.net/itsdb/publications/manual.pdf</a>)
                              -- but it<br>
                              &gt; looks like some of the sections,
                              specifically about debugging and<br>
                              &gt; options and parameters, are empty?<br>
                              &gt;<br>
                              &gt; Anyway, I would greatly appreciate
                              any advice! I attach a picture of a<br>
                              &gt; running testsuite processing, to give
                              an idea about the memory usage<br>
                              &gt; and the chart size, and of the error.
                              It is possible that the grammar<br>
                              &gt; that I have is just not a usage
                              scenario as far as itsdb is concerned,<br>
                              &gt; but I don't yet have a clear
                              understanding of whether that's the case.<br>
                              &gt;<br>
                              &gt; Thanks!<br>
                              &gt; Olga<br>
                              <br>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </blockquote>
              </div>
            </blockquote>
            <br>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </body>
</html>