[developers] speeding up grammar loading in the LKB

Petter Haugereid petterha at gmail.com
Sat Aug 26 17:51:49 CEST 2017


Hi everybody,

I apologize for not giving you my whole grammar earlier. I just had to tidy
up a little... You can download the grammar files from
https://www.dropbox.com/s/irucxc313znfmd7/norsyg.2018-08-26.tgz?dl=0
The grammar now loads with both LKB (lkb/script) and ACE (ace/config.tdl).
The last time I loaded it with ACE, the checking of glb types took 4 hours.
I am curious to hear if it also loads with Agree. If you want to test with
smaller versions of the grammar, you can load it with lkb/small-script or
lkb/tiny-script, alternatively ace/config-small.tdl or ace/config-tiny.tdl.

Best,

Petter

On Thu, Aug 24, 2017 at 2:42 PM, John Carroll <J.A.Carroll at sussex.ac.uk>
wrote:

> Hi Glenn,
>
> A quick follow-up: I like your idea of the summary bits potentially
> allowing large uninteresting segments of the full bit vectors to be skipped.
>
> I use 192 summary bits (= 3 x 64 bit words) since in my experiments using
> more bits didn’t give a significant improvement. Although a fixed-size
> summary representation doesn’t unambiguously identify those words in the
> full bit vector that are zero, it allows the compiler to unwind loops of
> logical operations and efficiently inline them.
>
> I’ll be interested in your results with the type files I sent you. (To get
> the graph I produced cut-down versions by removing final segments of the
> verbrels.tdl file).
>
> John
>
>
> On 22 Aug 2017, at 23:26, John Carroll <J.A.Carroll at sussex.ac.uk> wrote:
>
> Hi Glenn,
>
> I think my scheme is very similar to yours. Each successive bit in my
> 192-bit “summary” representation encodes whether the next 64 bits of the
> full representation has any 1s in it. On reaching the end of the 192 bits,
> it starts again (so bit zero of the summary also encodes the 193rd group of
> 64 bits, etc).
>
> I attach the type files. They should be loaded in the following order:
>
>  coretypes.tdl
>  extratypes.tdl
>  linktypes.tdl
>  verbrels.tdl
>  reltypes.tdl
>
> John
>
>
> On 22 Aug 2017, at 22:57, Glenn Slayden <glenn at thai-language.com> wrote:
>
> Hello All,
>
> I apologize for not communicating this earlier, but since 2009 Agree has
> used a similar approach of carrying and maintaining supplemental bits,
> which I call “summary” bits, along with each of the large bit vectors for
> use during the GLB computation. Instead of a fixed 192 bits, Agree uses one
> “summary” bit per 64-bit ‘qword’ of main bits, where summary bits are all
> stored together (allocated in chunks of 64). Each individual summary bit
> indicates whether it’s corresponding full qword has any 1s in it and is
> correctly maintained across all logical operations.
>
> In the best case of an extreme sparse vector, finding that one summary
> qword is zero avoids evaluating 4096 bits. More realistically, however,
> it’s possible to walk the summary bits in O(number-of-set-bits), and this
> provides direct access to (only) those 64-bit qwords that are interesting.
>
> I would welcome the chance to test Petter’s grammar in Agree.
>
> Best regards,
>
> Glenn
>
>
>
> *From:* developers-bounces at emmtee.net [mailto:developers-bounces@
> emmtee.net <developers-bounces at emmtee.net>] *On Behalf Of *John Carroll
> *Sent:* Friday, August 18, 2017 3:26 PM
> *To:* developers at delph-in.net
> *Subject:* Re: [developers] speeding up grammar loading in the LKB
>
>
> Hi,
>
> [This is a more detailed follow-up to emails that Petter, Stephan, Woodley
> and I have been exchanging over the past couple of days]
>
> At the very pleasant DELPH-IN Summit last week in Oslo, Petter mentioned
> to me that the full version of his Norwegian grammar takes hours to load
> into the LKB. He gave me some of his grammar files, and it turns out that
> the time is going in computing glb types for a partition of the type
> hierarchy that contains almost all the types. In this example grammar,
> there is a partition of almost 40000 types which cannot be split into
> smaller disjoint sets of non-interacting types. The LKB was having to
> consider 40000^2/2 (= 800 million) type/type combinations, each combination
> taking time linear in the number of types. Although this is an efficiently
> coded part of the LKB, the computation still took around 30 minutes.
>
> One fortunate property of the glb type computation algorithm is that very
> few of the type/type combinations are “interesting” in that they actually
> lead to creation of a glb. So I came up with a scheme to quickly filter out
> pairs that could not possibly produce glbs (always erring on the permissive
> side in order not to make the algorithm incorrect).
>
> In this scheme, the bit code representing each type (40000 bits long for
> this example grammar) is augmented with a relatively short “type filter”
> code (192 bits empirically gives good results). The two main operations in
> computing glb types are ANDing pairs of these bit codes and testing whether
> the result is all zeros, and determining whether one bit code subsumes
> another (for every zero bit in code 1, the corresponding bit in code 2 must
> also be zero). By making each bit of a filter code to be the logical OR of
> a specific set of bits in the corresponding type bit code, the AND and
> subsume tests can also be applied to these codes as a quick pre-filter.
>
> This approach reduces the load time for Petter's example grammar from 30
> minutes to 4.5 minutes. It also seems to make the computation scale a bit
> more happily with increasing numbers of types. I attach a graph showing a
> comparison for this grammar and cut-down versions.
>
> So that grammar writers can benefit soon, Stephan will shortly re-build
> the LOGON image to include this new algorithm.
>
> John
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.delph-in.net/archives/developers/attachments/20170826/5e39c51f/attachment.html>


More information about the developers mailing list