[developers] speeding up grammar loading in the LKB

John Carroll J.A.Carroll at sussex.ac.uk
Sat Aug 19 00:26:07 CEST 2017


[This is a more detailed follow-up to emails that Petter, Stephan, Woodley and I have been exchanging over the past couple of days]

At the very pleasant DELPH-IN Summit last week in Oslo, Petter mentioned to me that the full version of his Norwegian grammar takes hours to load into the LKB. He gave me some of his grammar files, and it turns out that the time is going in computing glb types for a partition of the type hierarchy that contains almost all the types. In this example grammar, there is a partition of almost 40000 types which cannot be split into smaller disjoint sets of non-interacting types. The LKB was having to consider 40000^2/2 (= 800 million) type/type combinations, each combination taking time linear in the number of types. Although this is an efficiently coded part of the LKB, the computation still took around 30 minutes.

One fortunate property of the glb type computation algorithm is that very few of the type/type combinations are “interesting” in that they actually lead to creation of a glb. So I came up with a scheme to quickly filter out pairs that could not possibly produce glbs (always erring on the permissive side in order not to make the algorithm incorrect).

In this scheme, the bit code representing each type (40000 bits long for this example grammar) is augmented with a relatively short “type filter” code (192 bits empirically gives good results). The two main operations in computing glb types are ANDing pairs of these bit codes and testing whether the result is all zeros, and determining whether one bit code subsumes another (for every zero bit in code 1, the corresponding bit in code 2 must also be zero). By making each bit of a filter code to be the logical OR of a specific set of bits in the corresponding type bit code, the AND and subsume tests can also be applied to these codes as a quick pre-filter.

This approach reduces the load time for Petter's example grammar from 30 minutes to 4.5 minutes. It also seems to make the computation scale a bit more happily with increasing numbers of types. I attach a graph showing a comparison for this grammar and cut-down versions.

So that grammar writers can benefit soon, Stephan will shortly re-build the LOGON image to include this new algorithm.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.delph-in.net/archives/developers/attachments/20170818/65692f6f/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-2.pdf
Type: application/pdf
Size: 29623 bytes
Desc: PastedGraphic-2.pdf
URL: <http://lists.delph-in.net/archives/developers/attachments/20170818/65692f6f/attachment-0001.pdf>

More information about the developers mailing list