<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> </head> <body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""> Hi all, <div class=""><br class=""> </div> <div class="">Glenn and I have been exchanging emails off-list about implementation issues in GLB computation and speeding up logical operations on sparse bit vectors. We have also run <i class="">agree</i> and the new version of the LKB on Petter’s whole grammar (see his posting to the list on 26 August). For the benefit of Woodley, Ann and anyone else who’s interested, I append a few excerpts.</div> <div class=""><br class=""> </div> <div class="">John</div> <div class=""><br class=""> </div> <div class=""><br class=""> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> </div> <blockquote type="cite" class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> As I mentioned in my previous note, given a summary qword, I use (x & -x) to directly access, in O(N) of the summary 1 bits, only the interesting qwords. Given such a single-bit mask, I use the 64-bit deBruijn number “0x07EDD5E59A4E28C2” to find its log, so as to index into the main qwords. The code is hairy, but I seem to recall that testing showed large speedups. ... 
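<div class=""><br class=""> </div> <div class="">[A minimal Python sketch of the trick described above, for illustration only; it is not Agree's code. The helper names are hypothetical, and the lookup table is derived from the quoted constant at start-up rather than copied from any implementation. Python integers are unbounded, so 64-bit wrap-around is emulated with a mask.]</div>
<pre class="">
```python
DEBRUIJN64 = 0x07EDD5E59A4E28C2   # the constant quoted above
MASK64 = (1 << 64) - 1            # emulates unsigned 64-bit overflow

# Multiplying the constant by a one-bit mask (1 << i) shifts it left by i.
# For a de Bruijn constant the top 6 bits of each shifted value are distinct,
# so they can index a 64-entry table holding i.
LOG2_TABLE = [0] * 64
for i in range(64):
    LOG2_TABLE[((DEBRUIJN64 << i) & MASK64) >> 58] = i


def log2_of_single_bit(mask):
    """Return i for mask == 1 << i (mask must have exactly one bit set)."""
    return LOG2_TABLE[((mask * DEBRUIJN64) & MASK64) >> 58]


def interesting_qwords(summary):
    """Yield the positions of the 1 bits of a summary qword, lowest first."""
    while summary:
        lowest = summary & -summary      # isolate the lowest set bit
        yield log2_of_single_bit(lowest)
        summary ^= lowest                # clear that bit and carry on


# Hypothetical usage: main qwords 0, 3 and 5 are the only non-zero ones.
nonzero = list(interesting_qwords(0b101001))
```
</pre>
<div class="">[The loop body runs once per set summary bit, so main qwords whose summary bit is zero are never touched.]</div>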
<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Glenn<o:p class=""></o:p></div> </blockquote> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""><br class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> </div> </div> <blockquote type="cite" class=""> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <span style="font-size: 11pt;" class="">Looks like the</span><span style="font-size: 11pt;" class=""> </span><i style="font-size: 11pt;" class="">agree</i><span style="font-size: 11pt;" class=""> </span><span style="font-size: 11pt;" class="">results are similar to yours, with a dramatic speedup. Computing the glb closure of the full Norwegian type hierarchy that Petter sent (norsyg.2018-08-26.tgz) takes about a minute and a half:</span></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> 00:00:00 iter 0, glbs: 4943<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> 00:00:31 iter 1, glbs: 13061<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> 00:01:15 iter 2, glbs: 7516<o:p class=""></o:p></div> 
<div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> 00:01:28 iter 3, glbs: 374<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> 00:01:28 types:63251 glbs:20951<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Best regards,<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Glenn</div> </div> </blockquote> <div class=""><br class=""> </div> <div class=""><br class=""> </div> <div class=""> <div class=""></div> </div> <blockquote type="cite" class=""> <div class=""> <div class="">Here are my results for loading Petter’s norsyg.2018-08-26.tgz with the latest LKB:</div> <div class=""><br class=""> </div> <div class="">  grammar        largest partition   time</div> <div class="">  tiny-script    736 types           3 secs</div> <div class="">  small-script   4297 types          17 secs</div> <div class="">  script         40658 types         4 mins 50 secs</div> <div class=""><br class=""> </div> <div class="">For the LKB, the most expensive operation is not computing the glb types but finding the correct place to insert them into the type hierarchy. 
I’m sure this could be improved.</div> <div class=""><br class=""> </div> <div class="">John</div> </div> </blockquote> <div class=""><br class=""> </div> <div class=""><br class=""> </div> <div class=""><br class=""> <div> <blockquote type="cite" class=""> <div class="">On 28 Aug 2017, at 20:33, Glenn Slayden <<a href="mailto:glenn@thai-language.com" class="">glenn@thai-language.com</a>> wrote:</div> <br class="Apple-interchange-newline"> <div class=""> <div class="WordSection1" style="page: WordSection1; font-family: Courier; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;"> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Thanks John,<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Thanks for clarifying how you use wrap-around with your 192-bit scheme to indicate the areas with signal. The reason for my open-ended (auto-expanding) 1:64 ratio system was that I wanted the bitarray implementation to be a general-purpose component that could possibly be used in other sparse-representation applications. 
Hence also the correct maintenance of the summary bits during arbitrary bitwise operations (some of which they in fact helpfully inform).<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I have received your files and will try to load them and report performance figures soon.<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Best regards,<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Glenn<o:p class=""></o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div class=""> <div style="border-style: solid none none; border-top-width: 1pt; border-top-color: rgb(225, 225, 225); padding: 3pt 0in 0in;" class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <b class="">From:</b><span class="Apple-converted-space"> </span>John Carroll [<a href="mailto:J.A.Carroll@sussex.ac.uk" class="">mailto:J.A.Carroll@sussex.ac.uk</a>]<span class="Apple-converted-space"> </span><br class=""> <b class="">Sent:</b><span class="Apple-converted-space"> </span>Thursday, August 24, 2017 5:42 AM<br class=""> <b class="">To:</b><span class="Apple-converted-space"> </span><a href="mailto:developers@delph-in.net" class="">developers@delph-in.net</a><br class=""> <b class="">Cc:</b><span class="Apple-converted-space"> </span>Glenn Slayden 
<glenn@thai-language.com>; gslayden@uw.edu<br class=""> <b class="">Subject:</b><span class="Apple-converted-space"> </span>Re: [developers] speeding up grammar loading in the LKB<o:p class=""></o:p></div> </div> </div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Hi Glenn,<span class="Apple-converted-space"> </span><o:p class=""></o:p></div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> A quick follow-up: I like your idea of the summary bits potentially allowing large uninteresting segments of the full bit vectors to be skipped.<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I use 192 summary bits (= 3 x 64 bit words) since in my experiments using more bits didn’t give a significant improvement. Although a fixed-size summary representation doesn’t unambiguously identify those words in the full bit vector that are zero, it allows the compiler to unwind loops of logical operations and efficiently inline them.<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I’ll be interested in your results with the type files I sent you. 
(To get the graph, I produced cut-down versions by removing final segments of the verbrels.tdl file.)<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> John<o:p class=""></o:p></div> <p class="MsoNormal" style="margin: 0in 0in 12pt; font-size: 11pt; font-family: Calibri, sans-serif;"> <o:p class=""> </o:p></p> <div class=""> <blockquote style="margin-top: 5pt; margin-bottom: 5pt;" class="" type="cite"> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> On 22 Aug 2017, at 23:26, John Carroll <<a href="mailto:J.A.Carroll@sussex.ac.uk" style="color: purple; text-decoration: underline;" class="">J.A.Carroll@sussex.ac.uk</a>> wrote:<o:p class=""></o:p></div> </div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div class=""> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Hi Glenn,<span class="Apple-converted-space"> </span><o:p class=""></o:p></div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I think my scheme is very similar to yours. Each successive bit in my 192-bit “summary” representation encodes whether the next 64 bits of the full representation have any 1s in them. 
On reaching the end of the 192 bits, it starts again (so bit zero of the summary also encodes the 193rd group of 64 bits, etc).<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I attach the type files. They should be loaded in the following order:<o:p class=""></o:p></div> </div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <br class="">  coretypes.tdl<br class="">  extratypes.tdl<br class="">  linktypes.tdl<br class="">  verbrels.tdl<br class="">  reltypes.tdl<o:p class=""></o:p></div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> John<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div class=""> <blockquote style="margin-top: 5pt; margin-bottom: 5pt;" class="" type="cite"> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> On 22 Aug 2017, at 22:57, Glenn Slayden <<a href="mailto:glenn@thai-language.com" style="color: purple; text-decoration: underline;" class="">glenn@thai-language.com</a>> wrote:<o:p class=""></o:p></div> </div> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <o:p class=""> </o:p></div> <div class=""> <div class=""> <div style="margin: 0in 0in 
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Hello All,<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I apologize for not communicating this earlier, but since 2009 Agree has used a similar approach of carrying and maintaining supplemental bits, which I call “summary” bits, along with each of the large bit vectors for use during the GLB computation. Instead of a fixed 192 bits, Agree uses one “summary” bit per 64-bit ‘qword’ of main bits, where summary bits are all stored together (allocated in chunks of 64). Each individual summary bit indicates whether its corresponding full qword has any 1s in it and is correctly maintained across all logical operations.<span class="apple-converted-space"> </span><o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> In the best case of an extremely sparse vector, finding that one summary qword is zero avoids evaluating 4096 bits. 
More realistically, however, it’s possible to walk the summary bits in O(number-of-set-bits), and this provides direct access to (only) those 64-bit qwords that are interesting.<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> I would welcome the chance to test Petter’s grammar in Agree.<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Best regards,<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> Glenn<o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div style="border-style: solid none none; border-top-width: 1pt; border-top-color: rgb(225, 225, 225); padding: 3pt 0in 0in;" class=""> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""> <b class="">From:</b><span class="apple-converted-space"> </span><a 
href="mailto:developers-bounces@emmtee.net" style="color: purple; text-decoration: underline;" class="">developers-bounces@emmtee.net</a><span class="Apple-converted-space"> </span>[<a href="mailto:developers-bounces@emmtee.net" style="color: purple; text-decoration: underline;" class="">mailto:developers-bounces@emmtee.net</a>]<span class="apple-converted-space"> </span><b class="">On Behalf Of<span class="apple-converted-space"> </span></b>John Carroll<br class=""> <b class="">Sent:</b><span class="apple-converted-space"> </span>Friday, August 18, 2017 3:26 PM<br class=""> <b class="">To:</b><span class="apple-converted-space"> </span><a href="mailto:developers@delph-in.net" style="color: purple; text-decoration: underline;" class="">developers@delph-in.net</a><br class=""> <b class="">Subject:</b><span class="apple-converted-space"> </span>Re: [developers] speeding up grammar loading in the LKB<o:p class=""></o:p></div> </div> </div> </div> <div class=""> <div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">  <o:p class=""></o:p></div> </div> <div class=""> <div class=""> <p class="MsoNormal" style="margin: 0in 0in 12pt; font-size: 11pt; font-family: Calibri, sans-serif;"> <span style="font-size: 10pt;" class="">Hi,<br class=""> <br class=""> [This is a more detailed follow-up to emails that Petter, Stephan, Woodley and I have been exchanging over the past couple of days]<br class=""> <br class=""> At the very pleasant DELPH-IN Summit last week in Oslo, Petter mentioned to me that the full version of his Norwegian grammar takes hours to load into the LKB. He gave me some of his grammar files, and it turns out that the time is going in computing glb types for a partition of the type hierarchy that contains almost all the types. In this example grammar, there is a partition of almost 40000 types which cannot be split into smaller disjoint sets of non-interacting types. 
The LKB had to consider 40000^2/2 (= 800 million) type/type combinations, each combination taking time linear in the number of types. Although this is an efficiently coded part of the LKB, the computation still took around 30 minutes.<br class=""> <br class=""> One fortunate property of the glb type computation algorithm is that very few of the type/type combinations are “interesting” in that they actually lead to creation of a glb. So I came up with a scheme to quickly filter out pairs that could not possibly produce glbs (always erring on the permissive side in order not to make the algorithm incorrect).<br class=""> <br class=""> In this scheme, the bit code representing each type (40000 bits long for this example grammar) is augmented with a relatively short “type filter” code (192 bits empirically gives good results). The two main operations in computing glb types are ANDing pairs of these bit codes and testing whether the result is all zeros, and determining whether one bit code subsumes another (for every zero bit in code 1, the corresponding bit in code 2 must also be zero). By making each bit of a filter code the logical OR of a specific set of bits in the corresponding type bit code, the AND and subsume tests can also be applied to these codes as a quick pre-filter.<br class=""> <br class=""> This approach reduces the load time for Petter's example grammar from 30 minutes to 4.5 minutes. It also seems to make the computation scale a bit more happily with increasing numbers of types. I attach a graph showing a comparison for this grammar and cut-down versions.<br class=""> <br class=""> So that grammar writers can benefit soon, Stephan will shortly re-build the LOGON image to include this new algorithm.<br class=""> <br class=""> John</span></p> </div> </div> </div> </blockquote> </div> </div> </div> </div> </blockquote> </div> </div> </div> </div> </blockquote> </div> <br class=""> </div> </body> </html>