[developers] parsers & performance

Fri Oct 23 07:21:49 CEST 2015

Thanks you so much, Stephan, for all this interesting and helpful information.

Nurit

-----Original Message-----
From: stephan.oepen at gmail.com [mailto:stephan.oepen at gmail.com] On Behalf Of Stephan Oepen
Sent: Thursday, October 22, 2015 11:57 PM
To: Nurit Melnik
Cc: developers at delph-in.net
Subject: Re: [developers] parsers & performance

nurit,

> Can anyone point me to citable information about the efficiency of 
> HPSG parsers and how it has improved over the years?

that is an interesting question :-).  exact figures may be non-trivial to come by, and a lot will depend on choices in set-up, e.g. the specific grammar(s) and test data assumed.

i believe the oldest such figures we have available are from the mid-1990s, comparing the PAGE, LKB, and PET parsers, using different ERG versions and a smallish sample of Verbmobil inputs dubbed ‘aged’.
Oepen & Callmeier (2002;
http://link.springer.com/chapter/10.1007%2F1-4020-2295-6_19) report
36.7 seconds per input for PAGE in mid-1996 and 0.14 seconds for PET in mid-1999 (on a 300-megahertz ultrasparc server).  taking into account increased grammatical coverage between the two measurements, we conjectured an effective speed-up of three orders of magnitude; see Section 4 in the first attachment below.  the main technological innovations over that period, i would say, were greatly improved unifier throughput, avoidance of unnecessary unifications, parsing strategy optimizations, and careful engineering (moving from Lisp to
C++).

for all i recall, additional major speed-ups were later obtained from the introduction of (a) subsumption-based ambiguity packing, (b) n-best selective unpacking, and (c) ubertagging.  the former two were originally implemented in the LKB and later imported into PET; ubertagging was first developed in PET and recently integrated with ACE.  unfortunately, i doubt the (relatively simple) ‘aged’ data is complex enough to fully make visible these advances, and a head-to-head comparison to the mid-1990s numbers seems impossible for others reasons anyway.  as i parse ‘aged' today (on my 2-gigahertz
laptop) using PET and ACE, i see average parse times per input of 0.23 and 0.15 seconds, respectively.  comparing to Table 1.5 of that old paper, i estimate that enlarged grammatical coverage has increased the search space by a factor of about twenty: there are on average 137 readings per input now, compared to seven back then.

as for published figures on improvements since around 2000, Oepen & Carroll (2000; https://aclweb.org/anthology/A/A00/A00-2022.pdf) report a speed-up of up to a factor of 11 (in the LKB implementation) for subsumption-based packing, using a larger sample of Verbmobil sentences with substantially higher average length (called ‘blend’); please see Table 2 in the above paper.  i seem to recall that one order of magnitude speed-up (on sufficiently complex inputs) was preserved when importing this technique into PET, but i cannot easily think of a publication for that result.

as for the introduction of selective unpacking, Zhang et al. (2007;
https://aclweb.org/anthology/W/W07/W07-2207.pdf) report a speed-up by a little more than a factor of two, on the JH4 section of the LOGON corpus (see Table 2 and Figure 4), but much like for ambiguity packing the relative benefits of selective unpacking would be amplified by more complex inputs, for example the venerable WSJ text.

up to this point, the methods referred to above maintain what one might call the ‘exact inference’ property of parsing with DELPH-IN grammars, i.e. the parser will consider the full search space (constructing a complete forest) and guarantee retrieval of the globally top-ranked n-best trees from the forest.  with the addition of lexical pruning based on ubertagging, Dridan (2013;
https://aclweb.org/anthology/D/D13/D13-1120.pdf) introduces more approximative techniques into our tool chain—as have long been common in almost all statistical parsing work and systems like Enju and C&C—and, thus, there are more choices and trade-offs to consider:
giving up various degrees of accuracy or coverage (or both), in principle, one can make a parser arbitrarily fast :-).

for the configuration that Dridan (2013) considers a good balance along these dimensions, she reports speed-ups of around a factor of three (on WSJ, CB, and WeScience data), while maintaining or moderately improving accuracy and coverage (see Table 5).  still, ubertagging has not quite turned into the ‘standard’ parsing set-up for the ERG or other DELPH-IN grammars, for all i can see—possibly because of the extra model preparation overhead, possibly because of its approximative nature.  when in doubt, i would recommend its use in efficiency-critical settings.

finally, PET and ACE implement the same general inventory of techniques (including all mentioned above), but i believe ACE has seen some additional optimization of data structures and critical code, as well as several refinements to hyper-active parsing and ambiguity packing; to date, i think there is no summary publication of these improvements and associated efficiency gains.  i shall let woodley speak more for ACE, but i would generally expect it to be somewhat faster (and possibly a little more memory-demanding) than PET, maybe on average up to a factor of two.

—so, in summary, DELPH-IN parsers to date may be about 1000 x 10 x 2 x
3 x 2 times faster than the mid-1990s PAGE system, or about five orders of magnitude, not taking into account hardware evolution :-).
at the same time, our grammars have grown much bigger and more complex, and for the ERG at least i would personally say that current parsing efficiency is sufficient for many tasks but inadequate for others, in particular ‘web-scale’ applications.  we used to feel ‘slow’ compared to, say, the Charniak & Johnson parser ten or fifteen years ago, and i would say we have caught up to C&J (while still solving a deeply more interesting problem, in my view).  but in comparison to Enju, recent CCG work, or a system like TurboParser (or what we know about the dependency parser used internally at Google), i am afraid one would have to consider HPSG parsing comparatively expensive, still.  you get what you pay for.

on this note, let me take the opportunity to share an unpublished manuscript (from five years ago; see the second attachment), where i tried to reflect on technology development in DELPH-IN since the mid-1990s.  this was originally written for a festschrift volume (which sadly never manifested), so please pardon the style.  while writing this email (and also the draft festschrift contribution), i worry that my subjective looking back is bound to omit important evolutionary steps; there has been lots more work on efficiency and robustness, and not all of it has been synthesized into the ‘standard’
DELPH-IN parsing tools.  thus, all, please correct and complement as you see fit, and please forgive the brevity and haste of this message!

best wishes, oe