[developers] parsers & performance

Thu Oct 22 22:56:53 CEST 2015

nurit,

> Can anyone point me to citable information about the efficiency of HPSG
> parsers and how it has improved over the years?

that is an interesting question :-).  exact figures may be non-trivial
to come by, and a lot will depend on choices in set-up, e.g. the
specific grammar(s) and test data assumed.

i believe the oldest such figures we have available are from the
mid-1990s, comparing the PAGE, LKB, and PET parsers, using different
ERG versions and a smallish sample of Verbmobil inputs dubbed ‘aged’.
Oepen & Callmeier (2002;
http://link.springer.com/chapter/10.1007%2F1-4020-2295-6_19) report
36.7 seconds per input for PAGE in mid-1996 and 0.14 seconds for PET
in mid-1999 (on a 300-megahertz ultrasparc server).  taking into
account increased grammatical coverage between the two measurements,
we conjectured an effective speed-up of three orders of magnitude; see
Section 4 in the first attachment below.  the main technological
innovations over that period, i would say, were greatly improved
unifier throughput, avoidance of unnecessary unifications, parsing
strategy optimizations, and careful engineering (moving from Lisp to
C++).

for all i recall, additional major speed-ups were later obtained from
the introduction of (a) subsumption-based ambiguity packing, (b)
n-best selective unpacking, and (c) ubertagging.  the former two were
originally implemented in the LKB and later imported into PET;
ubertagging was first developed in PET and recently integrated with
ACE.  unfortunately, i doubt the (relatively simple) ‘aged’ data is
complex enough to fully make visible these advances, and a
head-to-head comparison to the mid-1990s numbers seems impossible for
others reasons anyway.  as i parse ‘aged' today (on my 2-gigahertz
laptop) using PET and ACE, i see average parse times per input of 0.23
and 0.15 seconds, respectively.  comparing to Table 1.5 of that old
paper, i estimate that enlarged grammatical coverage has increased the
search space by a factor of about twenty: there are on average 137
readings per input now, compared to seven back then.

as for published figures on improvements since around 2000, Oepen &
Carroll (2000; https://aclweb.org/anthology/A/A00/A00-2022.pdf) report
a speed-up of up to a factor of 11 (in the LKB implementation) for
subsumption-based packing, using a larger sample of Verbmobil
sentences with substantially higher average length (called ‘blend’);
please see Table 2 in the above paper.  i seem to recall that one
order of magnitude speed-up (on sufficiently complex inputs) was
preserved when importing this technique into PET, but i cannot easily
think of a publication for that result.

as for the introduction of selective unpacking, Zhang et al. (2007;
https://aclweb.org/anthology/W/W07/W07-2207.pdf) report a speed-up by
a little more than a factor of two, on the JH4 section of the LOGON
corpus (see Table 2 and Figure 4), but much like for ambiguity packing
the relative benefits of selective unpacking would be amplified by
more complex inputs, for example the venerable WSJ text.

up to this point, the methods referred to above maintain what one
might call the ‘exact inference’ property of parsing with DELPH-IN
grammars, i.e. the parser will consider the full search space
(constructing a complete forest) and guarantee retrieval of the
globally top-ranked n-best trees from the forest.  with the addition
of lexical pruning based on ubertagging, Dridan (2013;
https://aclweb.org/anthology/D/D13/D13-1120.pdf) introduces more
approximative techniques into our tool chain—as have long been common
in almost all statistical parsing work and systems like Enju and
C&C—and, thus, there are more choices and trade-offs to consider:
giving up various degrees of accuracy or coverage (or both), in
principle, one can make a parser arbitrarily fast :-).

for the configuration that Dridan (2013) considers a good balance
along these dimensions, she reports speed-ups of around a factor of
three (on WSJ, CB, and WeScience data), while maintaining or
moderately improving accuracy and coverage (see Table 5).  still,
ubertagging has not quite turned into the ‘standard’ parsing set-up
for the ERG or other DELPH-IN grammars, for all i can see—possibly
because of the extra model preparation overhead, possibly because of
its approximative nature.  when in doubt, i would recommend its use in
efficiency-critical settings.

finally, PET and ACE implement the same general inventory of
techniques (including all mentioned above), but i believe ACE has seen
some additional optimization of data structures and critical code, as
well as several refinements to hyper-active parsing and ambiguity
packing; to date, i think there is no summary publication of these
improvements and associated efficiency gains.  i shall let woodley
speak more for ACE, but i would generally expect it to be somewhat
faster (and possibly a little more memory-demanding) than PET, maybe
on average up to a factor of two.

—so, in summary, DELPH-IN parsers to date may be about 1000 x 10 x 2 x
3 x 2 times faster than the mid-1990s PAGE system, or about five
orders of magnitude, not taking into account hardware evolution :-).
at the same time, our grammars have grown much bigger and more
complex, and for the ERG at least i would personally say that current
parsing efficiency is sufficient for many tasks but inadequate for
others, in particular ‘web-scale’ applications.  we used to feel
‘slow’ compared to, say, the Charniak & Johnson parser ten or fifteen
years ago, and i would say we have caught up to C&J (while still
solving a deeply more interesting problem, in my view).  but in
comparison to Enju, recent CCG work, or a system like TurboParser (or
what we know about the dependency parser used internally at Google), i
am afraid one would have to consider HPSG parsing comparatively
expensive, still.  you get what you pay for.

on this note, let me take the opportunity to share an unpublished
manuscript (from five years ago; see the second attachment), where i
tried to reflect on technology development in DELPH-IN since the
mid-1990s.  this was originally written for a festschrift volume
(which sadly never manifested), so please pardon the style.  while
writing this email (and also the draft festschrift contribution), i
worry that my subjective looking back is bound to omit important
evolutionary steps; there has been lots more work on efficiency and
robustness, and not all of it has been synthesized into the ‘standard’
DELPH-IN parsing tools.  thus, all, please correct and complement as
you see fit, and please forgive the brevity and haste of this message!

best wishes, oe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Oep:Cal:02.pdf
Type: application/pdf
Size: 151934 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20151022/927f357b/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uszkoreit.pdf
Type: application/pdf
Size: 221045 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20151022/927f357b/attachment-0003.pdf>