[developers] processing of lexical rules

Tue May 3 11:31:39 CEST 2005

The change to allow recursive rule application with irregular forms would be
very nice to have soon, since treating punctuation as suffixation means
that analyzing 'The dog slept.' requires applying the period-punctuation rule
and then finding the irregular past form.  (I've added sleep/slept to the 'toy'
grammar in CVS using an irregs.tab file, for convenience.)

Allowing for MWE prefixation will be useful, but its lack isn't crippling
just now.

One other thing: the inventory of lexical rules in the ERG, when enriched
with the ten or so punctuation variants, makes building the lrfsm
very expensive, mostly because I have to allow for multiple occurrences of
some 'pairing' punctuation marks like parens and quote marks, as in
    Kim said, "I met Sandy in Corvallis (that famous "city")".
If I leave in the rules for single quote, double quote, and right paren
as is, the lrfsm build can take several minutes; if I throw out one, the
time comes down to thirty seconds; if I throw out two, to about three
seconds; and with none of these reapplying rules, it's quick.  For now, I
could just trim back the coverage of these clusters, since they're not
frequent in our current data sets, but it would be nice if the lrfsm build
could eventually be made zoomier.
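To make the combinatorics concrete, here is a toy count in Python of the distinct rule sequences of bounded length over a small punctuation rule set. Ordinary rules may occur at most once per sequence, and "reapplying" rules may occur any number of times. The rule names, the length bound, and the whole model are assumptions for illustration. This is not how the LKB actually builds the lrfsm. It only shows how the space of candidate sequences grows as rules are allowed to reapply.

```python
def count_sequences(rules, reapplying, max_len):
    """Count distinct rule sequences of length 1..max_len in a toy model.

    A rule not in `reapplying` may occur at most once per sequence; a rule
    in `reapplying` may occur any number of times (up to the length bound).
    """
    total = 0
    frontier = [()]  # all sequences of the current length
    for _ in range(max_len):
        frontier = [
            seq + (rule,)
            for seq in frontier
            for rule in rules
            if rule in reapplying or rule not in seq
        ]
        total += len(frontier)
    return total


PUNCT = ["period", "comma", "rparen", "dquote", "squote"]  # hypothetical rule names
for k in range(4):
    pairing = set(PUNCT[2:2 + k])  # treat 0..3 of the pairing marks as reapplying
    print(k, count_sequences(PUNCT, pairing, max_len=6))
```

Each extra reapplying rule strictly enlarges the set of sequences. In this toy, the length bound is the only thing that keeps the count finite once any rule may reapply.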

By the way, that test for self-feeding of lexical rules is in general quite
a nice check for silly errors in the ordinary lexical rules: it pointed
me to several infelicities in the existing ERG rule set.  I guess the
warning message might be made user-controllable, perhaps switched on when
doing a manual check of the lexicon?
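A rough sketch of a self-feeding check with a switchable warning, in Python. Real rules are typed feature structures checked by unification. Here, each rule is reduced to flat input constraints and output values. The functions `compatible` and `self_feeding_rules`, the `allowed` set, and the `warn` flag are all hypothetical stand-ins, not anything in the LKB.

```python
import warnings


def compatible(a, b):
    """Crude flat stand-in for unification: compatible unless a shared feature clashes."""
    return all(b[k] == v for k, v in a.items() if k in b)


def self_feeding_rules(rules, allowed=frozenset(), warn=True):
    """Return names of rules whose output still satisfies their own input constraints.

    `rules` maps a rule name to (input_constraints, output_values).
    Rules in `allowed` (e.g. pairing punctuation) are expected to reapply and
    are not reported; `warn` switches the warning messages on or off.
    """
    found = []
    for name, (inp, out) in rules.items():
        if name not in allowed and compatible(out, inp):
            found.append(name)
            if warn:
                warnings.warn(f"lexical rule {name} can feed itself")
    return found


# Hypothetical rules: (input constraints, output values).
RULES = {
    "past-verb": ({"FORM": "base"}, {"FORM": "past"}),
    "plur-noun": ({"CAT": "n", "NUM": "sg"}, {"CAT": "n"}),  # output forgot NUM: a slip
    "rparen": ({"CAT": "word"}, {"CAT": "word"}),  # pairing mark, meant to reapply
}
print(self_feeding_rules(RULES, allowed={"rparen"}, warn=False))
```

Passing `warn=False` gives a quiet run for routine grammar loading. `warn=True` could be reserved for a manual lexicon check, which is one way to make the message user-controllable.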

  Dan