[developers] processing of lexical rules

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Sun Jun 5 20:20:45 CEST 2005


> 
> One other thing - the inventory of lexical rules in the ERG, when enriched
> with the ten or so punctuation variants, makes the building of the lrfsm
> very expensive, mostly because I have to allow for multiple occurrences of 
> some 'pairing' punctuation marks like parens and quote marks, as in
>     Kim said, "I met Sandy in Corvallis (that famous "city")".
> If I leave in the rules for single quote, double quote, and right paren
> as is, the lrfsm build can take several minutes; if I throw out one, the
> time comes down to thirty seconds; if two out, then about three seconds,
> and if none of these reapplying guys, then it's quick.  For now, I could
> just trim back the coverage of these clusters since they're not frequent
> in our current data sets, but it would be nice if the lrfsm could be
> made zoomier in time.

I have rewritten this - it should generally be faster, but it could still be
slow when there are rule cycles involving several rules.  Could you either test
it, or send me the version of the grammar with the punctuation rules?
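
To make "rule cycles involving several rules" concrete, here is a minimal
sketch in Python (not the LKB code, which is Lisp). It assumes the feeding
relation is already available as a dict from each rule name to the rules it
can feed; the function name, the dict layout and the rule names are all
hypothetical. It groups rules into strongly connected components using
Tarjan's algorithm and reports only the components that contain a cycle.

```python
def find_rule_cycles(feeds):
    """Return sorted groups of rules that can feed each other or themselves.

    feeds: dict mapping a rule name to the rule names it can feed
    (a hypothetical representation, assumed for this sketch).
    """
    index = {}        # discovery order of each visited rule
    low = {}          # lowest discovery index reachable from each rule
    stack = []        # rules on the current search path
    on_stack = set()
    cycles = []
    counter = [0]

    def visit(rule):
        index[rule] = low[rule] = counter[0]
        counter[0] += 1
        stack.append(rule)
        on_stack.add(rule)
        for nxt in feeds.get(rule, ()):
            if nxt not in index:
                visit(nxt)
                low[rule] = min(low[rule], low[nxt])
            elif nxt in on_stack:
                low[rule] = min(low[rule], index[nxt])
        if low[rule] == index[rule]:
            # rule is the root of a strongly connected component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == rule:
                    break
            # Cyclic if several rules feed each other, or one feeds itself.
            if len(component) > 1 or rule in feeds.get(rule, ()):
                cycles.append(sorted(component))

    for rule in sorted(feeds):
        if rule not in index:
            visit(rule)
    return sorted(cycles)


# Hypothetical rule names, for illustration only.
example = {
    "right_paren": ["right_paren", "double_quote"],
    "double_quote": ["right_paren"],
    "plural": ["double_quote"],
}
cycle_groups = find_rule_cycles(example)
```

In this example the two punctuation rules form one cyclic group, while
"plural" feeds into the group without being part of a cycle.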

One thing I noticed - the lrfsm and the rule-filter are getting built twice
with the version of the ERG script that I have, because they are computed by
the function that reads in rules, and that function may be called multiple
times.  I have moved their computation to read-script-file-aux, since this is
the least bad option I can think of.
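
The idea is just to separate reading rules from building the derived
structures. A minimal Python sketch of that design, with hypothetical names
throughout (the real code is Lisp, and `build_derived` here is only a
stand-in for the lrfsm and rule-filter computation):

```python
def build_derived(rules):
    # Stand-in for the expensive lrfsm / rule-filter computation.
    return tuple(sorted(rules))


class GrammarLoader:
    """Hypothetical loader: reading rules never triggers a rebuild."""

    def __init__(self):
        self.rules = []
        self.derived = None
        self.build_count = 0

    def read_rules(self, rule_names):
        # May be called once per rule file; it only records the rules.
        self.rules.extend(rule_names)

    def finish_loading(self):
        # Called once, after the whole script has been read.
        self.derived = build_derived(self.rules)
        self.build_count += 1


loader = GrammarLoader()
loader.read_rules(["plural", "past"])          # first rule file
loader.read_rules(["right_paren"])             # second rule file
loader.finish_loading()
```

However many rule files the script reads, the derived structures are built
a single time at the end.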

> 
> By the way, that test for self-feeding of lexical rules is in general a
> quite nice test for silly errors in the ordinary lexical rules - pointed
> me to several infelicities in the existing ERG rule set.  I guess the
> warning message might be made user-controllable, maybe when doing a
> manual check of the lexicon?

I don't understand this comment - why would you warn about lexical rule cycles
when doing a manual check of the lexicon?  Is this a polite way of saying that
you don't think all the cases where a rule can feed itself are errors and that
therefore they shouldn't be signalled when a grammar is loaded?  I agree they
aren't necessarily errors but most of the time they are, especially with novice
grammar writers.  It seems correct to me to warn about all such cases with the
intention being that you can have cyclic rules, but only if you know what
you're doing!  I've fixed the code so each warning only shows up once, though
...  If you'd prefer an alternative terminology to `Warning', let me know!
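
Showing each warning only once amounts to remembering which rules have
already been reported. A minimal Python sketch, assuming the check yields
(rule, fed-rule) pairs; the function name, the pair format and the message
wording are hypothetical and not what the LKB prints:

```python
def self_feeding_warnings(feeding_pairs):
    """Return one warning per rule that can feed itself.

    feeding_pairs: iterable of (rule, fed_rule) pairs, possibly with
    repeats (a hypothetical input format assumed for this sketch).
    """
    already_warned = set()
    messages = []
    for rule, fed_rule in feeding_pairs:
        if rule == fed_rule and rule not in already_warned:
            already_warned.add(rule)
            messages.append(f"Warning: lexical rule {rule} can feed itself")
    return messages


pairs = [
    ("right_paren", "right_paren"),
    ("plural", "right_paren"),
    ("right_paren", "right_paren"),   # repeat, should not warn again
]
warnings_shown = self_feeding_warnings(pairs)
```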

Ann




