[developers] preprocessing functionality in pet

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Thu Feb 8 18:55:29 CET 2007



> > My belief about case is that, in the long-term, the systems should not
> > be normalising case, except as defined by a grammar-specific
> > preprocessor.  Wasn't this a conclusion from Jerez?  I still intend to
> > take all the case conversion out of the LKB.
> 
> OK with me. Still, Berthold wants a "super robust" mode, where both the
> input and lexicon access are normalized. Otherwise, he (and pet) has to
> provide functionality for, e.g., sentence initial capitals. And this
> again raises the question of preprocessor formalism/implementation.
> 

I saw Tim's message but not Berthold's, I think.  I see the point re
the super robust mode and agree we should define it.  I think the right
way to deal with sentence initial capitalisation (for a grammar which
treats case as significant) is just for the preprocessor to produce
two inputs - capitalised and non-capitalised.
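
As a rough sketch of the two-inputs idea (Python, purely illustrative:
the function name and the token-list representation are assumptions,
not LKB or pet code), the preprocessor could hand the parser both
readings of the first token and leave the choice to the grammar:

```python
def sentence_initial_variants(tokens):
    """Return candidate token sequences for a case-sensitive grammar.

    Hypothetical helper: the first token is offered as written and,
    if it starts with an upper-case letter, also with that letter
    lower-cased.  All other tokens are left untouched.
    """
    if not tokens:
        return [[]]
    first = tokens[0]
    # Lower-case only the initial character, so "McDonald" -> "mcDonald"
    # rather than "mcdonald"; the rest of the token keeps its case.
    lowered = first[:1].lower() + first[1:]
    variants = [list(tokens)]
    if lowered != first:
        variants.append([lowered] + list(tokens[1:]))
    return variants


if __name__ == "__main__":
    for variant in sentence_initial_variants(["The", "dog", "barks"]):
        print(" ".join(variant))
```

An input that already starts in lower case yields a single variant, so
nothing is duplicated in that case.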

And, yes, there is an issue about the interaction of the morphology
with the external named entity components.  I keep forgetting that the
treatment of punctuation as affixes complicates this in that it means
that these at least have to be applied, but we don't necessarily want
the `ordinary' rules to be applied.  There are a number of ways of
handling this, and I can enumerate the LKB options, but I don't
remember where we'd got to.  Can anyone fill in?

> 
> Finally, this wasn't meant as a call to weapons. I just would like
> those things to be settled. 

I am really sorry if it sounded as though I was taking it that way.  I
completely agree with you that we ought to sort all this out.  

Ann


