[developers] preprocessing functionality in pet
Ann Copestake
Ann.Copestake at cl.cam.ac.uk
Thu Feb 8 18:55:29 CET 2007
> > My belief about case is that, in the long-term, the systems should not
> > be normalising case, except as defined by a grammar-specific
> > preprocessor. Wasn't this a conclusion from Jerez? I still intend to
> > take all the case conversion out of the LKB.
>
> OK with me. Still, Berthold wants a "super robust" mode, where as well
> input as lexicon access is normalized. Otherwise, he (and pet) has to
> provide functionality for, e.g., sentence initial capitals. And this
> again raises the question of preprocessor formalism/implementation.
>
I saw Tim's message but not Berthold's, I think. I see the point re
the superrobust mode and agree we should define it. I think the right
way to deal with sentence initial capitalisation (for a grammar which
treats case as significant) is just for the preprocessor to produce
two inputs - capitalised and non-capitalised.
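
To make the two-inputs idea concrete, here is a minimal sketch in Python. It is only an illustration: the function name and the token-list representation are hypothetical, and nothing here reflects how the LKB or pet preprocessors are actually implemented. It assumes the sentence has already been tokenised and that only the first token is affected.

```python
def sentence_initial_variants(tokens):
    """Return candidate token sequences for one sentence.

    Hypothetical helper: the first candidate is the input as written;
    a second candidate with the first token's initial letter lowercased
    is added only when that actually changes the token.
    """
    if not tokens:
        return [[]]
    first = tokens[0]
    # Lowercase only the first character; the rest of the token is untouched.
    lowered = first[:1].lower() + first[1:]
    variants = [list(tokens)]
    if lowered != first:
        variants.append([lowered] + list(tokens[1:]))
    return variants


if __name__ == "__main__":
    for candidate in sentence_initial_variants(["The", "dog", "barks"]):
        print(candidate)
```

Both candidates would then be passed on to the parser, so a grammar that treats case as significant can pick whichever form matches its lexicon.
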
And, yes, there is an issue about the interaction of the morphology
with the external named entity components. I keep forgetting that the
treatment of punctuation as affixes complicates this in that it means
that these at least have to be applied, but we don't necessarily want
the `ordinary' rules to be applied. There are a number of ways of
handling this, and I can enumerate the LKB options, but I don't
remember where we'd got to. Can anyone fill in?
>
> Finally, this wasn't meant as a call to weapons. I just would like
> those things to be settled.
I am really sorry if it sounded as though I was taking it that way. I
completely agree with you that we ought to sort all this out.
Ann