[developers] preprocessing functionality in pet
crysmann at dfki.de
Thu Feb 8 21:48:59 CET 2007
On Thu, 2007-02-08 at 18:10 +0100, Bernd Kiefer wrote:
> > maybe instead of spending the time having a lively discussion, we
> > should start off by all just sitting down and writing documentation?
> > Our discussions tend to end with the decision that we need
> > documentation and then it doesn't happen.
> I agree that this is a good and true point, and I won't exclude myself.
> > I'm somewhat guilty of this
> > with the morphology stuff, I admit, although I have emailed a fairly
> > detailed account. I'd welcome specific questions if people don't
> > understand it. I didn't think there was an urgent need to document
> > the details of how the morphological rule filter is applied, though,
> > because the behaviour is declarative and gives the same results as if
> > the filter were not there (except much faster). There is an unusually
> > high amount of comments in that part of the LKB code btw.
> I think the morphology stuff itself is quite settled; sorry I didn't
> make this clearer. Responsibility for the better filter is with me, no
> What is not clear, however, is whether this processing should also be
> applied to, for example, generic entries (resp. their surface form) or
> similar things coming from the input chart where a (lexical) type is
> supplied in addition to the surface form (see the mail of Tim and the request by
Since I only talked about that to Bernd in person, let me briefly
summarise: what I want is to have POS-mapped entries that obligatorily
inflect (e.g., attributive adjectives in German) undergo morphological
rules. Since there is only a single paradigm, ambiguity is not an issue.
For nouns, e.g., I would not want to do it, but I am sure that this
can easily be handled in the grammar. So what is necessary is to apply
inflection rules to the surface form and force replay even if a stem
form could not be retrieved from the lexicon. In cases where this
functionality is not necessary, generic le's can either be classified as
already inflected, or else assigned to a special class which undergoes
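To make the proposal above concrete, here is a minimal sketch (purely illustrative; the class names, rule set, and return format are my assumptions, not PET's actual API): generic entries flagged as obligatorily inflecting get inflection rules applied to their surface form even though no stem can be retrieved from the lexicon, while other generic classes treat the surface form as already inflected.

```python
# Hypothetical sketch of the proposed behaviour for generic entries.
# Class names and the toy rule set are illustrative, not PET's API.
INFLECTION_MODE = {
    "gen_adj":  "obligatory",  # e.g. German attributive adjectives: always inflect
    "gen_noun": "inflected",   # treat the surface form as already inflected
}

def strip_inflection(surface):
    """Toy single-paradigm analysis: strip one German adjective ending."""
    for ending in ("en", "em", "er", "es", "e"):
        if surface.endswith(ending):
            return (surface[:-len(ending)], [ending])
    return (surface, [])

def analyse_generic(surface, generic_class):
    mode = INFLECTION_MODE.get(generic_class, "inflected")
    if mode == "obligatory":
        # Apply inflection rules to the surface form even though the
        # stem is not in the lexicon; with a single paradigm there is
        # no ambiguity to worry about.
        return strip_inflection(surface)
    return (surface, [])  # use the surface form as-is

print(analyse_generic("roten", "gen_adj"))       # stem "rot" + ending "en"
print(analyse_generic("Computern", "gen_noun"))  # left untouched
```

The point of the special class is just the dispatch in `analyse_generic`: whether a POS-mapped entry is replayed through the morphology is decided per generic-entry class, not globally.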
> > My belief about case is that, in the long-term, the systems should not
> > be normalising case, except as defined by a grammar-specific
> > preprocessor. Wasn't this a conclusion from Jerez? I still intend to
> > take all the case conversion out of the LKB.
> OK with me. Still, Berthold wants a "super robust" mode, where both
> the input and lexicon access are normalized. Otherwise, he (and PET)
> has to provide functionality for, e.g., sentence-initial capitals. And
> this again raises the question of preprocessor formalism/implementation.
I am no longer sure that this isn't what PET already does; the only
remaining problem is initial umlauts. So far I have only seen the
problem with words such as Ägypten, although the whole GG noun lexicon
So the issue is probably related to the bug we saw earlier, where
surface "Über" did not map onto lexical "über"; that was fixed by
Bernd some weeks ago.
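For anyone who hasn't hit this class of bug: the likely mechanism (my guess at the cause, not a description of PET's actual code) is that an ASCII-only lowercasing step leaves sentence-initial umlauts untouched, so the lexicon lookup for the lowercase form fails.

```python
# Illustration of why ASCII-only case normalization breaks on umlauts.
# This is not PET's code, just a demonstration of the failure mode.
def ascii_lower(s):
    """Lowercase only the ASCII range A-Z, as naive C code often does."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

print(ascii_lower("Über"))  # "Über": the umlaut survives, lookup fails
print("Über".lower())       # "über": Unicode-aware lowercasing succeeds
print(ascii_lower("Zug"))   # "zug": plain ASCII words are unaffected
```

Whatever the actual fix was, the test case is simple: surface "Über" must normalize to the same string under which "über" is stored in the lexicon.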
> > I would like to know where/why the ECL preprocessor is so slow - I
> > hadn't heard this. Is it because it's writing out a full PET input
> > chart or something? I would be surprised if we couldn't make the
> > speed acceptable in Lisp unless ECL itself is very inefficient, but
> > then the MRS stuff runs reasonably, doesn't it?
> Seems this is something with the fspp library and ECL. At least I had
> that impression when I tried it last time (which is some time
> ago). Maybe this has improved?
> Finally, this wasn't meant as a call to arms. I would just like
> those things to be settled.