[developers] preprocessing functionality in pet

Thu Feb 8 21:48:59 CET 2007

On Thu, 2007-02-08 at 18:10 +0100, Bernd Kiefer wrote:
> > maybe instead of spending the time having a lively discussion, we
> > should start off by all just sitting down and writing documentation?
> > Our discussions tend to end with the decision that we need
> > documentation and then it doesn't happen.  
> 
> I agree that this is a good and true point, and i won't exclude myself.
> 
> > I'm somewhat guilty of this
> > with the morphology stuff, I admit, although I have emailed a fairly
> > detailed account.  I'd welcome specific questions if people don't
> > understand it.  I didn't think there was an urgent need to document
> > the details of how the morphological rule filter is applied, though,
> > because the behaviour is declarative and gives the same results as if
> > the filter were not there (except much faster).  There is an unusually
> > high amount of comments in that part of the LKB code btw. 
> 
> I think the morphology stuff itself is quite settled, sorry i didn't
> make this clearer. Responsibility for the better filter is with me, no
> doubt. 
> 
> Not clear, however, is for example if this processing should be applied
> to, for example, generic entries (resp. their surface form) or similar
> things coming from the input chart where a (lexical) type is supplied
> in addition to the surface form (see the mail of Tim and the request by
> Berthold).
> 

since I only talked about that to Bernd in person, let me briefly
summarise: what I want is to have pos-mapped entries that obligatorily
inflect (e.g., attributive adjectives in German) undergo morphological
rules. Since there is only a single paradigm, ambiguity is not an issue.
For nouns, e.g., I would not want to do it, but I am sure that this
can be easily done in the grammar. So, what is necessary, is to apply
inflection rules to the surface and force replay even if a stem form
could not be retrieved from the lexicon. In cases where this
functionality is not necessary, generic le's can be either classified as
already inflected, or else assigned to a special class which undergoes
zero derivation.
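
As a rough illustration of the three-way treatment described above, here
is a minimal Python sketch. It is not PET or LKB code: the class names,
the rule names and the toy suffix table are all hypothetical, and real
inflection rules live in the grammar. The point it shows is only that
entries marked as obligatorily inflecting keep their rule analyses even
when the resulting stem is not in the lexicon.

```python
from enum import Enum


class InflectionClass(Enum):
    """Hypothetical classification of generic (pos-mapped) entries."""
    MUST_INFLECT = "must_inflect"        # e.g. German attributive adjectives
    ALREADY_INFLECTED = "inflected"      # surface form is used as is
    ZERO_DERIVATION = "zero"             # stem equals surface, trivial rule


# Toy suffix table (suffix to strip, rule name). Both columns are
# invented for illustration and do not come from any real grammar.
ADJ_RULES = [
    ("en", "adj_en_infl"),
    ("em", "adj_em_infl"),
    ("er", "adj_er_infl"),
    ("es", "adj_es_infl"),
    ("e", "adj_e_infl"),
]


def analyses(surface, infl_class, lexicon):
    """Return (stem, rule, stem_in_lexicon) candidates for a generic entry.

    For MUST_INFLECT entries every matching rule is kept, whether or not
    the stem is found in the lexicon.
    """
    if infl_class is InflectionClass.ALREADY_INFLECTED:
        return [(surface, None, surface in lexicon)]
    if infl_class is InflectionClass.ZERO_DERIVATION:
        return [(surface, "zero_infl", surface in lexicon)]
    results = []
    for suffix, rule in ADJ_RULES:
        if surface.endswith(suffix) and len(surface) > len(suffix):
            stem = surface[: -len(suffix)]
            results.append((stem, rule, stem in lexicon))
    return results


# An unknown adjective still gets its inflection analysis.
print(analyses("blauen", InflectionClass.MUST_INFLECT, set()))
```

With an empty lexicon the call above still yields one analysis for
"blauen", with the last field recording that the stem was not found.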

> > My belief about case is that, in the long-term, the systems should not
> > be normalising case, except as defined by a grammar-specific
> > preprocessor.  Wasn't this a conclusion from Jerez?  I still intend to
> > take all the case conversion out of the LKB.
> 
> OK with me. Still, Berthold wants a "super robust" mode, where both
> input and lexicon access are normalized. Otherwise, he (and pet) has to
> provide functionality for, e.g., sentence initial capitals. And this
> again raises the question of preprocessor formalism/implementation.
> 

I am not sure anymore that this isn't what Pet already does, the only
problem being initial umlauts. I have so far only experienced the
problem with words such as Ägypten, although the whole GG noun lexicon
is caps-initial.

So, the issue is probably related to the bug we experienced earlier with
surface "Über" not mapping onto lexical "über" which has been fixed by
Bernd some weeks ago.
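
To make the symptom concrete, here is a small Python sketch of a
case-insensitive lexicon lookup. It is only an assumption that the
umlaut problem comes from ASCII-only case conversion; the thread does
not confirm the cause, and the function names and the three-word lexicon
are made up for the example.

```python
def ascii_lower(text):
    """Lowercase only A-Z, leaving non-ASCII letters such as 'Ü' alone."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in text
    )


def lookup(surface, lexicon, fold):
    """Return lexicon entries equal to surface after applying fold to both."""
    key = fold(surface)
    return [entry for entry in lexicon if fold(entry) == key]


# Hypothetical mini lexicon: a lowercase preposition, caps-initial nouns.
LEXICON = ["über", "Ägypten", "Haus"]

# ASCII-only folding works for plain letters but misses initial umlauts.
print(lookup("haus", LEXICON, ascii_lower))
print(lookup("Über", LEXICON, ascii_lower))

# Unicode-aware folding (Python's str.lower) matches both directions.
print(lookup("Über", LEXICON, str.lower))
print(lookup("ägypten", LEXICON, str.lower))
```

The sketch assumes all strings use the same Unicode normalization form;
a precomposed "Ü" and a decomposed "U" plus combining diaeresis would
still fail to match without normalizing first.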

Cheers,

B

> > I would like to know where/why the ECL preprocessor is so slow - I
> > hadn't heard this.  Is it because it's writing out a full PET input
> > chart or something?  I would be surprised if we couldn't make the
> > speed acceptable in Lisp unless ECL itself is very inefficient, but
> > then the MRS stuff runs reasonably, doesn't it?
> 
> Seems this is something with the fspp library and ECL. At least i had
> the impression when i tried it last time (which is some time
> ago). Maybe this has improved?
> 
> Finally, this wasn't meant as a call to weapons. I just would like
> those things to be settled. 
> 
> Best,
>         Bernd
>