[Fwd: Re: [developers] processing of lexical rules]

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Thu Feb 17 19:23:31 CET 2005

crysmann at dfki.de said:
> Well, given that there are languages with subtractive morphology (see
> e.g. Anderson 1992), choice 1b is probably out. That choice would also
>  be problematic for fused elements in cluster morphology (e.g.
> clusters  of pronominal affixes), where you cannot easily assign a
> well-behaved  sign-like feature structure to the fused morphs.

subtractive morphology - I guess it rather depends what you think an affix is 
- I'm really using the term here to mean a necessarily bound morpheme.  I 
suppose you could do lexical lookup on a subtracted element.  It all gets very 
artificial at this point, of course.

The reason I went for a rule-based approach originally in the LKB was that I 
wanted to capture the similarity between zero-derivation and `real' derivation 
and I didn't want to have null affixes.  But if you have an approach to the 
morphophonology that allows an abstract morpheme, then I suspect there's 
really very little difference between the theoretical capabilities of the 1a 
approach and the 1b approach, if we can extend the 1a approach to the case 
where there are multiple stems.  After all, you can always define an FS for an 
affix that uniquely triggers a rule, or you can have a relatively vacuous rule 
that combines two morphemes.

> I had a bit of a problem here understanding as to how 2b differs from 
> 1a,  or 2a from 1b. Can you clarify this?

Choice 1 is about the feature structure level mechanisms.  Choice 2 is really 
about tokenisation.  You may think of this as an implementation detail, but if 
you think of basic definitions of grammars you always have some statement 
about words or strings or whatever which corresponds to assuming there is some 
given tokenisation.  What Choice 2 amounts to is deciding whether you treat 
morphophonology as taking as input one tokenisation and returning another, 
potentially completely different, one which is then input to morphosyntax 
(2a), or as taking as input one tokenisation and returning a (partial) 
derivation for each token which guides morphosyntax (2b).  The retokenisation 
(2a) approach is more powerful because it allows the morphosyntax to create 
structures which do not obey the partial bracketing imposed by the initial 
tokenisation (though there is a complication here because the MWE mechanism 
allows for multiple tokens to correspond to a single lexical structure).  
Choice 1 goes along with choice 2 in that I don't see that it makes sense to 
combine 1a (affixes as rules) with retokenisation (2a).  But if we have the 
derivation approach (2b), and we can extend it to handle compounding, then 
the mechanism that we use for compounds would allow 1b (affixes as 
lexical items) rather than 1a (affixes as rules).  So I believe they are 
different choices.
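
To make the difference between the two tokenisation choices concrete, here is 
a minimal sketch in Python.  It is not LKB code: the suffix table, the rule 
names and the functions `split', `retokenise' and `derive' are all 
hypothetical, and the string matching is a toy stand-in for real 
morphophonology.  Under (2a) the output is a new flat token sequence; under 
(2b) the output keeps one entry per input token, each carrying a partial 
derivation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy suffix table: surface suffix -> (affix token, rule name).
SUFFIXES = {"ed": ("+ed", "ed_rule"), "s": ("+s", "plural_rule")}


def split(token: str) -> Tuple[str, str]:
    """Return (stem, suffix) for the longest matching suffix, else (token, "")."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        # Require a stem of at least two characters so "red" is left alone.
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)], suffix
    return token, ""


def retokenise(tokens: List[str]) -> List[str]:
    """Choice 2a: return a new tokenisation; the original token
    boundaries are not recorded in the output."""
    out: List[str] = []
    for token in tokens:
        stem, suffix = split(token)
        out.append(stem)
        if suffix:
            out.append(SUFFIXES[suffix][0])
    return out


@dataclass
class PartialDerivation:
    stem: str
    rules: List[str]  # rules to apply to the stem, innermost first


def derive(tokens: List[str]) -> List[PartialDerivation]:
    """Choice 2b: keep one entry per input token, each with a partial
    derivation that guides morphosyntax."""
    out: List[PartialDerivation] = []
    for token in tokens:
        stem, suffix = split(token)
        rules = [SUFFIXES[suffix][1]] if suffix else []
        out.append(PartialDerivation(stem, rules))
    return out


print(retokenise(["red", "roofed", "house"]))
print(derive(["red", "roofed", "house"]))
```

With the flat output of `retokenise', nothing stops the morphosyntax from 
grouping `red' with `roof' before attaching `+ed'.  With the output of 
`derive', `ed_rule' is tied to the token `roofed', so the initial bracketing 
is preserved.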

crysmann at dfki.de said:
> Can you foresee a semantic solution to bracketing paradoxa? Having
> talked to Markus Egg, I remember that he thinks that most of these
> issues can be dealt with on that level.

well, yeah, if you allow an unconstrained approach to semantics ... to be 
honest, I don't yet know what I think about this in general.  For the 
`generative grammarian' example, I actually think it's an MWE (see other msg). 
 I don't think this about some of Markus's examples but I don't think the 
retokenisation idea helps with most of them.  There are more subtle cases than 
`generative grammarian', like the `+ed' affix which is found in things like 
`red roofed house' or `rattan chaired terrace', where *`chaired terrace' is 
ungrammatical and `rattan chair' is not a lexicalised MWE, but I think I've 
decided to save 
worrying about those until after I've retired.  Sitting on a wicker chaired 
terrace next to a red roofed cafe drinking a late bottled port and worrying 
about unalienable possession sounds like a good thing to do in retirement ...

crysmann at dfki.de said:
> Most certainly not, at least as far as derivation is concerned. Here
> is  an example from German:

> [[[halt]bar]keit]+s+[datum] `expiry date'

OK, I realised you get derived forms in compounds; the question was really 
whether you could live with such a restriction.

Fugenelemente and the actual compound splitting, I am afraid, count as 
morphophonology and thus aren't my scene.  In doing a trial implementation, I 
intend to proceed as though there was a system out there which could 
hypothesise compound splitting.  I will probably be able to make the existing 
LKB string unification approach to morphophonology work to some extent with 
compounds, but someone else will have to do it properly.  We can discuss what 
hooks are needed so that such a resource can use the LKB lexicon - I am not 
saying this has to be completely external, just that I don't propose to 
implement it and/or maintain it.  By the way, Koehn and Knight (2003) have a 
method for German compounds based on the use of a bilingual corpus which they 
claim finds the right split with very high accuracy.  I haven't looked at this 
carefully but it seems like putting statistical techniques in early here is a 
plausible way of maintaining tractability.
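
As an illustration of what a component that hypothesises compound splits 
might hand back, here is a minimal sketch in Python.  It is neither the LKB 
string unification mechanism nor the Koehn and Knight (2003) method: the 
function `hypothesise_splits', the toy lexicon and the list of linking 
elements are all assumptions made for the example, it only proposes two-part 
splits, and the derived form `haltbarkeit' is simply listed in the lexicon 
for brevity.

```python
from typing import Iterable, List, Set, Tuple


def hypothesise_splits(
    word: str,
    lexicon: Set[str],
    linkers: Iterable[str] = ("", "s", "es", "n", "en"),
) -> List[Tuple[str, str, str]]:
    """Propose (left, linking element, right) splits of `word` such that
    both parts are in `lexicon`.  All candidates are returned; choosing
    between them is left to a later (possibly statistical) step."""
    word = word.lower()
    splits: List[Tuple[str, str, str]] = []
    for i in range(1, len(word)):
        left = word[:i]
        if left not in lexicon:
            continue
        rest = word[i:]
        for link in linkers:
            # The linking element (possibly empty) must start the remainder.
            if rest.startswith(link):
                right = rest[len(link):]
                if right in lexicon:
                    splits.append((left, link, right))
    return splits


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"haltbarkeit", "datum"}
print(hypothesise_splits("Haltbarkeitsdatum", toy_lexicon))
# -> [('haltbarkeit', 's', 'datum')]
```

A splitter of this naive kind will overgenerate on a realistic lexicon, which 
is where an early statistical filter of some sort could help keep things 
tractable.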

