[Fwd: Re: [developers] processing of lexical rules]
Ann Copestake
Ann.Copestake at cl.cam.ac.uk
Thu Feb 17 19:23:31 CET 2005
crysmann at dfki.de said:
> Well, given that there are languages with subtractive morphology (see
> e.g. Anderson 1992), choice 1b is probably out. That choice would also
> be problematic for fused elements in cluster morphology (e.g.
> clusters of pronominal affixes), where you cannot easily assign a
> well-behaved sign-like feature structure to the fused morphs.
subtractive morphology - I guess it rather depends what you think an affix is
- I'm really using the term here to mean a necessarily bound morpheme. I
suppose you could do lexical lookup on a subtracted element. It all gets very
artificial at this point, of course.
The reason I went for a rule-based approach originally in the LKB was that I
wanted to capture the similarity between zero-derivation and `real' derivation
and I didn't want to have null affixes. But if you have an approach to the
morphophonology that allows an abstract morpheme, then I suspect there's
really very little difference between the theoretical capabilities of the 1a
approach and the 1b approach, if we can extend the 1a approach to the case
where there are multiple stems. After all, you can always define an FS for an
affix that uniquely triggers a rule, or you can have a relatively vacuous rule
that combines two morphemes.
> I had a bit of a problem here understanding as to how 2b differs from
> 1a, or 2a from 1b. Can you clarify this?
Choice 1 is about the feature structure level mechanisms. Choice 2 is really
about tokenisation. You may think of this as an implementation detail, but if
you think of basic definitions of grammars you always have some statement
about words or strings or whatever which corresponds to assuming there is some
given tokenisation. What Choice 2 amounts to is deciding whether you treat
morphophonology as taking as input one tokenisation and returning another,
potentially completely different, one which is then input to morphosyntax
(2a), or as taking as input one tokenisation and returning a (partial)
derivation for each token which guides morphosyntax (2b). The retokenisation
(2a) approach is more powerful because it allows the morphosyntax to create
structures which do not obey the partial bracketing imposed by the initial
tokenisation (though there is a complication here because the MWE mechanism
allows for multiple tokens to correspond to a single lexical structure).
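
To make the 2a/2b contrast concrete, here is a minimal Python sketch. It is not LKB code: the stem list, the suffix table and the rule name `ed_rule` are all made up for illustration. `retokenise` plays the role of 2a and returns a new flat token sequence; `derive` plays the role of 2b and keeps the original tokens, attaching a partial derivation to each.

```python
from typing import List, Optional, Tuple

# Hypothetical toy data, invented for this sketch.
STEMS = {"red", "roof", "house"}
SUFFIXES = {"ed": ("+ed", "ed_rule")}  # surface suffix -> (affix token, rule name)


def analyse(token: str) -> Optional[Tuple[str, str, str]]:
    """Return (stem, affix token, rule name) if the token is a known stem plus suffix."""
    for suffix, (affix, rule) in SUFFIXES.items():
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and stem in STEMS:
            return stem, affix, rule
    return None


def retokenise(tokens: List[str]) -> List[str]:
    """2a-style: morphophonology returns a new, flat tokenisation."""
    out: List[str] = []
    for token in tokens:
        analysis = analyse(token)
        if analysis is None:
            out.append(token)
        else:
            stem, affix, _rule = analysis
            out.extend([stem, affix])
    return out


def derive(tokens: List[str]) -> List[Tuple[str, List[str]]]:
    """2b-style: one (stem, rules-to-apply) pair per original token."""
    out: List[Tuple[str, List[str]]] = []
    for token in tokens:
        analysis = analyse(token)
        if analysis is None:
            out.append((token, []))
        else:
            stem, _affix, rule = analysis
            out.append((stem, [rule]))
    return out


print(retokenise(["red", "roofed", "house"]))
print(derive(["red", "roofed", "house"]))
```

In the flat output of `retokenise` nothing records that `roof` and `+ed` came from one input token, so morphosyntax is free to build [[red roof] +ed]. In the output of `derive` the rule stays attached to the token it came from, which is the partial bracketing referred to above.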
Choice 1 goes along with choice 2 in that I don't see that it makes sense to
combine 1a (affixes as rules) with retokenisation (2a). But if we have the
derivation approach (2b), and we can extend it to handle compounding, then
the mechanism that we use for compounds would allow 1b (affixes as
lexical items) rather than 1a (affixes as rules). So I believe they are
different choices.
crysmann at dfki.de said:
> Can you foresee a semantic solution to bracketing paradoxa? Having
> talked to Markus Egg, I remember that he thinks that most of these
> issues can be dealt with on that level.
well, yeah, if you allow an unconstrained approach to semantics ... to be
honest, I don't yet know what I think about this in general. For the
`generative grammarian' example, I actually think it's an MWE (see other msg).
I don't think this about some of Markus's examples but I don't think the
retokenisation idea helps with most of them. There are more subtle cases than
`generative grammarian', like the `+ed' affix which is found in things like
`red roofed house' or `rattan chaired terrace' where we have *`chaired terrace'
and `rattan chair' is not a lexicalised MWE, but I think I've decided to save
worrying about those until after I've retired. Sitting on a wicker chaired
terrace next to a red roofed cafe drinking a late bottled port and worrying
about inalienable possession sounds like a good thing to do in retirement ...
crysmann at dfki.de said:
> Most certainly not, at least as far as derivation is concerned. Here
> is an example from German:
> [[[halt]bar]keit]+s+[datum] `expiry date'
OK, I realised you get derived forms in compounds, the question was really
could you live with such a restriction?
Fugenelemente and the actual compound splitting, I am afraid, count as
morphophonology and thus aren't my scene. In doing a trial implementation, I
intend to proceed as though there was a system out there which could
hypothesise compound splitting. I will probably be able to make the existing
LKB string unification approach to morphophonology work to some extent with
compounds, but someone else will have to do it properly. We can discuss what
hooks are needed so that such a resource can use the LKB lexicon - I am not
saying this has to be completely external, just that I don't propose to
implement it and/or maintain it. By the way, Koehn and Knight (2003) have a
method for German compounds based on the use of a bilingual corpus which they
claim finds the right split with very high accuracy. I haven't looked at this
carefully but it seems like putting statistical techniques in early here is a
plausible way of maintaining tractability.
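
As an illustration of what "a system out there which could hypothesise compound splitting" might hand back, here is a minimal Python sketch. It is not the LKB mechanism and not the bilingual-corpus method mentioned above: it enumerates splits against a toy lexicon, allows a few Fugenelemente between parts, and ranks candidates by a simple monolingual heuristic (geometric mean of part counts). The lexicon, the counts and the function names are invented for the example.

```python
from math import prod
from typing import Dict, List, Optional

# Hypothetical toy lexicon with invented frequency counts.
FREQ: Dict[str, int] = {"haltbarkeit": 40, "datum": 900, "halt": 500, "bar": 300}
# A few linking elements (Fugenelemente); "" means no linker.
LINKERS = ("", "s", "es", "n", "en")


def splits(word: str) -> List[List[str]]:
    """All ways to cut `word` into lexicon entries, with an optional linker after each non-final part."""
    results: List[List[str]] = []
    if word in FREQ:
        results.append([word])
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head not in FREQ:
            continue
        for linker in LINKERS:
            # The linker must be followed by at least one more character.
            if rest.startswith(linker) and len(rest) > len(linker):
                for tail in splits(rest[len(linker):]):
                    results.append([head] + tail)
    return results


def score(parts: List[str]) -> float:
    """Geometric mean of the (invented) counts of the parts."""
    return prod(FREQ[p] for p in parts) ** (1 / len(parts))


def best_split(word: str) -> Optional[List[str]]:
    """Highest-scoring split, or None if the word cannot be covered by the lexicon."""
    candidates = splits(word)
    return max(candidates, key=score) if candidates else None


print(best_split("haltbarkeitsdatum"))
```

Note that the derived form `haltbarkeit` is simply listed as a lexicon entry here; in the setup discussed above, its internal structure would be left to the morphology rather than to the splitter.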
Ann