[Fwd: Re: [developers] processing of lexical rules]
Emily M. Bender
ebender at u.washington.edu
Thu Feb 17 22:20:33 CET 2005
On Thu, Feb 17, 2005 at 06:23:31PM +0000, Ann Copestake wrote:
> I don't think this about some of Markus's examples but I don't
> think the retokenisation idea helps with most of them. There are
> more subtle cases than `generative grammarian', like the `+ed' affix
> which is found in things like `red roofed house' or `rattan chaired
> terrace' where *`chaired terrace' and `rattan chair' is not a
> lexicalised MWE, but I think I've decided to save worrying about
> those until after I've retired. Sitting on a wicker chaired terrace
> next to a red roofed cafe drinking a late bottled port and worrying
> about unalienable possession sounds like a good thing to do in
> retirement ...
:-)
> Fugenelemente and the actual compound splitting, I am afraid count
> as morphophonology and thus aren't my scene. In doing a trial
> implementation, I intend to proceed as though there was a system out
> there which could hypothesise compound splitting. I will probably
> be able to make the existing LKB string unification approach to
> morphophonology work to some extent with compounds, but someone else
> will have to do it properly. We can discuss what hooks are needed
> so that such a resource can use the LKB lexicon - I am not saying
> this has to be completely external, just that I don't propose to
> implement it and/or maintain it.
Jeff and I are currently working on something closely related to this:
we're presenting on the morphophonology/morphosyntax interface
again at CLS and would like to have some implementation ready. We're
primarily concentrating on how one could go from a database of stems
(with additional information associated with each stem concerning
morphotactics and morphologically conditioned phonological rules) to
(in the first instance) the lexc and xfst files required to build
an xfst morphological analyzer. Ideally, we'd want the morphophonological
database and the morphosyntactic database (= current LKB lexical
DB) to be one entity, with the possibility of many-to-many mappings
between morphophonological stems and morphosyntactic lexical entries.
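
To make the stem-database idea concrete, here is a minimal sketch of
generating a lexc lexicon from stem records. Everything in it is an
assumption for illustration only: the record fields (`stem`, `cont`,
`entries`), the continuation class name `NounInfl`, the `+Pl` tag and
the entry ids are all hypothetical, and it does not reflect any actual
database schema or the LKB lexical DB. It emits only the lexc side; the
xfst rules for morphologically conditioned phonology are not covered.

```python
from collections import defaultdict

# Hypothetical stem records. "cont" names the lexc continuation class
# ("#" means end of word); "entries" lists the morphosyntactic lexical
# entry ids the stem maps to. None of these names come from a real schema.
STEMS = [
    {"stem": "roof", "cont": "NounInfl", "entries": ["roof_n1", "roof_v1"]},
    {"stem": "mouse", "cont": "#", "entries": ["mouse_n1"]},
    {"stem": "mice", "cont": "#", "entries": ["mouse_n1"]},
]

# Hypothetical morphotactics: class -> list of (upper, lower, next class).
SUFFIXES = {
    "NounInfl": [("+Pl", "s", "#"), ("", "", "#")],
}


def to_lexc(stems, suffixes):
    """Render stem records and suffix classes as lexc source text."""
    tags = sorted({u for rules in suffixes.values() for u, _, _ in rules if u})
    lines = []
    if tags:
        # Multi-character tags have to be declared before the lexicons.
        lines += ["Multichar_Symbols " + " ".join(tags), ""]
    lines.append("LEXICON Root")
    for rec in stems:
        lines.append(f"{rec['stem']} {rec['cont']} ;")
    for name, rules in suffixes.items():
        lines += ["", f"LEXICON {name}"]
        for upper, lower, nxt in rules:
            if upper or lower:
                lines.append(f"{upper}:{lower} {nxt} ;")
            else:
                # Empty entry: just pass on to the continuation class.
                lines.append(f"{nxt} ;")
    return "\n".join(lines) + "\n"


def entry_index(stems):
    """Invert the stem -> entries mapping to get entry -> stems."""
    index = defaultdict(set)
    for rec in stems:
        for entry in rec["entries"]:
            index[entry].add(rec["stem"])
    return index


if __name__ == "__main__":
    print(to_lexc(STEMS, SUFFIXES))
```

The many-to-many mapping shows up in both directions here: `roof` maps
to two lexical entries, while `mouse` and `mice` share one.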
Emily