[developers] processing of lexical rules

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Wed Feb 9 23:35:53 CET 2005

This is an attempt at a quick message rather than a detailed response to Emily
and oe (which I don't have time for right now).  I've said versions of the
following to various people at various times, but ...

a) obviously the Bernie Jones %suffix stuff is inadequate - I've been surprised
people have kept with it as long as they have.

b) we need an interface to the Xerox system (and alternatives should any arise)
because they've worked out how to handle the spelling aspects of morphology for
about 200 languages.  

c) the notion of lexical/morphological rule as equivalent to unary grammar rule
(with some spelling in the case of morphological rules) is very general and one
I am committed to keeping.  But we could easily support a word syntax
approach too: if someone has a preprocessor that instantiates the chart
with morphemes, then I can't see a serious reason why the existing machinery
couldn't handle this.

d) The LKB should be using a chart throughout - we hope we've got funding that
will support Ben working on this, besides other things such as standardising
the output from preprocessors.

e) There's an ISO working group on morphosyntactic annotation that has a draft
proposal in this area for an XML-based standard - I need to check with the
author whether this can be distributed but will send it round if I can.  At
least as far as interfacing with an external system goes, I think this is the
cleanest way to spec the interface (of course one might want to short-circuit
it for efficiency).

f) It's going to be sufficiently hard work to rewrite bits of grammars and to
write documentation for a reworking of the morphology that we really ought to
do it properly.  My preferred course of action would be to 1) look at what the
ISO stuff allows and see if we can think of anything we need to do that it
doesn't cover, 2) go through the languages we currently care about and see how
we might want them to work (I think this will involve a subset of the ISO
stuff), 3) think about efficiency, 4) trial an implementation in the LKB, and
5) talk to a few people like Karttunen to see if there's obvious stuff we're
missing - I guess 3, 4 and 5 are potentially parallel operations.
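
To make the word syntax idea (the chart being instantiated with morphemes
by a preprocessor) a bit more concrete, here is a rough sketch.  It is not LKB
code: the Edge class, the rule tables and the category names are all invented
for illustration, and plain strings stand in for feature structures.  The only
point is that morpheme edges, word syntax rules and lexical rules treated as
unary rules can all live in the same chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int      # chart vertex where the edge begins
    end: int        # chart vertex where the edge ends
    category: str   # toy stand-in for a feature structure
    form: str       # surface string covered by the edge


def seed_chart(morphemes):
    """Hypothetical preprocessor output: one chart edge per morpheme."""
    return [Edge(i, i + 1, cat, form) for i, (form, cat) in enumerate(morphemes)]


# Word syntax rules (assumed): (left category, right category) -> mother.
BINARY_RULES = {("v-stem", "past-affix"): "v-past"}

# Lexical rules treated as unary rules (assumed): daughter -> mother.
UNARY_RULES = {"v-past": "v-past-word"}


def close_chart(edges):
    """Naive closure: keep applying rules until no new edge is produced."""
    chart = set(edges)
    agenda = list(edges)
    while agenda:
        edge = agenda.pop()
        new = []
        # Unary (lexical) rule: same span, new category.
        if edge.category in UNARY_RULES:
            new.append(Edge(edge.start, edge.end,
                            UNARY_RULES[edge.category], edge.form))
        # Binary (word syntax) rule: combine adjacent edges.
        for other in list(chart):
            for left, right in ((edge, other), (other, edge)):
                mother = BINARY_RULES.get((left.category, right.category))
                if mother and left.end == right.start:
                    new.append(Edge(left.start, right.end, mother,
                                    left.form + right.form))
        for e in new:
            if e not in chart:
                chart.add(e)
                agenda.append(e)
    return chart


chart = close_chart(seed_chart([("walk", "v-stem"), ("ed", "past-affix")]))
```

With the two morpheme edges for "walk" and "ed" as input, the chart ends up
holding an edge spanning both vertices for the affixed form, plus the edge the
unary rule builds on top of it.  Spelling changes are not modelled at all here.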

Incidentally, Kristiina Jokinen, who is currently visiting Cambridge, has some
interest in doing Finnish.  I was planning on talking to her to try and find
out what we might do about the morphology for that - I know, for instance, that
the standard two-level morphology approach involves multiple suffix lexicons.
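
For anyone who hasn't seen the multiple suffix lexicon idea, here is a toy
sketch of it, under loud assumptions: the lexicon names, the tag names and the
analyse function are made up for illustration, and the two-level rules
themselves (vowel harmony, consonant gradation and so on) are left out
entirely, so this only shows how one lexicon hands over to the next.

```python
# Each lexicon is a list of (surface form, tag, continuation lexicon).
# A continuation of None means the word may end here.
# All names below are assumptions made for this sketch.
LEXICONS = {
    "Root": [("talo", "talo+N", "Number")],
    "Number": [("", "+Sg", "Case"), ("i", "+Pl", "Case")],
    "Case": [("ssa", "+Ine", "Possessive"), ("sta", "+Ela", "Possessive")],
    "Possessive": [("", "", None), ("ni", "+Px1Sg", None)],
}


def analyse(surface, lexicon="Root", tags=""):
    """Return every tag string whose morphemes concatenate to `surface`."""
    if lexicon is None:
        # End of the lexicon chain: succeed only if the input is used up.
        return [tags] if surface == "" else []
    results = []
    for form, tag, continuation in LEXICONS[lexicon]:
        if surface.startswith(form):
            results.extend(
                analyse(surface[len(form):], continuation, tags + tag))
    return results


print(analyse("talossa"))     # singular inessive
print(analyse("taloissani"))  # plural inessive with a 1sg possessive suffix
```

A stem like "talo" was picked because it needs no stem alternation in these
forms; anything less well behaved would need the spelling rules that this
sketch omits.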

I'd really like to wait for a few days to see if we can get final confirmation
of the funding for Ben before going into detailed plans.  I am definitely not
saying that Ben is going to fix all the problems with morphology but I think
there's sufficient interaction with what we hope he's going to be doing that
it'd be sensible to try and coordinate.

