[developers] Re: Morphological ambiguity and xfst-lkb interface

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Mon Apr 11 18:42:39 CEST 2005

We have to standardise :cfrom and :cto.  They can't be token positions because
tokenisers differ in how they split the same input.  There isn't a quick
solution that can be fully general.  I think the `right' way of doing it is to
develop an XML encoding and a notion of
offset into the relevant parts of the PCDATA etc that we can apply
consistently.  XPointer may be useful here, but someone needs to talk to the XML
guys about what they are planning.  The best temporary solution is probably to
use byte offsets.  I am expecting that Ben will come up with a proposal for
this - he is now funded (by Boeing) to work on these issues.  The funding has
only just been finalised so we don't have a workplan yet, but I'm assuming that
once he's had a chance to look at stuff he will circulate some initial
proposals (probably best via the developers list).  We have to make this work
for RASP as well as various DELPH-IN software.
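
To illustrate why token positions won't do while byte offsets might, here is a
minimal Python sketch.  Everything in it is an assumption made for
illustration: the two toy tokenisers, the function names, the choice of UTF-8
as the encoding, and the convention that :cto is exclusive.  It is not a
proposal for the actual interface.

```python
import re


def _with_byte_offsets(text, pattern):
    """Return (token, cfrom, cto) triples, with offsets counted in UTF-8 bytes.

    Assumption: cfrom is inclusive and cto is exclusive.
    """
    spans = []
    for m in re.finditer(pattern, text):
        # Convert character positions to byte positions by encoding the prefix.
        cfrom = len(text[:m.start()].encode("utf-8"))
        cto = len(text[:m.end()].encode("utf-8"))
        spans.append((m.group(), cfrom, cto))
    return spans


def whitespace_tokens(text):
    """Hypothetical tokeniser A: split on whitespace only."""
    return _with_byte_offsets(text, r"\S+")


def punct_tokens(text):
    """Hypothetical tokeniser B: split off punctuation as separate tokens."""
    return _with_byte_offsets(text, r"\w+|[^\w\s]")


text = "Kim's café opened."
a = whitespace_tokens(text)
b = punct_tokens(text)

# The token position of "café" differs between the two tokenisers ...
index_a = [tok for tok, _, _ in a].index("café")
index_b = [tok for tok, _, _ in b].index("café")

# ... but its byte span in the original input is the same.
span_a = a[index_a][1:]
span_b = b[index_b][1:]
```

Here "café" is token 1 for one tokeniser and token 3 for the other, yet both
report the same byte span.  Note that byte offsets are only comparable if
everyone agrees on the encoding: the "é" takes two bytes in UTF-8, so byte and
character offsets diverge after it.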


