[developers] top-level cfrom/cto values in xml always -1

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Mon Oct 10 18:06:41 CEST 2005

> In my opinion, these position should always refer to positions in the
> original document, no matter what preprocessing units were allowed to
> add or delete stuff. 

Yes, the only question is how you establish/specify _which_ positions!

We now seem to have established a consensus that we want to do something
different with the global cfrom/cto from what was originally intended, so
somebody has to document that.  I had actually assumed that anyone who wanted
the global cfrom/cto would implement it by calculating max/min over the
token-derived values, and as soon as you said you were passing the cfrom/cto
values, it became clear that this was not the case.
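For illustration, the max/min approach described above might look like the
following minimal Python sketch.  The `Token` record, the `global_span`
function, and the convention that -1 marks an unset offset are all
hypothetical, chosen for this example; they are not taken from the actual
XML-producing code.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumption: -1 marks an unset/unknown character offset.
UNSET = -1


@dataclass
class Token:
    # Hypothetical token record: surface form plus character offsets
    # (cfrom inclusive, cto exclusive) into the original document.
    form: str
    cfrom: int
    cto: int


def global_span(tokens: Iterable[Token]) -> Tuple[int, int]:
    """Derive top-level cfrom/cto as the min start and max end of the tokens.

    Tokens with UNSET offsets are skipped; if no usable offsets remain,
    (UNSET, UNSET) is returned.
    """
    toks = list(tokens)  # materialise so a generator can be scanned twice
    starts = [t.cfrom for t in toks if t.cfrom != UNSET]
    ends = [t.cto for t in toks if t.cto != UNSET]
    if not starts or not ends:
        return (UNSET, UNSET)
    return (min(starts), max(ends))


if __name__ == "__main__":
    # `abc d ef' tokenised on whitespace, offsets into the original string.
    tokens = [Token("abc", 0, 3), Token("d", 4, 5), Token("ef", 6, 8)]
    print(global_span(tokens))  # (0, 8)
```

This only works if each token's own cfrom/cto already refers to the original
document; it does not by itself settle how those per-token offsets are
assigned when a preprocessor adds or deletes material.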

Since we've started on this: there are issues with the tokeniser.  For
example, given a string `abc d ef', suppose the preprocessor tokenises it to
`ab' `x'; what are the cfrom/cto values for those tokens?  Have you got a
specification for what you're doing, and if so, could you show it to Ben,
please?

