benjamin.waldron at cl.cam.ac.uk
Wed Oct 12 15:23:41 CEST 2005
The currently agreed semantics of CFROM/CTO is that they refer to
character positions, with both ends inclusive. E.g., given the text
'abcd', the range CFROM=0 to CTO=2 refers to the "abc" substring.

  abcd
  0123 = character positions

I would like to suggest that we use character _points_ (the points
between characters) instead of the above. This is more expressive and
allows the specification of empty ranges. E.g., given the text 'abcd',
the range CFROM=0 to CTO=2 would refer to the "ab" substring, whilst
the range CFROM=0 to CTO=3 would refer to the "abc" substring.

   a b c d
  0 1 2 3 4 = character points

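To make the difference concrete, here is a minimal Python sketch of the two readings. The function names are hypothetical, chosen only for illustration; they are not taken from any existing code.

```python
def span_by_positions(text, cfrom, cto):
    """Current semantics: CFROM and CTO both name characters (CTO inclusive)."""
    return text[cfrom:cto + 1]


def span_by_points(text, cfrom, cto):
    """Proposed semantics: CFROM and CTO name points between characters."""
    return text[cfrom:cto]


print(span_by_positions("abcd", 0, 2))  # current reading of CFROM=0, CTO=2
print(span_by_points("abcd", 0, 2))     # proposed reading of the same pair
print(repr(span_by_points("abcd", 2, 2)))  # an empty range at point 2
```

Under the point-based reading a range with CFROM equal to CTO is simply empty, which the position-based reading has no way to express.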
How would people feel about such a semantics? The conversion from
character positions to character points is simple: CFROM values stay the
same, and a CTO character position becomes a character point by adding 1.
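The conversion could be sketched in Python roughly as follows. The helper names are hypothetical, and the reverse direction (with its error on empty ranges) is my own assumption about how one might handle ranges that have no position-based equivalent.

```python
def positions_to_points(cfrom, cto):
    """Convert an inclusive character-position range to character points."""
    # CFROM is unchanged; CTO moves to the point just after the last character.
    return cfrom, cto + 1


def points_to_positions(cfrom, cto):
    """Convert a character-point range back to inclusive character positions.

    Assumption: an empty point range cannot be represented as positions,
    so it is rejected here rather than silently mapped to something else.
    """
    if cto <= cfrom:
        raise ValueError("empty range has no character-position equivalent")
    return cfrom, cto - 1


# The "abc" substring of 'abcd': positions (0, 2) correspond to points (0, 3).
print(positions_to_points(0, 2))
print(points_to_positions(0, 3))
```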