[developers] boring though important: generalizing characterization

Wed Apr 11 14:46:06 CEST 2007

I'd like to suggest that the mechanism for referring to tokens 
(parser-internal) be kept separable from the externally valid cfrom/cto notion 
(or the general standoff version). They are different, and both may be required
in different circumstances, so I think mixing them in a single representation is
suboptimal.  What we need for the output (R)MRS, seen as the external 
interface, is the cfrom/cto since we don't want a user of such information to 
have to indirect via the token representation, which will be different for 
different parsers.  Making this distinction conceptually clear is very 
important and I don't want to mix the external cfrom/cto, which can be used 
when comparing RMRSs from different sources, with an internal-only token 
pointer.
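
To make the separation concrete, here is a minimal sketch. Every name in it
(Token, InternalPredication, ExternalPredication, export, the token ids and
the predicate names) is hypothetical and chosen for illustration only; none of
it is taken from an existing parser or from the RMRS code. It assumes cfrom/cto
are character offsets into the input string.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Parser-internal token; the id scheme differs from parser to parser."""
    token_id: str
    cfrom: int  # character offset where the token starts
    cto: int    # character offset where the token ends


@dataclass
class InternalPredication:
    """What a parser might keep while parsing: an internal-only token pointer."""
    pred: str
    token_id: str


@dataclass(frozen=True)
class ExternalPredication:
    """What the output exposes: cfrom/cto only, no token ids."""
    pred: str
    cfrom: int
    cto: int


def export(preds, tokens):
    """Resolve internal token pointers to cfrom/cto for the external interface."""
    by_id = {t.token_id: t for t in tokens}
    return [
        ExternalPredication(p.pred, by_id[p.token_id].cfrom, by_id[p.token_id].cto)
        for p in preds
    ]


# Input string: "the dog barks". Two hypothetical parsers label tokens differently.
tokens_a = [Token("t1", 0, 3), Token("t2", 4, 7), Token("t3", 8, 13)]
tokens_b = [Token("w0", 0, 3), Token("w1", 4, 7), Token("w2", 8, 13)]

preds_a = [InternalPredication("_the_q", "t1"),
           InternalPredication("_dog_n", "t2"),
           InternalPredication("_bark_v", "t3")]
preds_b = [InternalPredication("_the_q", "w0"),
           InternalPredication("_dog_n", "w1"),
           InternalPredication("_bark_v", "w2")]

out_a = export(preds_a, tokens_a)
out_b = export(preds_b, tokens_b)
```

In this sketch the two exported lists can be compared directly even though the
token ids never match, and a consumer of the output never has to look at a
token table.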

A version of cfrom/cto will work with the standard speech lattice output, btw 
- it assumes a unique sequential labelling is available, but this is 
supported, as I understand it.  I agree a token labelling may be needed too, 
however.

We did present the standoff stuff at the last summit - I don't want to keep 
doing it!

Ann