[developers] Extracting surface form of tokens from derivation trees

Thu Apr 24 21:34:28 CEST 2014

sweaglesw at sweaglesw.org said:
> The MRS CFROM/CTO come straight from the +FROM and +TO properties of the
> post-token-mapping tokens dominated by the edge each EP is introduced on.
> Unfortunately they do not uniquely identify such a token; for example:

> We admired the sky-blue water.

> This yields a 'sky-' token and a 'blue' token, both with identical +FROM and
> +TO, and correspondingly a _sky_n_1_rel EP and a _blue_a_1_rel EP, both with
> identical CFROM/CTO.   The span "sky-blue" is considered a single token
> before token-mapping (e.g. as the input to TNT), so the answer to your
> question (b) is yes.  I don't see that this is a problem from the point of
> view of (a), if what you want is a correspondence between EPs and TNT-level
> tokens, since the EPs still point to the input token spans in this case.

OK, so if we were generating some form of token-level tfrom/tto, we'd get:

_sky_n_1_rel 4:4  _blue_a_1_rel 4:4

and:

The water was sky-blue.

would yield:

_sky_n_1_rel 4:5  _blue_a_1_rel 4:5

I think this might imply some special treatment would be required, but I'm not 
sure.
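
To make the numbers above concrete, here is a rough sketch of how such
token indices could be derived from character spans. Everything in it is an
assumption for illustration: the tokenize() and token_span() helpers are
hypothetical, the regex is only a stand-in for the real pre-token-mapping
tokenizer, token indices are 1-based, and the CFROM/CTO values are ones I
filled in by hand (with the sentence-final period taken to be included in
the EP span in the second example).

```python
import re

def tokenize(text):
    """Stand-in tokenizer: split on whitespace and split off periods.

    Returns (start, end, form) triples with character offsets.
    This is NOT the real tokenizer, only an approximation.
    """
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"[^\s.]+|\.", text)]

def token_span(tokens, cfrom, cto):
    """Map a character span to a 1-based (tfrom, tto) token index pair.

    A token counts as covered if its character span overlaps [cfrom, cto).
    """
    hits = [i for i, (start, end, _) in enumerate(tokens, start=1)
            if start < cto and end > cfrom]
    if not hits:
        raise ValueError("no token overlaps span %d:%d" % (cfrom, cto))
    return hits[0], hits[-1]

# Assumed CFROM/CTO for both _sky_n_1_rel and _blue_a_1_rel: 15:23
first = token_span(tokenize("We admired the sky-blue water."), 15, 23)

# Assumed CFROM/CTO here: 14:23, i.e. including the final period
second = token_span(tokenize("The water was sky-blue."), 14, 23)

print(first, second)
```

Under those assumptions both EPs map to the same token range in each
sentence, so the token indices alone cannot tell the two EPs apart any more
than CFROM/CTO can.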

Thanks,

Ann