[developers] top-level cfrom/cto values in xml always -1

Stephan Oepen oe at csli.Stanford.EDU
Mon Oct 10 14:24:07 CEST 2005


> One thing I need to know though is where the global cfrom/cto values
> come from in PET and more importantly, what they are taken to mean.
> When I agreed to Francis's (I think) suggestion to put cfrom and cto
> at the top level of the RMRS as well as on the individual relations,
> there was an assumption that these were equivalent to something that
> could be extracted from the RMRS (i.e., cfrom was the min of the
> cfroms while cto was the max of the ctos).  I assume from what's
> discussed here that they are not part of the FS in PET, but what
> exactly is the semantics?  e.g., if the input string has a space at
> the end, which is stripped by the tokeniser, does the space end up
> counting for the global cto value or not?

personally, i would assume character positions to refer to the original
string, i.e. the string handed in for analysis, prior to any processing
(with sentence boundary detection as one exception, maybe).  thus, even
the trailing space character in your example would in principle have to
be considered visible.  so, if one were to see utility in global CFROM
and CTO values, that space (or maybe trailing punctuation) could result
in the global CTO exceeding the maximum of EP-level CTO values.  if one
were so inclined, this could even be construed as an argument for CFROM
and CTO on the (R)MRS itself, so as to encode which segment of input is
accounted for by this chunk of semantics.  thinking of a semantically
vacuous first (or last) token makes this potential mismatch a lot more
plausible to me (thus, global CFROM and CTO may have value after all).
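
to make the mismatch concrete, here is a minimal python sketch (not PET or
LKB code).  it assumes zero-based character offsets with CTO as an exclusive
end position; the EP class, the predicate names, and both helper functions
are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class EP:
    """simplified elementary predication; names and fields are hypothetical."""
    pred: str
    cfrom: int
    cto: int


def derived_span(eps):
    """span recoverable from the EPs alone: min of cfroms, max of ctos."""
    return min(ep.cfrom for ep in eps), max(ep.cto for ep in eps)


def input_span(text):
    """span of the original string, prior to any processing."""
    return 0, len(text)


# input with a trailing space that a tokeniser would strip
text = "dogs bark "
eps = [EP("_dog_n_1", 0, 4), EP("_bark_v_1", 5, 9)]

print(derived_span(eps))  # (0, 9)
print(input_span(text))   # (0, 10)
```

under these assumptions the global CTO (10) exceeds the largest EP-level CTO
(9), so the top-level values carry information that cannot be recomputed
from the individual relations.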

regarding code organization:

> All of these will potentially have to be individually updated to do
> the `global' cfrom/cto.

i think for this specific problem, a good solution would be to hand the
.cfrom. and .cto. parameters down into mrs-to-rmrs() (or possibly even
extract-mrs-from-fs()).  but more generally, i would be very happy for
you and bernd to work together and tidy up some of the historic debris
around these interfaces.
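
as a rough illustration of handing the values down (in python rather than
lisp, and not the actual LKB interface): the function below is a hypothetical
stand-in for the conversion step.  it takes optional global cfrom and cto
arguments and falls back to the EP-derived values, or to -1 when nothing is
known.  the tuple layout and the returned dictionary are assumptions made
for the sketch.

```python
from typing import Optional


def mrs_to_rmrs_sketch(eps, cfrom: Optional[int] = None, cto: Optional[int] = None):
    """hypothetical stand-in for the conversion; eps are (pred, cfrom, cto) tuples."""
    # prefer values handed down by the caller; otherwise derive them from
    # the EPs, and use -1 ("unknown") when there is nothing to derive from
    if cfrom is None:
        cfrom = min((ep[1] for ep in eps), default=-1)
    if cto is None:
        cto = max((ep[2] for ep in eps), default=-1)
    return {"cfrom": cfrom, "cto": cto, "eps": list(eps)}


eps = [("_dog_n_1", 0, 4), ("_bark_v_1", 5, 9)]

# caller knows the original input span and passes it down explicitly
print(mrs_to_rmrs_sketch(eps, cfrom=0, cto=10)["cto"])  # 10

# no global values supplied: fall back to min/max over the EPs
print(mrs_to_rmrs_sketch(eps)["cto"])  # 9
```

with this shape, each output routine would only need to read the top-level
values from the result instead of computing them separately.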

                                   good night now (from korea)  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


