[developers] Adjusting LNK values to space-delimited tokens

Michael Wayne Goodman goodmami at uw.edu
Mon Jun 26 00:14:01 CEST 2017

Hi all,

A colleague of mine is attempting to use ERG semantic outputs in a system
originally created for another representation, and his system requires the
semantics to be paired with a tokenized string (e.g., with punctuation
separated from the word tokens).

I can get the space-delimited tokenized string, e.g., from repp or from ACE
with the -E option, but then the CFROM/CTO values in the MRS no longer
align with that string. The initial tokens ('p-input' in the 'parse' table
of a [incr tsdb()] profile) tell me the span of each token in the original
string, which I could use to compute the adjusted spans. This seems simple
enough, but it gets complicated: some separated tokens should still count
as a single range (e.g., "could n't", where '_can_v_modal' and 'neg' both
select the full span of "could n't"), while others I want split off, like
punctuation (though not all punctuation; e.g., not the ' in "The kids'
toys are in the closet.").

Has anyone else thought about this problem and can share some solutions?
Or, even better, code to realign EPs to the tokenized string?

Michael Wayne Goodman
Ph.D. Candidate, UW Linguistics