<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Matic's thesis does include an approach to the version of the
problem he had to deal with (not quite the same one), and he will make
code available. The thesis will be generally available once he has
made some corrections. However, he's now working at a company, so he
won't be supporting the code, and it was far from perfect anyway.</p>
<p>Is the system you're trying to integrate with really simply
space-tokenized? People generally use something a little more
complex.<br>
</p>
<p>All best,<br>
</p>
<br>
Ann<br>
<br>
<div class="moz-cite-prefix">On 26/06/2017 05:54, Francis Bond
wrote:<br>
</div>
<blockquote
cite="mid:CA+arSXi04aMbHBeLwmbfue-xCusJY2rWWqY9XkuQBUYVuv1Vag@mail.gmail.com"
type="cite">
<div dir="ltr">I am pretty sure Matic has done some work on this
problem, ...</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jun 26, 2017 at 6:50 AM,
Michael Wayne Goodman <span dir="ltr"><<a
moz-do-not-send="true" href="mailto:goodmami@uw.edu"
target="_blank">goodmami@uw.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Thanks Woodley,
<div class="gmail_extra"><br>
<div class="gmail_quote"><span class="">On Sun, Jun 25,
2017 at 8:03 PM, Woodley Packard <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:sweaglesw@sweaglesw.org"
target="_blank">sweaglesw@sweaglesw.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Have you
considered passing a pre-tokenized string
(produced by REPP or otherwise) into ACE?
Character spans will then automatically be
produced relative to that string. Or maybe I
misunderstood your goal?</blockquote>
<div><br>
</div>
</span>
<div>Yes, I have tried this, but (a) I still get
things like the final period being in the same span
as the final word (now with the additional space);
(b) I'm concerned about *over*-tokenization, in case the
REPP rules find something in the already-tokenized string to
split further; and (c) while it was able to parse
"The dog could n't bark .", it fails to parse things
like "The kids ' toys are in the closet .".</div>
<div><br>
</div>
<div>As to my goal, consider again "The dog couldn't
bark." The initial (post-REPP) tokens are:</div>
<div><br>
</div>
<div>
<div style="font-size:12.8px"> <0:3>
"The"</div>
<div style="font-size:12.8px"> <4:7>
"dog"</div>
<div style="font-size:12.8px"> <8:13>
"could"</div>
<div style="font-size:12.8px"> <13:16>
"n’t"</div>
<div style="font-size:12.8px"> <17:21>
"bark"</div>
<div style="font-size:12.8px"> <21:22>
"."</div>
</div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px">The internal tokens are:</div>
<div style="font-size:12.8px"><br>
</div>
<div style="font-size:12.8px">
<div style="font-size:12.8px"> <0:3>
"the"</div>
<div style="font-size:12.8px"> <4:7>
"dog"</div>
<div style="font-size:12.8px"> <8:16>
"couldn’t"</div>
<div style="font-size:12.8px"> <17:22>
"bark."</div>
<div><br>
</div>
</div>
<div>I would like to adjust the latter values to fit
the string where the initial tokens are all
space-separated. So the new string is "The dog could n't
bark .", and the LNK values would be:</div>
<div><br>
</div>
<div style="font-size:12.8px"> <0:3>
_the_q</div>
<div style="font-size:12.8px"> <4:7>
_dog_n_1</div>
<div style="font-size:12.8px"> <8:17>
_can_v_modal, neg (CTO + 1 from the internal space)</div>
<div style="font-size:12.8px"> <18:22>
_bark_v_1 (CFROM + 1 from previous adjustment; CTO
- 1 to get rid of the final period)</div>
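<div><br>
</div>
<div>A minimal sketch of this realignment in Python, assuming the
initial (post-REPP) tokens are available as (CFROM, CTO, form)
triples relative to the original string, and that each EP's span
contains whole initial tokens. It joins the token forms with single
spaces, records each token's new span, and maps an EP span to the
first and last initial tokens it covers, trimming punctuation-only
tokens at the edges (so "bark." loses its period, but "could n't"
stays one range). The helper names <code>tokenized_string</code>,
<code>realign</code>, and <code>is_punct</code> are hypothetical, not
part of ACE, REPP, or any existing library, and the punctuation rule
is an assumption that would need refining for cases like the
possessive ' in "The kids' toys".</div>
<pre>
```python
import string
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]
# (cfrom, cto, form), with offsets relative to the original, untokenized string
Token = Tuple[int, int, str]

# Assumption: curly quotes (as produced by REPP) also count as punctuation.
PUNCT = set(string.punctuation) | set("\u2018\u2019\u201c\u201d")


def is_punct(form: str) -> bool:
    """Hypothetical rule: a token is punctuation if all its characters are."""
    return bool(form) and all(ch in PUNCT for ch in form)


def tokenized_string(tokens: Sequence[Token]) -> Tuple[str, List[Span]]:
    """Join token forms with single spaces; return the string and new spans."""
    new_spans: List[Span] = []
    pos = 0
    for _, _, form in tokens:
        new_spans.append((pos, pos + len(form)))
        pos += len(form) + 1  # skip the separating space
    return " ".join(form for _, _, form in tokens), new_spans


def realign(span: Span, tokens: Sequence[Token],
            new_spans: Sequence[Span]) -> Optional[Span]:
    """Map an EP's original (CFROM, CTO) onto the space-tokenized string."""
    cfrom, cto = span
    # Indices of initial tokens lying entirely inside the EP's span
    covered = [i for i, (start, end, _) in enumerate(tokens)
               if start >= cfrom and cto >= end]
    if not covered:
        return None  # the span does not contain any whole token
    # Trim punctuation-only tokens from both edges, keeping at least one
    while len(covered) > 1 and is_punct(tokens[covered[-1]][2]):
        covered.pop()
    while len(covered) > 1 and is_punct(tokens[covered[0]][2]):
        covered.pop(0)
    return new_spans[covered[0]][0], new_spans[covered[-1]][1]


if __name__ == "__main__":
    toks = [(0, 3, "The"), (4, 7, "dog"), (8, 13, "could"),
            (13, 16, "n't"), (17, 21, "bark"), (21, 22, ".")]
    text, spans = tokenized_string(toks)
    print(repr(text))
    for ep_span in [(0, 3), (4, 7), (8, 16), (17, 22)]:
        print(ep_span, "->", realign(ep_span, toks, spans))
```
</pre>
<div>With the initial tokens listed above, this sketch maps (8, 16)
to (8, 17) and (17, 22) to (18, 22), matching the desired values;
EPs whose spans do not line up with whole initial tokens would need
a different covering rule (e.g., overlap instead of containment).</div>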
<div><br>
</div>
<div>My colleague uses these to anonymize named
entities, numbers, etc., and for this task he says
he can be somewhat flexible. But he also uses them
for an attention layer in his neural setup, in which
case he'd need exact alignments.</div>
<span class="">
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><span
class="m_-6852063119269118278gmail-m_3482906361254902959HOEnZb"><font
color="#888888"><br>
Woodley<br>
</font></span>
<div
class="m_-6852063119269118278gmail-m_3482906361254902959HOEnZb">
<div
class="m_-6852063119269118278gmail-m_3482906361254902959h5"><br>
<br>
<br>
<br>
> On Jun 25, 2017, at 3:14 PM, Michael
Wayne Goodman <<a moz-do-not-send="true"
href="mailto:goodmami@uw.edu"
target="_blank">goodmami@uw.edu</a>>
wrote:<br>
><br>
> Hi all,<br>
><br>
> A colleague of mine is attempting to use
ERG semantic outputs in a system originally
created for another representation, and his
system requires the semantics to be paired
with a tokenized string (e.g., with
punctuation separated from the word tokens).<br>
><br>
> I can get the space-delimited tokenized
string, e.g., from repp or from ACE with the
-E option, but then the CFROM/CTO values in
the MRS no longer align to the string. The
initial tokens ('p-input' in the 'parse' table
of a [incr tsdb()] profile) can tell me the
span of individual tokens in the original
string, which I could use to compute the
adjusted spans. This seems simple enough, but
it gets complicated: some separated tokens
should still count as a single range (e.g.,
"could n't", where '_can_v_modal' and 'neg' both
select the full span of "could n't"), while
others I want kept separate, like punctuation
(but not all punctuation, e.g., the ' in "The
kids' toys are in the closet.").<br>
><br>
> Has anyone else thought about this
problem and can share some solutions? Or, even
better, code to realign EPs to the tokenized
string?<br>
><br>
> --<br>
> Michael Wayne Goodman<br>
> Ph.D. Candidate, UW Linguistics<br>
</div>
</div>
</blockquote>
</span></div>
<span class=""><br>
<br clear="all">
<div><br>
</div>
-- <br>
<div
class="m_-6852063119269118278gmail-m_3482906361254902959gmail_signature">
<div dir="ltr">Michael Wayne Goodman
<div>Ph.D. Candidate, UW Linguistics</div>
</div>
</div>
</span></div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">Francis
Bond <<a moz-do-not-send="true"
href="http://www3.ntu.edu.sg/home/fcbond/" target="_blank">http://www3.ntu.edu.sg/home/fcbond/</a>><br>
Division of Linguistics and Multilingual Studies<br>
Nanyang Technological University<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>