<div dir="ltr"><div>Thank you! Processing so much faster now :)<br><br><br></div>Megan<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 26, 2013 at 11:17 PM, Woodley Packard <span dir="ltr"><<a href="mailto:sweaglesw@sweaglesw.org" target="_blank">sweaglesw@sweaglesw.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You could `cat' a file with one sentence per line into that same command, e.g.:<br>
<br>
$ cat test.txt<br>
"Squeak!" said the mouse.<br>
The dog said, "Woof."<br>
$ cat test.txt | ./logon/bin/cheap -t -repp -preprocess-only=yy logon/lingo/erg/english<br>
[.....]<br>
(1, 0, 1, <0:1>, 1, "", 0, "null")<br>
(2, 1, 2, <1:7>, 1, "Squeak", 0, "null")<br>
(3, 2, 3, <7:8>, 1, "!", 0, "null")<br>
(4, 3, 4, <8:9>, 1, "", 0, "null")<br>
(5, 4, 5, <10:14>, 1, "said", 0, "null")<br>
(6, 5, 6, <15:18>, 1, "the", 0, "null")<br>
(7, 6, 7, <19:24>, 1, "mouse", 0, "null")<br>
(8, 7, 8, <24:25>, 1, ".", 0, "null")<br>
(9, 0, 1, <0:3>, 1, "The", 0, "null")<br>
(10, 1, 2, <4:7>, 1, "dog", 0, "null")<br>
(11, 2, 3, <8:12>, 1, "said", 0, "null")<br>
(12, 3, 4, <12:13>, 1, ",", 0, "null")<br>
(13, 4, 5, <14:15>, 1, "", 0, "null")<br>
(14, 5, 6, <15:19>, 1, "Woof", 0, "null")<br>
(15, 6, 7, <19:20>, 1, ".", 0, "null")<br>
(16, 7, 8, <20:21>, 1, "", 0, "null")<br>
<br>
I guess you can separate the sentences by watching for the point where the "from" vertex identifier (the second field of each tuple) resets to 0.<br>
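<br>
A minimal Python sketch of that splitting step, assuming the token lines look exactly like the sample output above (one tuple per line, with the "from" vertex as the second field). The names TOKEN_RE and split_sentences are made up for illustration and are not part of PET or cheap:<br>
<br>

```python
import re

# Assumed token layout, taken from the sample output above:
# (id, from_vertex, to_vertex, <char_from:char_to>, path, "form", ipos, "lrule")
TOKEN_RE = re.compile(
    r'\((\d+), (\d+), (\d+), <(\d+):(\d+)>, \d+, "(.*)", \d+, "[^"]*"\)'
)


def split_sentences(lines):
    """Group token lines into sentences, starting a new group
    whenever a token's "from" vertex is 0."""
    sentences, current = [], []
    for line in lines:
        match = TOKEN_RE.match(line.strip())
        if match is None:
            continue  # not a token line (banner, blank line, ...)
        from_vertex = int(match.group(2))
        if from_vertex == 0 and current:
            sentences.append(current)
            current = []
        form = match.group(6)
        span = (int(match.group(4)), int(match.group(5)))
        current.append((form, span))
    if current:
        sentences.append(current)
    return sentences


# The sample output from the message above, copied verbatim.
SAMPLE = '''\
[.....]
(1, 0, 1, <0:1>, 1, "", 0, "null")
(2, 1, 2, <1:7>, 1, "Squeak", 0, "null")
(3, 2, 3, <7:8>, 1, "!", 0, "null")
(4, 3, 4, <8:9>, 1, "", 0, "null")
(5, 4, 5, <10:14>, 1, "said", 0, "null")
(6, 5, 6, <15:18>, 1, "the", 0, "null")
(7, 6, 7, <19:24>, 1, "mouse", 0, "null")
(8, 7, 8, <24:25>, 1, ".", 0, "null")
(9, 0, 1, <0:3>, 1, "The", 0, "null")
(10, 1, 2, <4:7>, 1, "dog", 0, "null")
(11, 2, 3, <8:12>, 1, "said", 0, "null")
(12, 3, 4, <12:13>, 1, ",", 0, "null")
(13, 4, 5, <14:15>, 1, "", 0, "null")
(14, 5, 6, <15:19>, 1, "Woof", 0, "null")
(15, 6, 7, <19:20>, 1, ".", 0, "null")
(16, 7, 8, <20:21>, 1, "", 0, "null")
'''

# Print the token forms of each sentence on its own line.
for tokens in split_sentences(SAMPLE.splitlines()):
    print(" ".join(form for form, _ in tokens))
```

This relies on each sentence having exactly one token that starts at vertex 0. If the tokenizer ever emitted alternative tokens from vertex 0, the sketch would split in the wrong place, and an input line that produces no tokens simply disappears from the result.<br>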
<br>
For an entirely different approach, you could try the -Ev options with ACE. The output contains the same data, but it is printed in a different format:<br>
<br>
$ cat test.txt | ~/cdev/ace/ace -g ~/cdev/ace/erg.dat -Ev 2>/dev/null | grep -v '^NOTE'<br>
<0:1> Squeak<1:7> !<7:8> <8:9> said<10:14> the<15:18> mouse<19:24> .<24:25><br>
<br>
<br>
The<0:3> dog<4:7> said<8:12> ,<12:13> <14:15> Woof<15:19> .<19:20> <20:21><br>
<br>
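The ACE output can be turned back into (form, span) pairs along the same lines. This is only a sketch under the assumption that every token is printed as form&lt;from:to&gt; with spaces between tokens, as in the two lines above. ACE_TOKEN_RE and parse_ace_line are illustrative names, not part of ACE:<br>
<br>

```python
import re

# Assumed layout, from the sample above: each token is printed as
# form<char_from:char_to>, tokens are separated by spaces, and an
# empty form shows up as a bare <from:to> span.
ACE_TOKEN_RE = re.compile(r'([^\s<]*)<(\d+):(\d+)>')


def parse_ace_line(line):
    """Return the (form, (char_from, char_to)) pairs found on one line."""
    return [
        (m.group(1), (int(m.group(2)), int(m.group(3))))
        for m in ACE_TOKEN_RE.finditer(line)
    ]


# The two output lines from the message above, copied verbatim.
ACE_SAMPLE = [
    '<0:1> Squeak<1:7> !<7:8> <8:9> said<10:14> the<15:18> mouse<19:24> .<24:25>',
    '',
    'The<0:3> dog<4:7> said<8:12> ,<12:13> <14:15> Woof<15:19> .<19:20> <20:21>',
]

for line in ACE_SAMPLE:
    tokens = parse_ace_line(line)
    if tokens:  # blank separator lines yield no tokens
        print(tokens)
```

Here each non-blank line is already one sentence, so no vertex bookkeeping is needed. A token whose form itself contains "&lt;" or whitespace would not be recovered correctly by this pattern.<br>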
<br>
Good luck,<br>
Woodley<br>
<div class="HOEnZb"><div class="h5"><br>
On May 26, 2013, at 11:04 PM, Megan Schneider wrote:<br>
<br>
> Does anyone know of a good way to get bulk REPP tokenization for a set of sentences? The one-by-one method appears to be:<br>
><br>
> echo <sentence> | ./logon/bin/cheap -t -repp -preprocess-only=yy ./logon/lingo/erg/english<br>
><br>
> Is there a good way to do this without needing to reload the rules/types every sentence? Not looking for a functional difference, just an efficiency difference.<br>
><br>
><br>
> Thanks!<br>
> Megan<br>
<br>
</div></div></blockquote></div><br></div>