<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, 'EmojiFont', 'Apple Color Emoji', 'Segoe UI Emoji', NotoColorEmoji, 'Segoe UI Symbol', 'Android Emoji', EmojiSymbols;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Hi Jan,</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">We do need to update the Redwoods page with information on the Ninth Growth. In the meantime, you can find brief descriptions of several of the larger components of the treebank in the appendix of a chapter I contributed
to the 2010 festschrift for Tom Wasow (attached). Those components include Verbmobil (vm*), E-commerce (ec*), LOGON (jh*, ps*, tg*, rondane, hike), SemCor (sc*), Wikipedia (ws01-13, ws214), and an Eric Raymond essay (cb). The main addition to the Ninth Growth
over previous versions is the DeepBank annotation of the Wall Street Journal, sections 00-21 (the same text annotated in the Penn Treebank), described in the proceedings of TLT 2012. You will also see two profiles (rtc000, rtc001) from the Tanaka corpus (PACLING
2001), two profiles of user-generated web content studied in the WeScience project (moin.delph-in.net/WeScience), and several more profiles from the Brown SemCor (cf*, cg*, ck*, cl*, cm*, cn*,
cp*, cr*) that were treebanked after the ERG 1214 release was frozen, described in Oepen et al. for SemEval 2015. In addition, the Ninth Growth includes several smaller linguist-constructed test suites (csli, esd, fracas, mrs, and trec), for which I can send you descriptions if you're interested.
<br>
</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Best,</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0"> Dan<br>
</p>
<br>
<div style="color: rgb(0, 0, 0);">
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> developers-bounces@emmtee.net &lt;developers-bounces@emmtee.net&gt; on behalf of Jan Buys &lt;jbuys@cs.washington.edu&gt;<br>
<b>Sent:</b> Monday, May 7, 2018 11:56 AM<br>
<b>To:</b> developers<br>
<b>Subject:</b> [developers] Redwoods info</font>
<div> </div>
</div>
<div>
<div dir="ltr">
<div>Dear developers,</div>
<div><br>
</div>
I am looking for a list of all the corpora <span style="color:rgb(34,34,34); font-family:arial,sans-serif; font-size:small; font-style:normal; font-weight:400; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; background-color:rgb(255,255,255); float:none; display:inline">included
in Redwoods (Ninth Growth) </span>and their corresponding domains (preferably with a description of the origin of the texts). Apart from the spreadsheet, which just lists all the (abbreviated) section names and their statistics, is there any more detailed description
available?
<div>
<div><br>
</div>
<div>
<div>
<div>Regards,</div>
<div>Jan</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>