[developers] Redwoods info

Dan Flickinger danf at stanford.edu
Mon May 14 14:57:36 CEST 2018


Hi Jan,


We do need to update the Redwoods page with information on the Ninth Growth.  In the meantime, you can find brief descriptions of several of the larger components of the treebank in the appendix to a chapter I contributed to the 2010 festschrift for Tom Wasow (attached).  Those components include Verbmobil (vm*), E-commerce (ec*), LOGON (jh*, ps*, tg*, rondane, hike), SemCor (sc*), Wikipedia (ws01-13, ws214), and an Eric Raymond essay (cb).

The main addition in the Ninth Growth over previous versions is the DeepBank annotation of the Wall Street Journal, sections 00-21 (the same text annotated in the Penn Treebank), described in the proceedings of TLT 2012.  You will also see two profiles (rtc000, rtc001) from the Tanaka corpus (Pacling 2001), two profiles of user-generated web content studied in the WeScience project (moin.delph-in.net/WeScience), and several more profiles from the Brown SemCor that were treebanked after the ERG 1214 release was frozen (cf*, cg*, ck*, cl*, cm*, cn*, cp*, cr*), described in Oepen et al. for SemEval 2015.  In addition, the Ninth Growth includes several smaller linguist-constructed test suites (csli, esd, fracas, mrs, and trec), for which I can send you descriptions if you're interested.


Best,


 Dan

________________________________
From: developers-bounces at emmtee.net <developers-bounces at emmtee.net> on behalf of Jan Buys <jbuys at cs.washington.edu>
Sent: Monday, May 7, 2018 11:56 AM
To: developers
Subject: [developers] Redwoods info

Dear developers,

I am looking for a list of all the corpora included in Redwoods (Ninth Growth) and their corresponding domains (preferably with a description of the origin of the texts). Apart from the spreadsheet, which only lists the (abbreviated) section names and their statistics, is there a more detailed description available?

Regards,
Jan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: appendix2010.pdf
Type: application/pdf
Size: 276699 bytes
Desc: appendix2010.pdf
URL: <http://lists.delph-in.net/archives/developers/attachments/20180514/ffe9ae3d/attachment-0001.pdf>