[developers] DeepBank train-test split?

Francis Bond bond at ieee.org
Sat Apr 29 01:40:18 CEST 2017


G'day,

revisiting this, as more and more people are ready to use our data:
>> we propose development (orange) and test (yellow) sections for the

In the Excel file (the one linked from the wiki), we also have red and green:
http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls

E.g. for 1212, wsj21 is green (in column A) and red (in column B).

Does anyone know what these mean?

Also, could we add another column with the corpus name?  Mike and I would be
happy to add what we know (most of them), but at the moment it is a little
opaque for most people.  In particular, the summary uses different names from
the profiles (how would someone know that cf10 is, e.g., Brown?).

Finally, it would be nice to have a legend [explaining the [colours in the
spreadsheet] inside the spreadsheet], ...




On Thu, Jan 19, 2017 at 4:17 PM, Francis Bond <bond at ieee.org> wrote:

> Thank you very much!
>
> On Thu, Jan 19, 2017 at 3:50 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
>> the 1214 release of the ERG includes an updated version of
>> ‘etc/redwoods.xls’, naturally. although the specific item counts will
>> differ a bit, the training–development–test designations remain unchanged,
>> for all i recall.
>>
>> best wishes, oe
>>
>>
>> On Fri 20 Jan 2017 at 00:36 Francis Bond <bond at ieee.org> wrote:
>>
>>> G'day,
>>>
>>> Is there a later version of this useful file, or is this still current?
>>>
>>> we propose development (orange) and test (yellow) sections for the
>>> various sub-corpora in recent Redwoods releases; please see:
>>>
>>>   http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls
>>>
>>> so, for this release of the WSJ text (DeepBank 1.0), we suggest to
>>> test against Section 21.
>>>
>>> best wishes, oe
>>>
>>> --
>>> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
>>> Division of Linguistics and Multilingual Studies
>>> Nanyang Technological University
>>>
>
>
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies
> Nanyang Technological University
>



-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University