[developers] DeepBank train-test split?

Stephan Oepen oe at ifi.uio.no
Sat Apr 29 08:35:38 CEST 2017


hiya,

both yellow and green designate suggested test segments, where green
indicates the ‘higher’ standard of being unseen by the grammar writer at
the time of treebanking.

the red entries are missing from the 1214 release (still, probably
eternally for this version).

thanks for the suggestions, i will add a legend and some way of spelling
out the grouping across profiles to the 1214 spreadsheet.
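
For reference, the colour scheme described in this thread could be written down as a small lookup table. This is a minimal sketch only: the names `LEGEND`, `is_test`, and `designation` are hypothetical, and the fallback to "training" for uncoloured rows is an assumption that the thread does not state explicitly.

```python
# Colour legend for etc/redwoods.xls, as described in this thread.
# All names here are illustrative; nothing like this ships with the ERG.
LEGEND = {
    "orange": "development",
    "yellow": "test",
    "green": "test (unseen by the grammar writer at treebanking time)",
    "red": "missing from the 1214 release",
}


def is_test(colour):
    """Return True if the colour marks a suggested test segment."""
    return colour.lower() in ("yellow", "green")


def designation(colour):
    """Map a cell colour to its designation."""
    # Assumption: rows without one of the colours above are training data.
    return LEGEND.get(colour.lower(), "training (assumed)")
```

Both yellow and green count as test segments here; green only adds the stricter guarantee about what the grammar writer had seen.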

oe


On Sat 29 Apr 2017 at 01:41 Francis Bond <bond at ieee.org> wrote:

> G'day,
>
> revisiting this, as more and more people are ready to use our data:
> >> we propose development (orange) and test (yellow) sections for the
>
> In the excel file, we also have red and green (for the file linked from
> the wiki)
> http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls
>
> E.g. for 1212 wsj21 is green (in column A) and red (in column B)
>
> Does anyone know what these mean?
>
> Also, can we suggest another column with corpus name?   Mike and I would
> be happy to add what we know (most of them), but at the moment it is a
> little opaque for most people.  In particular the summary uses different
> names to the profiles (how should someone know that cf10 is e.g. Brown?)
>
> Finally, it would be nice to have a legend [explaining the [colours in the
> spreadsheet] inside the spreadsheet], ...
>
> On Thu, Jan 19, 2017 at 4:17 PM, Francis Bond <bond at ieee.org> wrote:
>
>> Thank you very much!
>>
>> On Thu, Jan 19, 2017 at 3:50 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>>
>>> the 1214 release of the ERG includes an updated version of
>>> ‘etc/redwoods.xls’, naturally. although the specific item counts will
>>> differ a bit, the training–development–test designations remain unchanged,
>>> for all i recall.
>>>
>>> best wishes, oe
>>>
>>>
>>> On Fri 20 Jan 2017 at 00:36 Francis Bond <bond at ieee.org> wrote:
>>>
>>>> G'day,
>>>>
>>>> Is there a later version of this useful file, or is this still current?
>>>>
>>>> we propose development (orange) and test (yellow) sections for the
>>>> various sub-corpora in recent Redwoods releases; please see:
>>>>
>>>>   http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls
>>>>
>>>> so, for this release of the WSJ text (DeepBank 1.0), we suggest to
>>>> test against Section 21.
>>>>
>>>> best wishes, oe
>>>>
>>>> --
>>>> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
>>>> Division of Linguistics and Multilingual Studies
>>>> Nanyang Technological University