[developers] DeepBank train-test split?

Francis Bond bond at ieee.org
Sat Apr 29 08:39:26 CEST 2017


Thanks a lot!

On Fri, Apr 28, 2017 at 11:35 PM, Stephan Oepen <oe at ifi.uio.no> wrote:

> hiya,
>
> both yellow and green designate suggested test segments, where green
> indicates the ‘higher’ standard of being unseen by the grammar writer at
> the time of treebanking.
>
> the red entries are missing from the 1214 release (still, probably
> eternally for this version).
>
> thanks for the suggestions, i will add a legend and some way of spelling
> out the grouping across profiles to the 1214 spreadsheet.
>
> oe
>
>
> On Sat 29 Apr 2017 at 01:41 Francis Bond <bond at ieee.org> wrote:
>
>> G'day,
>>
>> revisiting this, as more and more people are ready to use our data:
>> > we propose development (orange) and test (yellow) sections for the
>> > various sub-corpora in recent Redwoods releases
>>
>> In the excel file, we also have red and green (for the file linked from
>> the wiki)
>> http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls
>>
>> E.g. for 1212 wsj21 is green (in column A) and red (in column B)
>>
>> Does anyone know what these mean?
>>
>> Also, can we suggest another column with corpus name?   Mike and I would
>> be happy to add what we know (most of them), but at the moment it is a
>> little opaque for most people.  In particular the summary uses different
>> names to the profiles (how should someone know that cf10 is e.g. Brown?)
>>
>> Finally, it would be nice to have a legend [explaining the [colours in
>> the spreadsheet] inside the spreadsheet], ...
>>
>>
>>
>>
>> On Thu, Jan 19, 2017 at 4:17 PM, Francis Bond <bond at ieee.org> wrote:
>>
>>> Thank you very much!
>>>
>>> On Thu, Jan 19, 2017 at 3:50 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>>>
>>>> the 1214 release of the ERG includes an updated version of
>>>> ‘etc/redwoods.xls’, naturally. although the specific item counts will
>>>> differ a bit, the training–development–test designations remain unchanged,
>>>> as far as i recall.
>>>>
>>>> best wishes, oe
>>>>
>>>>
>>>> On Fri 20 Jan 2017 at 00:36 Francis Bond <bond at ieee.org> wrote:
>>>>
>>>>> G'day,
>>>>>
>>>>> Is there a later version of this useful file, or is this still current?
>>>>>
>>>>> we propose development (orange) and test (yellow) sections for the
>>>>> various sub-corpora in recent Redwoods releases; please see:
>>>>>
>>>>>   http://svn.delph-in.net/erg/tags/1212/etc/redwoods.xls
>>>>>
>>>>> so, for this release of the WSJ text (DeepBank 1.0), we suggest
>>>>> testing against Section 21.
>>>>>
>>>>> best wishes, oe


-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University