[developers] Fwd: a couple of DeepBank questions

Megan Schneider caelum at gmail.com
Fri Feb 1 18:21:28 CET 2013


Thank you!

Megan

On Fri, Feb 1, 2013 at 1:10 AM, Yi Zhang <yzhang at coli.uni-sb.de> wrote:

>
> Begin forwarded message:
>
> From: Yi Zhang <yizhang at dfki.de>
> Subject: Re: [developers] a couple of DeepBank questions
> Date: February 1, 2013 10:09:04 AM GMT+01:00
> To: Megan Schneider <caelum at gmail.com>
> Cc: developers at delph-in.net
>
> Hi Megan,
>
> 1) How do the DeepBank sentence identifiers map to the Penn Treebank?
> (20020005 appears to map to the 5th sentence in
> RAW/parsed/mrg/wsj/00/wsj_0020.mrg, and 20001001 appears to map to the
> first sentence in 00/wsj_0001.mrg, judging from where the sentences in
> question occur.)
>
> Yes, your observation is right. The sentence identifiers in DeepBank are
> 8-digit integers that always start (from the left) with "2", followed by 4
> digits corresponding to the file name in the PTB (e.g. 0234 is from the
> file 02/wsj_0234.mrg), and end with 3 digits giving the sentence number
> within that file (counting from 1).
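As an illustration of the id scheme described above, here is a minimal bash sketch that splits a DeepBank id into the PTB file path (relative to parsed/mrg/wsj/) and the 1-based sentence number. The function name deepbank_id_to_ptb is hypothetical and not part of any DeepBank release; the sketch assumes the section directory is given by the first two of the four file digits, as in the 02/wsj_0234.mrg example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepBank): map a DeepBank sentence id to
# its PTB file (relative to parsed/mrg/wsj/) and 1-based sentence number.
deepbank_id_to_ptb() {
  local id=$1
  # Expect exactly 8 digits with a leading "2", per the scheme above.
  if [[ ! $id =~ ^2[0-9]{7}$ ]]; then
    echo "not a DeepBank id: $id" >&2
    return 1
  fi
  local file=${id:1:4}           # 4-digit PTB file number, e.g. 0020
  local sent=$((10#${id:5:3}))   # e.g. 005 -> 5; 10# forces base 10
  local dir=${file:0:2}          # section directory, e.g. 00
  echo "$dir/wsj_$file.mrg $sent"
}

deepbank_id_to_ptb 20020005   # prints: 00/wsj_0020.mrg 5
deepbank_id_to_ptb 20001001   # prints: 00/wsj_0001.mrg 1
```

The 10# prefix matters because bash would otherwise read a zero-padded sentence number such as 008 as an (invalid) octal literal.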
>
>
> 2) Does anyone have a version of the Penn Treebank which is limited to
> only those parses/sentences also contained in DeepBank?
>
>
> You can find a simple Perl script at the following link; it selects and
> prints the subset of the PTB (in the original .mrg format) that matches a
> list of sentence ids:
> http://www.coli.uni-saarland.de/~yzhang/files/select-ptb-with-iid.pl
>
> A simple way to get the id list is the following command line (assuming
> you are working with DeepBank release 0.9, which contains thinned tsdb
> profiles):
>
>  $ for i in deepbank-0.9/tsdb/*.1; do zcat $i/result.gz | cut -f 1 -d@ >> id-list.txt; done
>
> Afterwards, run the Perl script:
>  $ perl select-ptb-with-iid.pl id-list.txt penntreebank3/parsed/mrg/wsj/ > ptb-deepbank-0.9.mrg
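For readers without Perl at hand, here is a rough bash/awk sketch of the same kind of selection. It is not the select-ptb-with-iid.pl script and may differ from it in details; the function name select_ptb is hypothetical, and it assumes that every tree in a PTB .mrg file starts with "(" in the first column while its continuation lines are indented.

```shell
#!/usr/bin/env bash
# Rough sketch, not select-ptb-with-iid.pl: print the PTB trees whose DeepBank
# ids are listed (one per line) in $1, reading .mrg files under $2
# (e.g. penntreebank3/parsed/mrg/wsj/).
# Assumption: each tree starts with "(" in column 0; continuation lines are indented.
select_ptb() {
  local idlist=$1 wsjdir=$2 id file sent
  # Ids are fixed-width, so a plain lexical sort keeps file/sentence order.
  sort -u "$idlist" | while read -r id; do
    [[ $id =~ ^2[0-9]{7}$ ]] || continue   # skip blank or malformed lines
    file=${id:1:4}                         # PTB file number, e.g. 0020
    sent=$((10#${id:5:3}))                 # 1-based sentence number
    awk -v n="$sent" '
      /^\(/  { k++ }        # a new top-level tree begins
      k > n  { exit }       # past the wanted tree: stop reading
      k == n { print }      # print every line of the n-th tree
    ' "$wsjdir/${file:0:2}/wsj_$file.mrg"
  done
}
```

It would be called as `select_ptb id-list.txt penntreebank3/parsed/mrg/wsj/ > ptb-deepbank-0.9.mrg`, using the id list built by the loop above. It rescans a file once per requested sentence, which keeps the logic simple at the cost of speed.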
>
> best,
> yi
>
>
>