[itsdb] parse big corpus with itsdb

David Martinez davidm at csse.unimelb.edu.au
Mon Oct 23 04:05:53 CEST 2006


 	Dear list members,

 	We have recently started using the itsdb interface to process 
corpora with different grammars in different languages. We had no 
problems processing small files, but we now want to parse a corpus of 
5M sentences (10k examples per file), and we have not found a way to 
select all the target files, process all items, and extract trees in 
batch mode using the tsdb interface.
 	We have been looking at ways to interact with the command-line 
interface via tsdb-do-process, but my Lisp is almost nonexistent, and I 
do not know which parameters to pass in the function calls.
 	Could you give me pointers on how to do this? I would like to 
create a function that parses and exports trees for each of the files in 
turn. Any help will be appreciated.

 	Best,
 	David


