[itsdb] parse big corpus with itsdb
David Martinez
davidm at csse.unimelb.edu.au
Mon Oct 23 04:05:53 CEST 2006
Dear list members,
We have recently started to use the itsdb interface to process
corpora with different grammars in different languages. We had no
problems processing small files, but now we want to parse a corpus of
5M sentences (10k examples per file), and we have not found a way to
select all the target files, process all items, and extract trees in
batch mode using the tsdb interface.
We have been looking at ways to drive the command-line interface
via tsdb-do-process, but my Lisp is almost nonexistent, and I
do not know which parameters to use in the function calls.
Could you give me pointers on how to do this? I would like to
create a function that parses and exports trees for each of the files in
turn. Any help will be appreciated.
Best,
David