[itsdb] parse big corpus with itsdb
David Martinez
davidm at csse.unimelb.edu.au
Wed Nov 8 00:49:38 CET 2006
Hello all,
Thank you very much, Francis; your answer was really helpful for
our task.
To parse the corpus, I looked at the format of the profiles and
created them directly from my corpus with a Perl script.
Then I listed all the commands that I needed: set the parameters,
load LKB, load tsdb, load the ERG (for export), load the PET CPUs, process the
profiles, and export the trees. I make a single system call with this list
of commands for every 10,000 sentences, and it works fine.
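For anyone who wants to do something similar, here is a minimal sketch of just the batching step, written in Python rather than the Perl I actually used. The names `batches` and `profile_name` are made up for illustration, and the sketch neither writes profiles nor calls itsdb; it only shows how the corpus can be cut into groups of 10,000 sentences, one group per system call.

```python
from itertools import islice
from typing import Iterable, Iterator, List

BATCH_SIZE = 10_000  # batch size mentioned in the message


def batches(sentences: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` sentences."""
    it = iter(sentences)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def profile_name(index: int, prefix: str = "corpus") -> str:
    """Hypothetical naming scheme for the per-batch profiles."""
    return f"{prefix}-{index:04d}"


if __name__ == "__main__":
    # Toy corpus of 25 sentences, cut into batches of 10.
    corpus = [f"sentence {i}" for i in range(25)]
    for i, chunk in enumerate(batches(corpus, size=10)):
        # In the real workflow, this is where a profile would be created
        # for the batch and one system call would run the command list.
        print(profile_name(i), len(chunk))
```

Each batch would then get its own profile and its own system call, so a failure in one batch does not affect the others.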
However, I have another question that maybe someone can help me with. I
would like to use only one analysis per sentence, so I limited the number
of answers to 1. But I don't know whether this is the way to get the best
possible analysis. Is there some other switch that I could use?
Thanks in advance,
David
On Mon, 23 Oct 2006, David Martinez wrote:
>
> Dear list members,
>
> we have recently started to use the itsdb interface to process
> corpora with different grammars in different languages. We had no
> problems processing small files, but now we want to parse a corpus of 5M
> sentences (10k examples per file), and we have not found a way to select all
> the target files, process all items, and extract trees in batch mode using
> the tsdb interface.
> We have been looking at ways to interact with the command-line
> interface with tsdb-do-process, but my Lisp is almost non-existent, and I
> don't know which parameters to use in the function calls.
> Could you give me pointers on how to do this? I would like to create
> a function that parses and exports trees for each of the files in turn. Any
> help will be appreciated.
>
> Best,
> David
>