[itsdb] parse big corpus with itsdb

Thu Nov 9 03:35:01 CET 2006

G'day,

>         thanks to all for the information on the scoring model. My problem now
> is that I can't find the model "vm6p.mem" in erg. I looked in the versions
> Jan-2006 and Jun-2006, and I found "vm.mem", but not "vm6p".
>
>         When I tried with "vm.mem" as scoring method I got many errors that
> said "Unknown type/instance". If I keep "rondane.mem" it doesn't give me
> errors, but I don't know if it does the job.
>
>         Tim dug around and found that vm6p.mem had been committed to the ERG
> "Attic". Does this mean it is now deprecated? Also, the total complement of
> trained models in CVS appears to be:
>
> -rw-r--r--  1 tim tim 588K Jun  6  2004 ec.mem
> -rw-r--r--  1 tim tim 904K Jul 11 08:05 jh.mem
> -rw-r--r--  1 tim tim 5.1M Jul 11 08:41 jhpstg.g.mem
> -rw-r--r--  1 tim tim 1.2M Dec 10  2005 logon.g.mem
> -rw-r--r--  1 tim tim 814K Aug  3  2005 redwoods.mem
> -rw-r--r--  1 tim tim 600K Jul 11 07:22 rondane.mem
> -rw-r--r--  1 tim tim 592K Jun  6  2004 vm.mem

I would guess that rondane is trained on the Rondane corpus (hiking),
vm on Verbmobil (hotel reservations), and redwoods on everything,
though redwoods may be a grammar version behind.

Whether you prefer rondane or vm depends on what you are trying to parse.

-- 
Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Natural Language Research Group