[developers] Re: small ERG lexicon

Ben Waldron benjamin.waldron at cl.cam.ac.uk
Fri Apr 22 17:34:57 CEST 2005


Ann Copestake wrote:

>One more thing - it would be very useful for me to have a smaller lexicon
>available for the ERG.  I suspect this is true for other people too if they
>can't use the db or don't want to.  I think the ideal situation would be if a
>lexicon could be dumped that contains all the words that might be accessed when
>processing the CSLI test suite (i.e., include all senses and all MWEs which
>have a match on the RHS).  If this would be easy to do automatically, perhaps
>you could add something to the routine which dumps the main lexicon in TDL
>format?  See what Dan thinks anyway.  It would need to be done in such a way
>that it wasn't any extra work for him.  I think you can get the list of words
>used from the fine system.
>  
>
I've created a function, dump-small-lexicon, to perform the above task.
The lex ids are taken from *lex-ids-used*. The output file name can be
specified with :file; otherwise it defaults to something sensible.
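
For anyone who wants to see the idea without reading the Lisp, here is a
minimal sketch of the filtering step in Python. It is not the LKB code. It
assumes that entries in the TDL file are separated by blank lines and that
each one starts with "id :=". The names split_entries, entry_id and
small_lexicon, and the toy entries, are made up for illustration. Ids are
compared case-insensitively, on the assumption that the Lisp side holds them
as upcased symbols.

```python
import re

# Toy lexicon text; the entries and type names are invented for illustration.
SAMPLE = '''\
dog_n1 := n_le &
 [ ORTH < "dog" > ].

cat_n1 := n_le &
 [ ORTH < "cat" > ].

bark_v1 := v_le &
 [ ORTH < "bark" > ].
'''


def split_entries(tdl_text):
    """Split TDL text into entries (assumption: blank lines separate them)."""
    chunks = re.split(r"\n\s*\n", tdl_text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def entry_id(entry):
    """Return the identifier that precedes ':=', or None if there is none."""
    match = re.match(r"\s*([^\s:]+)\s*:=", entry)
    return match.group(1) if match else None


def small_lexicon(tdl_text, used_ids):
    """Keep only the entries whose id is in used_ids (case-insensitive)."""
    used = {lex_id.lower() for lex_id in used_ids}
    kept = [
        entry
        for entry in split_entries(tdl_text)
        if (entry_id(entry) or "").lower() in used
    ]
    return "\n\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Stand-in for the ids recorded while processing a test suite.
    print(small_lexicon(SAMPLE, ["DOG_N1", "BARK_V1"]))
```

The real function works from the ids collected during a run, so it keeps
exactly the entries that were accessed; the sketch only shows the
keep-if-id-was-used step.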

erg/lexicon-small.tdl (derived from the CSLI test suite) is now in CVS.

Stephan, would it be possible to hook functions like the one above onto
tsdb::finalize-run? That way they could be called automatically, and only
when it is meaningful to do so.

-Ben




