[itsdb] [lkb] the fine system and unicode

Ben Waldron benjamin.waldron at cl.cam.ac.uk
Mon Feb 13 12:48:33 CET 2006


Stephan Oepen wrote:
>getting UniCode to work in [incr tsdb()] is not much of a problem.  you
>should make sure that
>
>  (a) your [incr tsdb()] data files (skeletons or ASCII import files)
>      are all coded in UTF-8.
>  
You can use the 'file' command under Linux to check the encoding of files:

    bmw20@bmw-1:~/erg> file irregs.tab
    irregs.tab: UTF-8 Unicode text
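
To check several files at once, something like the following sketch may help. It assumes iconv is installed, and the function names is_utf8 and check_files are made up for this example. Instead of relying on the guess that 'file' makes, it tries to decode each file as UTF-8 and reports the ones that fail. Note that a pure ASCII file also passes, since ASCII is a subset of UTF-8.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of LKB or [incr tsdb()].

# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# iconv exits non-zero when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# check_files FILE...: print one status line per file.
check_files() {
    for f in "$@"; do
        if is_utf8 "$f"; then
            echo "ok: $f"
        else
            echo "NOT UTF-8: $f"
        fi
    done
}
```

After sourcing these definitions, a call such as `check_files irregs.tab` prints one line per file.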

>  (b) the Lisp universe running [incr tsdb()] uses a UTF-8 locale; try
>      evaluating excl:*locale* to check, and then maybe use the -locale
>      command line option to the underlying Lisp image (ACL appears to
>      not choose its initial locale based on the LANG shell variable).
>  
An alternative to explicitly setting -locale when starting the Lisp 
image is to set the coding system as a property of the grammar files. 
E.g. you can place the following in GRAMMAR/lkb/globals:

    (when (lkb-version-after-p "2006/02/08 15:00:00")
      (set-coding-system utf-8))

Or, if your LKB image is older than that (and you are running Allegro CL):

    (setf excl:*locale* (excl::find-locale ".utf8"))

- Ben


