[developers] Re: grammar locale issues

Ben Waldron benjamin.waldron at cl.cam.ac.uk
Tue Jun 21 15:59:41 CEST 2005

Ann Copestake wrote:

>just to clarify - is this anything to do with the *grammar-locale* idea or are 
>you not proposing to pursue that?

The mechanism I implemented in the NorSource grammar was going to be a
tentative form of (B):

> - (B) Provide users with full comprehensible instructions on
> navigating the encoding maze, perhaps giving feedback at grammar load
> time if any settings are likely to be problematic.

The idea was to check Lisp's *locale* against a parameter 
*grammar-locale* set in globals.lsp, complaining and outputting an 
informative message if they did not match appropriately. But it seems to 
me we can do better than this. By setting *locale* at grammar load time 
there would be no need to tell each and every user of the LKB hoping to 
run a particular grammar that they must ensure the LKB's locale is set 
appropriately for the specific grammar at LKB startup. A *locale* 
setting still needs to be placed in globals.lsp by the grammar writer 
when the grammar is first created, but from then on that grammar writer, 
and any other grammar users/writers, need not fiddle with locale 
settings in order to get the grammar to run on other machines.
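
A minimal sketch of the checking variant described above, assuming
*grammar-locale* holds a locale name string. The helper names
normalize-locale-name and check-grammar-locale are hypothetical, not
part of the LKB:

```lisp
;;; Sketch only: *grammar-locale* is the parameter named in this thread;
;;; the two helper functions below are hypothetical.

(defparameter *grammar-locale* "el_GR.utf8"
  "Locale name the grammar expects, set by the grammar writer in globals.lsp.")

(defun normalize-locale-name (name)
  "Lower-case NAME and drop hyphens, so el_GR.UTF-8 and el_GR.utf8 compare equal."
  (remove #\- (string-downcase name)))

(defun check-grammar-locale (current-name)
  "Warn if CURRENT-NAME, the running Lisp's locale name, differs from
*grammar-locale*. Returns true when the two match."
  (let ((ok (string= (normalize-locale-name current-name)
                     (normalize-locale-name *grammar-locale*))))
    (unless ok
      (warn "Lisp locale is ~a but this grammar expects ~a; non-ASCII ~
             characters in the grammar files may be misread."
            current-name *grammar-locale*))
    ok))
```

In Allegro CL the current name could presumably be obtained with
(excl:locale-name excl:*locale*); that call is an assumption here and
should be checked against the Allegro documentation.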

>Also you say that you're aiming for a solution along the following lines:
>>- (C) Ensure encodings are automatically set correctly at grammar load time.
>but as far as I can see, you're proposing we tell users a (partial) recipe.  
>Partial because they have to know their own appropriate values for *locale*.

The appropriate locale needs to be set when the grammar is first created
(see above). Thereafter the locale will be set automatically. Hence the 
grammar creator must know the appropriate value for *locale*. But the 
end users need not be bothered with such details.

>This seems sensible, but wasn't what I'd understood from your point C.
>Will what you're proposing give the same results as what's currently being 
>done for Greek, for example?  see the wiki

The results should be identical. Taking the Greek grammar settings as
an example:

(X) Instead of the *user* placing the following in their .emacs

(unless (boundp 'fi:common-lisp-image-arguments)
  (setq fi:common-lisp-image-arguments nil))
(setq fi:common-lisp-image-arguments
      (nconc (list "-locale" "el_GR.utf8") fi:common-lisp-image-arguments))

one can place the following line in the *grammar*'s globals.lsp

(defparameter excl:*locale* (excl::find-locale "el_GR.utf8"))

This ensures that the grammar files are loaded using the correct encoding.

(Y) All LKB users (independent of grammar) could be told to ensure they 
have the following line in their .emacs

(set-default-coding-systems 'mule-utf-8)

This ensures that Lisp and Emacs can communicate any character in
either direction.

To ensure that users can edit the grammar files from within Emacs, either
- (Z.1) all users of the Greek grammar must include the following in
their .emacs

(set-language-environment 'Greek)

- or (Z.2) the grammar files must tell Emacs in what encoding they are 
to be edited, by using the following header:

;;; -*- Mode: TDL; Coding: utf8 -*-

One could in fact replace (Y+Z.1) with
'(set-language-environment 'Unicode)' + (Z.2) to avoid the
grammar-dependent step (Z.1). But (Z.2) could be seen as too fiddly.

We must also set:

;; Write CDB temporary files as binary
(defparameter cdb::*cdb-ascii-p* nil)

But I think we could make this setting a default in the LKB (how many 
people use pure ASCII grammars, as well as not using the LexDB?).

Japanese has a further complication in that ChaSen expects an EUC 
encoding, and hence at present the following must be evaluated from the 
Lisp buffer:

(setf excl:*default-external-format*
 (setf (stream-external-format *terminal-io*) :euc))

A cleaner way could be to bind *terminal-io* appropriately during calls
to ChaSen.
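
A rough sketch of that idea, for Allegro CL only. Rather than rebinding
*terminal-io* itself, it saves and restores the external format around
the call, which is a slightly different technique. The macro name
with-chasen-encoding is hypothetical, and the sketch assumes that the
setf form shown above can be undone by setting the saved value back:

```lisp
;;; Sketch only: with-chasen-encoding is a hypothetical macro name.
(defmacro with-chasen-encoding (&body body)
  "Run BODY with EUC as the external format, restoring the old values afterwards."
  (let ((old-ef (gensym "OLD-EF")))
    `(let ((,old-ef (stream-external-format *terminal-io*))
           ;; Rebinding the special variable means the setf below is
           ;; undone automatically when the LET exits.
           (excl:*default-external-format* excl:*default-external-format*))
       (unwind-protect
            (progn
              (setf excl:*default-external-format*
                    (setf (stream-external-format *terminal-io*) :euc))
              ,@body)
         ;; Put the terminal stream back even if BODY signals an error.
         (setf (stream-external-format *terminal-io*) ,old-ef)))))
```

The code that calls ChaSen would then be wrapped in
(with-chasen-encoding ...), so the EUC setting does not leak into the
rest of the session.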

>Incidentally, non-ASCII input to the CLIM window is possible under Windows and 
>may well simply be a bug in the Linux version.  So we do need to think about

Has anyone taken a detailed look at the cause of the bug(?) in the Lisp


>PS - whatever encoding you're using for your email causes it to display in a 
>tiny font on exmh

I used the UTF-8 Unicode encoding.