<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Ann Copestake wrote: <blockquote cite="midE1DkQ20-0003Wo-00@mta1.cl.cam.ac.uk" type="cite"> <pre wrap="">just to clarify - is this anything to do with the *grammar-locale* idea or are you not proposing to pursue that? </pre> </blockquote> The mechanism I implemented in the NorSource grammar was going to be a tentative form of (B): <blockquote type="cite"> <pre wrap="">- (B) Provide users with full comprehensible instructions on navigating the encoding maze, perhaps giving feedback at grammar load time if any settings are likely to be problematic. </pre> </blockquote> The idea was to check Lisp's *locale* against a parameter *grammar-locale* set in globals.lsp, and to print an informative message if the two did not match. But it seems to me we can do better than this. If *locale* is set at grammar load time, there is no need to tell every LKB user who hopes to run a particular grammar that they must set the LKB's locale appropriately for that grammar at LKB startup. A *locale* setting still needs to be placed in globals.lsp by the grammar writer when the grammar is first created, but from then on neither that grammar writer nor any other grammar user or writer need fiddle with locale settings to get the grammar to run on other machines. <blockquote cite="midE1DkQ20-0003Wo-00@mta1.cl.cam.ac.uk" type="cite"> <pre wrap="">Also you say that you're aiming for a solution along the following lines: </pre> <blockquote type="cite"> <pre wrap="">- (C) Ensure encodings are automatically set correctly at grammar load time. </pre> </blockquote> <pre wrap="">but as far as I can see, you're proposing we tell users a (partial) recipe. Partial because they have to know their own appropriate values for *locale*. 
</pre> </blockquote> The appropriate locale needs to be set when the grammar is first created (see above). Thereafter the locale is set automatically. Hence the grammar creator must know the appropriate value for *locale*, but end users need not be bothered with such details. <blockquote cite="midE1DkQ20-0003Wo-00@mta1.cl.cam.ac.uk" type="cite"> <pre wrap="">This seems sensible, but wasn't what I'd understood from your point C. Will what you're proposing give the same results as what's currently being done for Greek, for example? see the wiki </pre> </blockquote> The results should be identical. Taking the Greek grammar settings as an example:<br>
(X) Instead of the *user* placing the following in their .emacs
<pre>(unless (boundp 'fi:common-lisp-image-arguments)
  (setq fi:common-lisp-image-arguments nil))
(setq fi:common-lisp-image-arguments
      (nconc (list "-locale" "el_GR.utf8")
             fi:common-lisp-image-arguments))</pre>
one can place the following line in the *grammar*'s globals.lsp
<pre>#+:allegro
(defparameter excl:*locale* (excl::find-locale "el_GR.utf8"))</pre>
This ensures that the grammar files are loaded using the correct encoding.<br>
(Y) All LKB users (independent of grammar) could be told to ensure they have the following line in their .emacs
<pre>(set-default-coding-systems 'mule-utf-8)</pre>
This ensures that Lisp and Emacs can communicate any character in either direction.<br>
To ensure that users can edit the grammar files from within Emacs, either<br>
- (Z.1) all users of the Greek grammar must include the following in their .emacs
<pre>(set-language-environment 'Greek)</pre>
- or (Z.2) the grammar files must tell Emacs in what encoding they are to be edited, by using the following header:
<pre>;;; -*- Mode: TDL; Coding: utf8 -*-</pre>
One could in fact replace (Y+Z.1) with '(set-language-environment 'Unicode)' + (Z.2) to avoid the grammar-dependent step (Z.1). But (Z.2) could be seen as too fiddly. 
We must also set:
<pre>;; Write CDB temporary files as binary
(defparameter cdb::*cdb-ascii-p* nil)</pre>
But I think we could make this setting the default in the LKB (how many people use a pure ASCII grammar and also do not use the LexDB?).<br>
Japanese has a further complication in that ChaSen expects an EUC encoding, and hence at present the following must be evaluated from the Lisp buffer:
<pre>(setf excl:*default-external-format*
      (setf (stream-external-format *terminal-io*) :euc))</pre>
A cleaner way could be to bind *terminal-io* appropriately during calls to ChaSen. <blockquote cite="midE1DkQ20-0003Wo-00@mta1.cl.cam.ac.uk" type="cite"> <pre wrap="">Incidentally, non-ASCII input to the CLIM window is possible under Windows and may well simply be a bug in the Linux version. So we do need to think about it. </pre> </blockquote> Has anyone taken a detailed look at the cause of the bug(?) in the Lisp CLIM?<br>
Thanks,<br>
-Ben <blockquote cite="midE1DkQ20-0003Wo-00@mta1.cl.cam.ac.uk" type="cite"> <pre wrap="">Ann

PS - whatever encoding you're using for your email causes it to display in a tiny font on exmh </pre> </blockquote> I used the UTF-8 Unicode encoding. </body> </html>