[developers] Re: emacs encoding issues (JACY)
Ben Waldron
benjamin.waldron at cl.cam.ac.uk
Sat Jun 25 22:35:42 CEST 2005
Ben Waldron wrote:
> Francis Bond wrote:
>
>> G'day,
>>
>> Yes. Shall I check in a patch to the JACY CVS?
>>
>>
>> Sure. Would that be patching user-fns.lsp?
>
>
> Yes, here:
>
> #+:chasen
> (defun preprocess-sentence-string (string &key (verbose *chasen-debug-p*) posp)
> ...
I've had another look at this:
==
We have found that for the latest eli and emacs 21.4, that it always
sets the (stream-external-format *terminal-io*) to :emacs-mule. We
prefer it to be EUC-JP, so we evaluate the following in the lisp buffer:
(setf excl:*default-external-format*
      (setf (stream-external-format *terminal-io*) :euc))
==
The above, taken from the Wiki, was necessary because the japanify
function was resetting (to EUC) the encoding used by Emacs in
translating communications to/from the Lisp process, whilst the Lisp
process was using the encoding which Emacs had told it to use at Lisp
startup ((stream-external-format *terminal-io*) set to :emacs-mule). The
above code ensured that the encodings matched again.
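One way to see which encoding the Lisp side currently has, before and after japanify runs, is to evaluate something like the following in the Lisp buffer. This is only a sketch, assuming Allegro CL; it uses nothing beyond the two places named in the Wiki snippet above and prints their current values.

```lisp
;; Sketch only (Allegro CL assumed): evaluate in the *common-lisp* buffer.
;; Prints the external format Lisp uses for its terminal stream and the
;; default external format held in excl:*default-external-format*.
#+:allegro
(format t "~&*terminal-io* external format: ~s~%default external format: ~s~%"
        (stream-external-format *terminal-io*)
        excl:*default-external-format*)
```

If the first value still names emacs-mule while Emacs has been switched to EUC for the process buffer, the two ends disagree, which is the situation described above.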
The solution would seem to be not to alter the encoding in the first place.
==
;;; this sets up an encoding
(defun japanify (buffer encoding)
  (save-excursion
    (switch-to-buffer buffer)
    (set-language-environment 'japanese)
    (set-buffer-file-coding-system encoding)
    (set-buffer-process-coding-system encoding encoding)) ;; <= X
  (setq default-buffer-file-coding-system encoding))

(defun lisp (&optional prefix)
  (setq lkb-tmp-dir "/tmp")
  (interactive "P")
  (load "/usr/local/delphin/acl/eli/fi-site-init")
  (setq fi:common-lisp-image-name "/usr/local/delphin/acl/alisp")
  (setq fi:common-lisp-image-file "/usr/local/delphin/acl/bclim.dxl")
  (setq fi:common-lisp-image-arguments
        (list
         "-locale" "japan.EUC"
         "-qq" "-L" "/usr/local/delphin/cl-init.cl"))
  (fi:common-lisp)
  (japanify "*common-lisp*" 'euc-jp)) ;; <= X
==
Regarding ChaSen: if we ensure that Emacs and Lisp agree on an encoding
we can run the 'chasen' command, and if we further ensure that Lisp's
*locale* is set to EUC (as it is for the JACY grammar) then we will
exchange data with the ChaSen process in the encoding it can handle.
According to the (limited) documentation that I was able to find on the
net, it should be possible to tell ChaSen to use alternative encodings
to EUC:
==
2. How to use ChaSen system
---------------------------
Suppose a Japanese text file `nihongo', which should be encoded in
Japanese EUC (Extended UNIX Code), JIS (ISO-2022-JP), Shift_JIS
(MS Kanji) or UTF-8. Issue the following command:
% chasen nihongo # Use the system default encode
% chasen -i e nihongo-euc # Use EUC-JP or JIS
% chasen -i s nihongo-euc # Use Shift_JIS
% chasen -i w nihongo-euc # Use UTF-8
The result of the morphological analysis is shown on the standard
output. If your terminal has a direct input facility of Japanese
characters, simply type
% chasen
then input a Japanese sentence followed by a carriage return.
==
This doesn't work when I try it though... So I'll just put a wrapper in
preprocess-sentence-string to ensure that *locale* is 'japan.EUC' when
we talk to ChaSen.
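A minimal sketch of what such a wrapper might look like, assuming Allegro CL. The macro name with-chasen-locale is hypothetical (it is not part of the LKB or JACY), and it relies only on excl:*locale* and the excl::find-locale call proposed for globals.lsp later in this message. I have not tested it, and whether the stream to ChaSen picks up the rebound locale depends on how that stream is opened.

```lisp
;; Hypothetical helper: WITH-CHASEN-LOCALE is an illustrative name only.
#+:allegro
(defmacro with-chasen-locale (&body body)
  "Evaluate BODY with excl:*locale* temporarily bound to japan.EUC."
  `(let ((excl:*locale* (excl::find-locale "japan.EUC")))
     ,@body))

;; Intended use inside preprocess-sentence-string (body elided here):
;; (with-chasen-locale
;;   ... open the ChaSen process and exchange strings with it ...)
```

Binding the variable with let, rather than setting it, means the previous locale is restored automatically once the exchange with ChaSen is finished.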
To summarize, I think the code suggested on the Wiki can be reduced to
the following:
==
(defun lisp (&optional prefix)
  (interactive "P")
  ;; set input method/default coding for files
  (set-language-environment 'japanese)
  ;; ensure new files saved in correct encoding
  (setq default-buffer-file-coding-system 'euc-jp)
  (load "/usr/local/acl/acl70/eli/fi-site-init")
  (setq fi:common-lisp-image-name "/usr/local/acl/acl70/alisp")
  (setq fi:common-lisp-image-file "/usr/local/acl/acl70/bclim.dxl")
  (setq fi:common-lisp-image-arguments (list "-locale" "japan.EUC"))
  ;; (<= X) ensure Lisp loads grammar files in correct encoding
  (fi:common-lisp))
==
The marked line can go if we include the following in globals.lsp (so
that Lisp sets its locale appropriately):
==
#+:allegro
(defparameter excl:*locale* (excl::find-locale "japan.EUC"))
==
Some other questions I had concerning ChaSen:
- is there an up-to-date manual in English (I can't read Japanese
without the help of the Google translator...)?
- has anyone run/considered running ChaSen in server mode?
- can ChaSen return a lattice?
Thanks,
-Ben