[developers] grammar locale issues
benjamin.waldron at cl.cam.ac.uk
Mon Jun 20 17:04:44 CEST 2005
Ann Copestake wrote:
>yes, but *grammar-locale* is not used anywhere, is it? Has there been an
>agreement that this is going to be used? If not, and if you think this is the
>right way to handle it, please propose it to the developers list and get
>agreement. I know that you did send email to me, oe and Melanie about some
>variant of this idea some time ago, but developers is now the right venue, and
>now would be a good time to start discussion.
>The problem is that, as things stand, if NorSource is using this and nobody
>else is, then the NorSource guys are going to have problems that are
>incomprehensible to other users. I think that as a matter of principle one
>should not add new functions to user-fns for individual grammars - it's not
>good in general for people to be running variants of the code since it's much
>more difficult to help them. If this is good for NorSource, it would be good
>for other grammars too - hence should be proposed and turned into a general
Here would be my proposal:
Individual grammars encode their files using various encoding systems
(e.g. Latin-1, EUC, Unicode, ...). This encoding is a property of the
grammar files as a whole. Internally Lisp (at least, Allegro Common
Lisp) stores all strings as 16-bit Unicode. But in order for the grammar
to run correctly inside the LKB it is necessary that characters entering
the Lisp universe are decoded correctly. Hence:
- (1) Lisp must use the correct encoding when reading in the grammar files.
- (2) Characters received from (or sent to) Emacs must be decoded correctly.
- (3) Characters received from (or sent to) the GUI (CLIM) must be decoded correctly.
(3) is no problem, because (as I understand it) a CLIM input window is
able only to accept plain ASCII characters anyway. Hence (2) becomes
vital for any grammar expecting to process any non-ASCII characters. If
(1) is not satisfied, the grammar will fail to load, spitting out errors
such as those shown below without telling the user why these strange
errors might be occurring:
Syntax error at position 62971:
Incorrect syntax following type name #\[
Ignoring (part of) entry for #\[
Incorrect syntax following type name POS_PL_A_5_0_4_0_3_0_2_0_1_0_SM堺
Ignoring (part of) entry for POS_PL_A_5_0_4_0_3_0_2_0_1_0_SM堺
Alternatively, the grammar files could appear to load fine, but inside
the LKB the entries will be mangled and will not function properly.
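The two failure modes described above (a loud load failure versus silently mangled entries) can be reproduced outside the LKB. The following is a minimal sketch in Python rather than Lisp, purely for illustration; the sample word and the pair of encodings are assumptions chosen for the demo and are not taken from any grammar.

```python
# Illustration only: the word and the encodings are assumptions for the demo.
word = "blåbær"  # hypothetical lexical entry containing non-ASCII characters

latin1_bytes = word.encode("latin-1")  # what a Latin-1 grammar file would contain
utf8_bytes = word.encode("utf-8")      # what a UTF-8 grammar file would contain

# Case 1: the reader assumes UTF-8 but the bytes are Latin-1 -> loud failure.
try:
    latin1_bytes.decode("utf-8")
    loud_failure = False
except UnicodeDecodeError:
    loud_failure = True

# Case 2: the reader assumes Latin-1 but the bytes are UTF-8 -> no error,
# but every non-ASCII character silently becomes two wrong characters.
mangled = utf8_bytes.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the word.
recovered = latin1_bytes.decode("latin-1")

print(loud_failure)
print(repr(mangled))
print(recovered == word)
```

The second case is the more dangerous one: nothing is reported, yet the string stored internally no longer matches what the grammar writer typed.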
- (A) Leave things as they are...
- (B) Provide users with full comprehensible instructions on navigating
the encoding maze, perhaps giving feedback at grammar load time if any
settings are likely to be problematic.
- (C) Ensure encodings are automatically set correctly at grammar load time.
I'd like to propose a solution along the lines of (C).
Note that issue (1) can be resolved (with Allegro Common Lisp, at least)
by setting *locale* at grammar load time, e.g. in globals.lsp:
(defparameter excl:*locale* (excl::find-locale "no.latin1"))
Issue (2) requires that Lisp and Emacs talk to each other using the same
encoding, and also that this encoding can handle all characters passed
in either direction. Unicode satisfies this requirement. Hence we need
simply run Emacs from within a Unicode environment (the encoding is
inherited), or include the following in the .emacs configuration file
The standard streams (*terminal-io*, *standard-input*,
*standard-output*, *error-output*, *trace-output*, *query-io*, and
*debug-io*) are set at Lisp startup (and are unaffected by any -locale
argument passed to the Lisp process). So long as Emacs is run as above,
emacs-mule will be used for both the interprocess communication and for
the encoding of these streams (and we are assured that they will happily
process any character we throw at them).
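The requirement stated for issue (2), that the shared encoding must handle every character passed in either direction, can be checked mechanically. The sketch below is in Python rather than Lisp or Emacs Lisp, and the helper name `can_round_trip` and the sample strings are hypothetical, introduced only for this illustration.

```python
def can_round_trip(text, encoding):
    """Return True if text survives encode -> decode in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # Raised when the encoding has no representation for some character.
        return False


# Hypothetical samples: one Latin-1-range word and one CJK character.
samples = ["blåbær", "堺"]

for encoding in ("ascii", "latin-1", "utf-8"):
    ok = all(can_round_trip(s, encoding) for s in samples)
    print(encoding, ok)
```

Only the Unicode encoding passes for both samples, which is the reason for choosing a Unicode-capable encoding for the Lisp/Emacs channel rather than any single legacy encoding.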
Issue (3) will not give rise to encoding errors since CLIM handles only
ASCII. But moving to a GUI able to handle more than plain ASCII would be
helpful for many users. Note that CLIM will happily display non-ASCII
characters, as long as they were decoded correctly at the time they
entered the Lisp universe.
Feedback much appreciated,