[developers] Re: grammar locale issues

Stephan Oepen oe at csli.Stanford.EDU
Tue Jun 21 21:03:30 CEST 2005


hi again, ben,

> Do you have any comments on the proposals below?

yes, plenty :-).  but not really much time to follow up on these right
now.

i am sympathetic to your proposal to build in more checks and balances,
specifically to make the `default' encoding for each grammar explicit as
a global variable (we have long had that property in PET).  i thought
the *grammar-encoding* global, plus read-script-file-aux() doing some
sanity checking, was a promising idea.
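
to make the kind of sanity check i have in mind concrete, here is a
minimal sketch.  it is python rather than lisp, purely for brevity, and
it is not LKB code: the function name and the choice to simply attempt a
strict decode are assumptions for illustration only.

```python
def check_grammar_encoding(data: bytes, declared: str) -> bool:
    """Return True if `data` decodes cleanly under the declared encoding.

    Hypothetical helper: `declared` plays the role of a per-grammar
    encoding global; `data` stands for the raw bytes of a grammar file.
    """
    try:
        # Strict decoding raises on any byte sequence that is invalid
        # in the declared encoding.
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    except LookupError:
        # The declared encoding name is not known to this Python.
        return False
    return True


if __name__ == "__main__":
    euc_bytes = "食べた".encode("euc_jp")
    print(check_grammar_encoding(euc_bytes, "euc_jp"))
    print(check_grammar_encoding(euc_bytes, "utf-8"))
```

a clean decode does not prove the declaration is right (many 8-bit
codings accept any byte string), so this can only catch some mismatches.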

also, i had been planning to toggle

  (defparameter cdb::*cdb-ascii-p* nil)

since the current LKB `dot.emacs' defaults to UTF-8 now, and i agree
with your expectation that ASCII grammars will not break by writing
two-byte CDB entries (i was hoping to test that assumption, though :-).

finally, i am nervous about the set-up you propose where we try to make
ELI communication always be UTF-8, but potentially have another coding
convention for the grammar files (or i/o with sub-processes).  this is
a new idea to me (i did not think it would be possible), but in general
my experience has been that ensuring _one_ consistent coding system at
all levels is the path to happiness: i believe your proposal could mean
that the *common-lisp* buffer has a different (process) coding system
than buffers visiting TDL files (e.g. for JaCY, where files continue to
be in EUC, for now).  i fear that will be harder to set up reliably for
emacs(1) than just one consistent scheme, and will create potential for
user confusion (and i have seen difficulties pasting in X across
encodings).
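
as a small illustration of why mixed coding systems worry me, the
sketch below encodes the same three characters as UTF-8 and as EUC-JP
and then reads the EUC bytes under a wrong assumption.  again this is
python only for brevity; the variable names are mine, and latin-1 merely
stands in for `some other 8-bit coding' that a process might assume.

```python
text = "食べた"

# The same characters yield different byte sequences per coding system.
as_utf8 = text.encode("utf-8")    # three bytes per character here
as_euc = text.encode("euc_jp")    # two bytes per character here

print(as_utf8.hex(), len(as_utf8))
print(as_euc.hex(), len(as_euc))

# A reader that assumes the wrong 8-bit coding gets no error at all,
# just different (garbled) characters.
garbled = as_euc.decode("latin-1")
print(garbled == text)

# Reading with the coding system used for writing restores the text.
print(as_euc.decode("euc_jp") == text)
```

the silent case is the one that tends to confuse users: nothing fails,
the buffer just shows the wrong characters.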

i am not convinced this level of sophistication is really needed.  some
of the currently documented procedures are more complex than i think is
required (today).  for example, the following just works for me (modulo
substitution of $DELPHINHOME, of course):

  emacs -q &
  M-x load-file RET $DELPHINHOME/lkb/etc/dot.emacs RET
  M-x japanese RET
  (read-script-file-aux "$DELPHINHOME/japanese/lkb/ascript")
  (do-parse-tty "食べた")

--- melanie will be visiting here in july, and francis and i expect to
streamline set-up for JaCY during her visit.

somewhat more high-level, i am inclined to encourage more people to use
UTF-8, but in western europe and japan, at least, there appears to be a
strong, established non-Unicode tradition :-{.

                                       so much for tonight; hth  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


