[developers] [itsdb] updates to LKB code base

Ben Waldron benjamin.waldron at cl.cam.ac.uk
Tue Feb 14 15:23:57 CET 2006


Stephan Oepen wrote:
>  - wrapping a function originally provided by ben into a macro, so as
>    to allow grammars to request the appropriate coding system, e.g.
>
>    (when (lkb-version-after-p "2006/02/08 15:00:00")
>      (set-coding-system utf-8))
>
>    i would recommend putting a form like this at the top of `script'
>    in each grammar, such that this central property is made explicit.
>  
The above change (which seems to have involved more than wrapping the 
original function in a macro) broke existing functionality that some 
people were using. Specifically, the old code allowed all external 
formats recognised by Allegro to be used, whilst the new code accepted 
only four hard-coded coding system names (three in fact, as :euc-jp is 
not a valid external format in Allegro). I've fixed the code (see below 
-- the changes are compatible with both the older and newer versions of 
the code), but please could you consult before changing such code in future?

Allowable coding system names are specified in *coding-system-names*:

;; mapping from recognised names to canonical names (first element in
;; each list)
;; TO DO: add Emacs coding names
(defconstant *coding-system-names*
    '(
      (:iso8859-1 :latin1 :ascii :8-bit :1250 :iso88591)
      (:1251) ;; For MS Windows
      (:1252) ;; For MS Windows
      (:1253) ;; For MS Windows
      (:1254) ;; For MS Windows
      (:1255) ;; For MS Windows
      (:1256) ;; For MS Windows
      (:1257) ;; For MS Windows
      (:1258) ;; For MS Windows
      (:iso8859-2 :latin-2 :latin2)
      (:iso8859-3 :latin-3 :latin3)
      (:iso8859-4 :latin-4 :latin4)
      (:iso8859-5 :latin-5 :latin5)
      (:iso8859-6 :latin-6 :latin6)
      (:iso8859-7 :latin-7 :latin7)
      (:iso8859-8 :latin-8 :latin8)
      (:iso8859-9 :latin-9 :latin9)
      (:iso8859-14 :latin-14 :latin14)
      (:iso8859-15 :latin-15 :latin15)
      (:koi8-r)
      (:emacs-mule)
      (:utf8 :utf-8)
      (:big5)
      (:gb2312)
      (:euc :ujis :euc-jp :eucjp)
      (:874) ;; For MS Windows
      (:932) ;; For MS Windows
      (:936) ;; For MS Windows
      (:949) ;; For MS Windows
      (:950) ;; For MS Windows
      (:jis)
      (:shiftjis)))
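
For illustration, a lookup against a table of this shape might be 
sketched as follows. This is not the LKB code itself: the names 
*example-coding-system-names* and example-canonical-coding-name are 
hypothetical, and the table is a short excerpt of the one above so that 
the sketch runs on its own.

```lisp
;; Hypothetical excerpt of the table above (not the LKB constant itself).
(defparameter *example-coding-system-names*
  '((:iso8859-1 :latin1 :ascii :8-bit :1250 :iso88591)
    (:utf8 :utf-8)
    (:euc :ujis :euc-jp :eucjp)))

;; Hypothetical helper: return the canonical name (the first element of
;; the entry that contains NAME), or NIL when NAME is not recognised.
(defun example-canonical-coding-name
    (name &optional (table *example-coding-system-names*))
  (first (find name table :test #'member)))
```

With this sketch, both :utf-8 and :utf8 resolve to :utf8, :euc-jp 
resolves to :euc, and an unlisted name yields NIL.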

Each Lisp implementation can specify a mapping from canonical to 
internal coding system identifiers:

;; mapping from canonical names to internal names
;; specified as:
;;  - :XXX when canonical name same as internal name
;;  - (:CANONICAL . :INTERNAL) otherwise
(defconstant *canonical-to-internal-coding-name-mapping*
    #+:allegro
    '(
      :iso8859-1
      :1251
      :1252
      :1253
      :1254
      :1255
      :1256
      :1257
      :1258
      :iso8859-2
      :iso8859-3
      :iso8859-4
      :iso8859-5
      :iso8859-6
      :iso8859-7
      :iso8859-8
      :iso8859-9
      :iso8859-14
      :iso8859-15
      :koi8-r
      :emacs-mule
      :utf8
      :big5
      :gb2312
      :euc
      :874
      :932
      :936
      :949
      :950
      :jis
      :shiftjis)
    #-:allegro
    NIL
    )
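
As a rough sketch of how a mapping in this format could be consulted 
(again not the actual LKB code), the function below handles both entry 
shapes described in the comment above. The names 
*example-canonical-to-internal* and example-internal-coding-name are 
hypothetical, and the (:utf8 . :utf-8) pair is invented purely to show 
the dotted-pair case.

```lisp
;; Hypothetical mapping in the format described above. The dotted pair
;; is made up for illustration; the Allegro list uses bare keywords only.
(defparameter *example-canonical-to-internal*
  '(:iso8859-1
    (:utf8 . :utf-8)
    :euc))

;; Hypothetical helper: return the internal name for CANONICAL, or NIL
;; when the mapping has no entry for it.
(defun example-internal-coding-name
    (canonical &optional (mapping *example-canonical-to-internal*))
  (dolist (entry mapping nil)
    (cond ((and (consp entry) (eq (car entry) canonical))
           ;; (:CANONICAL . :INTERNAL) entry
           (return (cdr entry)))
          ((eq entry canonical)
           ;; bare keyword: canonical and internal names coincide
           (return entry)))))
```

Under this sketch, an empty mapping (as in the #-:allegro branch above) 
makes every lookup return NIL.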

Regards,
- Ben


