[developers] pet-input-chart punctuation-characters

Francis Bond fcbond at gmail.com
Wed Jan 11 04:00:42 CET 2006


G'day,

Currently cheap does not check whether pic input items are members of
punctuation-characters.  This means that we can't parse any sentence
with a full stop (^_^).  Of course, we can take it out of the pic,
which is what we are doing, but it would be nice if the new
preprocessor code handled this within cheap, particularly for MAF
where we may use the same input for multiple systems.
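
To make the request concrete, here is a minimal sketch in Python of the
kind of check meant above.  It is not cheap's code: the function names
are hypothetical, input items are reduced to plain surface strings
(real pic items carry more than that), and treating an item as
punctuation only when every character in it is a member of
punctuation-characters is an assumption.

```python
def is_punctuation_token(form, punctuation_characters):
    """Return True if form is non-empty and every character in it
    is a member of punctuation_characters."""
    return bool(form) and all(ch in punctuation_characters for ch in form)


def drop_punctuation_items(items, punctuation_characters):
    """Return the input items with pure-punctuation items removed.

    items: list of surface strings (a simplification of pic items).
    punctuation_characters: a string such as the value of the
    punctuation-characters setting.
    """
    punct = set(punctuation_characters)  # set gives O(1) membership tests
    return [item for item in items if not is_punctuation_token(item, punct)]
```

With something like this in the preprocessor, the same input could be
handed to several systems and each one would drop the full stop
according to its own setting.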

Our current settings are:

JACY punctuation-characters, found in pet/japanese.set:
punctuation-characters := "\"!&'()*+,-−./;<=>?@[\]^_`{|}~。?…., ○●◎*".

Note that punctuation-characters are defined separately for the LKB
(in lkb/globals.lsp):
(defparameter *punctuation-characters*
  (append
   '(#\space #\! #\" #\& #\' #\(
     #\) #\* #\+ #\, #\- #\. #\/ #\;
     #\< #\= #\> #\? #\@ #\[ #\\ #\] #\^
     #\_ #\` #\{ #\| #\} #\~)
   #+:ics
   '(#\ideographic_full_stop #\fullwidth_question_mark
     #\horizontal_ellipsis #\fullwidth_full_stop
     #\fullwidth_exclamation_mark #\black_circle
     #\fullwidth_comma #\ideographic_space
     #\katakana_middle_dot #\white_circle)))

We occasionally get them out of sync (^_^).
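
A small script could catch the drift.  The following is only a sketch
under assumptions: it takes the two character inventories as plain
Python strings (extracting them from pet/japanese.set and
lkb/globals.lsp is left out), and the function names are made up for
illustration.

```python
import unicodedata


def charset_diff(pet_chars, lkb_chars):
    """Return (only_in_pet, only_in_lkb) as sorted lists of characters."""
    pet, lkb = set(pet_chars), set(lkb_chars)
    return sorted(pet - lkb), sorted(lkb - pet)


def describe(chars):
    """Render characters as code point plus Unicode name, so that
    look-alike characters can be told apart in the report."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars]


if __name__ == "__main__":
    # Toy inventories, not the real settings.
    only_pet, only_lkb = charset_diff(".,!\u3002", ".,?\u3002")
    print("only in PET:", describe(only_pet))
    print("only in LKB:", describe(only_lkb))
```

Printing the Unicode names matters here because several of the
characters involved (for instance the ASCII and fullwidth full stops)
are easy to confuse by eye.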

--
Francis Bond  <www.kecl.ntt.co.jp/icl/mtg/members/bond/>
NTT Communication Science Laboratories | Machine Translation Research Group


