[Fwd: Re: [developers] Re: still problem with accents...]

Stephan Oepen oe at csli.Stanford.EDU
Mon Jun 20 15:12:19 CEST 2005


hi again, montse, and thanks for the grammar!

can you test two more things for me, please?  when you run the LKB now,
please evaluate

  excl:*locale*

in the *common-lisp* buffer and email me the output.  ideally, i would
like to see the entire *common-lisp* buffer, so as to know exactly what
has happened since start-up.  as your grammar is coded ISO-8859-1 (aka
Latin-1 or Western European), i believe the correct setting should be:

  LKB(8): excl:*locale*
  #<locale "es_ES" [:LATIN1-BASE] @ #x448563ea>

secondly, i can load and run your grammar, once i make sure my encoding
is consistently ISO-8859-1.  i attach an extended `dot.emacs'; could you
please try the following:

  - save the attachment somewhere, say `/tmp';
  - M-x load-file RET /tmp/dot.emacs RET
  - M-x spanish RET

and then load your grammar.  this works for me, insofar as i can now

  (do-parse-tty "el niño lloró")

this yields no unknown word complaints, but unfortunately no parses.
inspecting `Parse | Show parse chart', i see all three words in the
chart, accented characters display properly, and there are lexical
entries associated with them as expected (two for `el', one for `niño'
and `lloró', respectively).  however, i fail to build the NP, as it
seems `niño' is not going through `masc-sing-nom_infl_rule'.  after
some poking around, i worked out that the line

  inf-verb_infl_rule := inf-infl-rule.

was confusing the reader for %suffix annotations, and once i change it
to 

  inf-verb_infl_rule := 
  inf-infl-rule.

i can now parse fine.  i think i will include the `dot.emacs' changes
in the next LKB build, so feel free to just drop my version on top of
the one the installer put into $DELPHINHOME/lkb/etc/.  also, note that
you will have to use do-parse-tty() (or other LKB :tty mode functions)
whenever you need to input accented characters; we will have to report
the problems with the `Parse | Parse input' dialogue to the vendor of
the graphics toolkit we use for the LKB.
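
for the one-line rule definition mentioned above, a rough sketch of how
the work-around could be applied mechanically (python, not part of the
LKB).  it assumes the trigger is simply an `..._infl_rule' identifier,
`:=', and a single type name all sitting on one line; that is a guess
based on the one example, and the name split_one_line_rules() is made
up for illustration.

```python
import re

# assumption: only definitions of the shape `name_infl_rule := type.`
# on a single line need rewriting; everything else is left untouched.
ONE_LINE_RULE = re.compile(
    r"^\s*([\w-]+_infl_rule)\s*:=\s*([\w-]+)\s*\.\s*$"
)


def split_one_line_rules(text: str) -> str:
    """Move the type of a one-line rule definition onto its own line."""
    out = []
    for line in text.splitlines():
        m = ONE_LINE_RULE.match(line)
        if m:
            out.append(f"{m.group(1)} :=")
            out.append(f"{m.group(2)}.")
        else:
            out.append(line)
    # preserve a trailing newline if the input had one
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

with the newer morphology code this should no longer be necessary.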

a few more remarks.  i noticed that loading your grammar is slow with
the ready-to-run LKB binaries that we distribute.  you can make things
faster by requesting more memory at start-up, so that the LKB need not
grow to a suitable size while loading the grammar.  put the following

  (system:resize-areas :old (* 32 1024 1024) :new (* 128 1024 1024))

into a file `~/.lkbrc' in your home directory to request a bigger LKB
memory footprint at start-up.

also, to avoid potential for confusion regarding encodings, consider
adding lines like the following

  ;;; Hey, emacs(1), this is -*- mode: tdl; coding: iso-8859-1 -*-

to all files of your grammar.
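
to see which files still lack such a line, or were saved as UTF-8 by
accident, something like the following python sketch might help.  it is
not part of the LKB; the names looks_like_utf8() and check_file() are
made up for illustration, and the header test accepts either `coding:'
or `encoding:' as the key.  note that any byte sequence is formally
valid Latin-1, so the check can only flag files that happen to be valid
multi-byte UTF-8, which is a heuristic rather than a proof.

```python
import re
from pathlib import Path

# header line of the form  -*- ... coding: iso-8859-1 ... -*-
COOKIE = re.compile(
    rb"-\*-.*\b(?:en)?coding:\s*iso-8859-1\b.*-\*-", re.IGNORECASE
)


def looks_like_utf8(data: bytes) -> bool:
    """True if data has non-ASCII bytes that form valid UTF-8."""
    if data.isascii():
        return False  # pure ASCII is the same in both encodings
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # stray high bytes: plausibly Latin-1
    return True


def check_file(path):
    """Report whether the first line carries the header and whether
    the contents look like UTF-8 rather than Latin-1."""
    data = Path(path).read_bytes()
    first_line = data.split(b"\n", 1)[0]
    return {
        "cookie": bool(COOKIE.search(first_line)),
        "utf8": looks_like_utf8(data),
    }
```

calling check_file() on each `.tdl' file of the grammar and looking for
results with "cookie" false or "utf8" true would point at the files
that need attention.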

finally, a heads-up: ann has lately rewritten the morphology code from
scratch, and that silly problem related to formatting i noticed above
has been eliminated.  however, i also noted some issues when processing
your grammar using the newer code.  i would like to forward the grammar
and some notes on my findings to ann and the `developers' list.  do you
agree to my doing that?

                                                  all the best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dot.emacs
Type: application/octet-stream
Size: 7285 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20050620/06355447/attachment.obj>

