[Fwd: Re: [developers] Re: still problem with accents...]
Stephan Oepen
oe at csli.Stanford.EDU
Mon Jun 20 15:12:19 CEST 2005
hi again, montse, and thanks for the grammar!
can you test two more things for me, please? when you run the LKB now,
please evaluate
excl:*locale*
in the *common-lisp* buffer and email me the output. ideally, i would
like to see the entire *common-lisp* buffer, so as to know exactly what
has happened since start-up. as your grammar is encoded in ISO-8859-1 (aka
Latin-1 or Western European), i believe the correct setting should be:
LKB(8): excl:*locale*
#<locale "es_ES" [:LATIN1-BASE] @ #x448563ea>
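to illustrate why the locale and the file encoding have to agree, here is a minimal python sketch. it is not part of the LKB and only uses the test sentence from this message; it shows that the same text is stored as different bytes in ISO-8859-1 and UTF-8, and what happens when the bytes are read with the wrong assumption.

```python
# the same sentence, stored under two different encodings
text = "el niño lloró"

latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# each accented character is one byte in Latin-1 but two bytes in UTF-8
print(len(latin1_bytes), len(utf8_bytes))

# UTF-8 bytes read as Latin-1 decode without error, but the accents are garbled
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Latin-1 bytes read as UTF-8 are rejected outright
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)
```

a tokenizer that receives the garbled form will not find `niño' in the lexicon, which is one plausible way to end up with unknown word complaints.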
secondly, i can load and run your grammar, once i make sure my encoding
is consistently ISO-8859-1. i attach an extended `dot.emacs'; could you
please try the following:
- save the attachment somewhere, say `/tmp';
- M-x load-file RET /tmp/dot.emacs RET
- M-x spanish RET
and then load your grammar. this works for me, insofar as i can now
(do-parse-tty "el niño lloró")
this yields no unknown word complaints, but unfortunately no parses.
inspecting `Parse | Show parse chart', i see all three words in the
chart, accented characters display properly, and there are lexical
entries associated with them as expected (two for `el', one each for
`niño' and `lloró'). however, i fail to build the NP, as it
seems `niño' is not going through `masc-sing-nom_infl_rule'. after
some poking around, i worked out that the line
inf-verb_infl_rule := inf-infl-rule.
was confusing the reader for %suffix annotations, and once i change it
to
inf-verb_infl_rule :=
inf-infl-rule.
i can now parse fine. i think i will include the `dot.emacs' changes
in the next LKB build, so feel free to just drop my version on top of
the one the installer put into $DELPHINHOME/lkb/etc/. also, note that
you will have to use do-parse-tty() (or other LKB :tty mode functions)
whenever you need to input accented characters; we will have to report
the problems with the `Parse | Parse input' dialogue to the vendor of
the graphics toolkit we use for the LKB.
a few more remarks. i noticed that loading your grammar is slow with
the ready-to-run LKB binaries that we distribute. you can make things
faster by requesting more memory at start-up, so that the LKB need not
grow to a suitable size while loading the grammar. put the following
(system:resize-areas :old (* 32 1024 1024) :new (* 128 1024 1024))
into a file `~/.lkbrc' in your home directory to request a bigger LKB
memory footprint at start-up.
also, to avoid potential for confusion regarding encodings, consider
adding lines like the following
;;; Hey, emacs(1), this is -*- mode: tdl; encoding: iso-8859-1 -*-
to all files of your grammar.
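adding that line by hand to many files is tedious, so here is a minimal python sketch of one way to automate it. it rests on assumptions not stated in this message: that the grammar files match `*.tdl', that they are all ISO-8859-1, and that a file whose first line already contains `-*-' should be left alone. the names `add_header' and `add_headers' are made up for this example.

```python
from pathlib import Path

# header line quoted from the message above
HEADER = ";;; Hey, emacs(1), this is -*- mode: tdl; encoding: iso-8859-1 -*-"


def add_header(path, encoding="iso-8859-1"):
    """Prepend HEADER to one file; return True if the file was changed."""
    # newline="" leaves the existing line endings untouched
    with open(path, encoding=encoding, newline="") as stream:
        text = stream.read()
    first_line = text.split("\n", 1)[0]
    if "-*-" in first_line:
        # assumption: any existing -*- line is deliberate, so skip the file
        return False
    with open(path, "w", encoding=encoding, newline="") as stream:
        stream.write(HEADER + "\n" + text)
    return True


def add_headers(grammar_dir, pattern="*.tdl"):
    """Apply add_header to all matching files below grammar_dir.

    Returns the list of paths that were modified.
    """
    return [path for path in sorted(Path(grammar_dir).rglob(pattern))
            if add_header(path)]
```

calling `add_headers' on the grammar directory returns the files it touched; running it a second time should change nothing. it rewrites files in place, so it is best tried on a copy of the grammar first.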
finally, a heads-up: ann has lately rewritten the morphology code from
scratch, and that silly problem related to formatting i noticed above
has been eliminated. however, i also noted some issues when processing
your grammar using the newer code. i would like to forward the grammar
and some notes on my findings to ann and the `developers' list. do you
agree to my doing that?
all the best - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (ILN); Boks 1102 Blindern; 0317 Oslo; (+47) 2285 7989
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at csli.stanford.edu; oe at hf.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dot.emacs
Type: application/octet-stream
Size: 7285 bytes
Desc: not available
URL: <http://lists.delph-in.net/archives/developers/attachments/20050620/06355447/attachment.obj>