[itsdb] Bug with i-length and Chinese characters.
Stefan Müller
Stefan.Mueller at cl.uni-bremen.de
Wed May 23 09:52:52 CEST 2007
Hi,
I have found something that looks like a bug to me, but maybe it is
expected behavior: looking at the output of the performance comparison, I
saw that performance dropped in the area of sentences with more than 25
words. I was surprised to have such long sentences in my test suite. A
query to the database revealed that [incr TSDB++] thinks the following
sentence has 27 words:
那 辆 张三 买 的 车 锈。
Is this due to Unicode encoding/decoding?
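A quick check outside [incr TSDB++] may help narrow this down. The sketch below is plain Python and does not use any itsdb code; how i-length is actually computed there is unknown to me, so the variable names and the byte-counting idea are only assumptions. It compares three ways of measuring the sentence: whitespace-separated words, non-space characters, and the UTF-8 bytes of those characters.

```python
# Sentence from the test suite, with words separated by ASCII spaces.
sentence = "那 辆 张三 买 的 车 锈。"

# Whitespace-separated words (the sentence-final punctuation stays attached).
tokens = sentence.split()

# Unicode code points, with the separating spaces left out.
chars = [c for c in sentence if not c.isspace()]

# The same characters encoded as UTF-8; each of them takes three bytes.
utf8_bytes = "".join(chars).encode("utf-8")

print(len(tokens), len(chars), len(utf8_bytes))  # prints: 7 9 27
```

The reported length of 27 matches the UTF-8 byte count of the nine non-space characters, which would be consistent with bytes being counted instead of words. This is only a coincidence check, not evidence of what [incr TSDB++] really does.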
Thanks and greetings
Stefan
--
Stefan Müller
Universität Bremen/Fachbereich 10 Tel: (+49) (+421) 218-8601
Postfach 33 04 40
D-28334 Bremen
http://www.cl.uni-bremen.de/~stefan/
http://www.cl.uni-bremen.de/~stefan/Babel/Interaktiv/