[itsdb] Bug with i-length and Chinese characters.

Stefan Müller Stefan.Mueller at cl.uni-bremen.de
Wed May 23 09:52:52 CEST 2007


Hi,

I have found something that looks like a bug to me, but maybe it is just
expected behavior: looking at the output of the performance comparison, I
saw that performance dropped in the area of sentences with more than 25
words. I was surprised that my test suite contained such long sentences.
A query to the database revealed that [incr TSDB++] thinks that the
following sentence has 27 words:

那     辆     张三       买   的  车    锈。
(gloss: "The car that Zhangsan bought is rusty.")

Is this due to unicode encoding/decoding?
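One observation consistent with the encoding hypothesis: the sentence has
7 whitespace-separated tokens but 9 non-space characters, and every one of
those characters (including the ideographic full stop) takes 3 bytes in
UTF-8, giving 9 x 3 = 27 bytes. The sketch below only compares these naive
counts; it assumes nothing about how [incr TSDB++] actually computes
i-length, and the matching number may be a coincidence.

```python
# Hypothetical illustration: compare a whitespace token count with
# character and UTF-8 byte counts for the test item. This does NOT
# reproduce the [incr TSDB++] i-length computation; it only shows which
# naive counting scheme happens to yield 27 for this sentence.
sentence = "那     辆     张三       买   的  车    锈。"

tokens = sentence.split()             # whitespace-separated tokens
chars = "".join(tokens)               # non-space characters only
utf8_bytes = chars.encode("utf-8")    # raw UTF-8 encoding of those characters

print("tokens:", len(tokens))                       # 7
print("characters (non-space):", len(chars))        # 9
print("UTF-8 bytes (non-space):", len(utf8_bytes))  # 27
```

If i-length were counting bytes (or treating each UTF-8 byte as a separate
character) rather than whitespace-separated tokens, a figure of 27 is what
one would expect for this item.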

Thanks and greetings

	Stefan

-- 
Stefan Müller

Universität Bremen/Fachbereich 10      Tel: (+49) (+421) 218-8601
Postfach 33 04 40
D-28334 Bremen

http://www.cl.uni-bremen.de/~stefan/

http://www.cl.uni-bremen.de/~stefan/Babel/Interaktiv/