[itsdb] Bug with i-length and Chinese characters
    Stefan Müller
    Stefan.Mueller at cl.uni-bremen.de
    Wed May 23 09:52:52 CEST 2007

Hi,
I have something that looks like a bug to me, but maybe it is expected
behavior: looking at the output of the performance comparison, I saw that
performance dropped in the range of sentences with more than 25 words. I
was surprised that I had such long sentences in my test suite. A query
to the database revealed that [incr TSDB++] thinks that the following
sentence has 27 words:
那     辆     张三       买   的  车    锈。
Is this due to Unicode encoding/decoding?
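
A quick check outside [incr TSDB++] points in that direction. The following
is only a minimal Python sketch: the function names are made up for
illustration, and it says nothing about how [incr TSDB++] actually computes
i-length. It compares the number of whitespace-separated tokens, the number
of characters, and the number of UTF-8 bytes in the sentence above:

```python
# Sentence from the report, with single spaces between the tokens.
sentence = "那 辆 张三 买 的 车 锈。"


def token_count(text: str) -> int:
    """Number of whitespace-separated tokens."""
    return len(text.split())


def char_count(text: str) -> int:
    """Number of non-whitespace Unicode characters."""
    return sum(1 for ch in text if not ch.isspace())


def utf8_byte_count(text: str) -> int:
    """Number of UTF-8 bytes taken up by the non-whitespace characters."""
    return sum(len(ch.encode("utf-8")) for ch in text if not ch.isspace())


print(token_count(sentence), char_count(sentence), utf8_byte_count(sentence))
```

This prints 7 9 27: seven tokens, nine characters, and 27 bytes, since each
of these characters takes three bytes in UTF-8. The reported length of 27
matches the byte count, which would fit with the string being handled byte by
byte somewhere along the way, though that remains a guess.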
Thanks and greetings
	Stefan
-- 
Stefan Müller
Universität Bremen/Fachbereich 10      Tel: (+49) (+421) 218-8601
Postfach 33 04 40
D-28334 Bremen
http://www.cl.uni-bremen.de/~stefan/
http://www.cl.uni-bremen.de/~stefan/Babel/Interaktiv/
    
    