[developers] no utf-8 in parse browser.

Francis Bond bond at ieee.org
Sat Jan 23 12:10:50 CET 2010


G'day,

2010/1/23 C.J. Adams-Collier <cjac at colliertech.org>:
> Aha... I think it might be the combining characters I use...

I also feared that might be the case.

> x̣ is x plus ̣
>
> I'm thinking that the lkb doesn't like U+0323, because:
>
> (lkb::do-parse-tty "x̣ʷil̕ ti č̓ač̓as")
>
> No analysis found corresponding to token 0-1 X
> No analysis found corresponding to token 1-2 IL
> No analysis found corresponding to token 2-3 TI
> No analysis found corresponding to token 3-4 Č
> No analysis found corresponding to token 4-5 AČ
> No analysis found corresponding to token 5-6 AS
> No parses found
>
> it doesn't mention anything about X̣ or x̣
>
> Any idea how we can fix this?  Or perhaps I'm still wrong ;)

Can you trace do-parse-tty and see where the input is getting broken
up incorrectly?  It could be that there is a check for character type
somewhere that is being overstrict.
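
To make concrete the kind of check I have in mind, here is a rough
sketch in Python rather than the LKB's Lisp. It is not taken from the
LKB source: naive_tokens and mark_aware_tokens are hypothetical names,
and I am assuming the dot is U+0323 COMBINING DOT BELOW. The first
function treats any non-alphabetic character as a token boundary; the
second also accepts combining marks (general category M*) when they
follow a letter.

```python
import unicodedata


def naive_tokens(text):
    """Split on every character that is not alphabetic.

    Hypothetical stand-in for an overstrict character-type check:
    combining marks are category Mn, so str.isalpha() rejects them
    and they end up acting as token boundaries.
    """
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def mark_aware_tokens(text):
    """Same split, but keep combining marks attached to the current token."""
    tokens, current = [], []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if ch.isalpha() or (current and is_mark):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# Assumed input: "x" + U+0323, then "il ti" (a shortened test string).
sample = "x\u0323il ti"
print(unicodedata.name("\u0323"), unicodedata.category("\u0323"))
print(naive_tokens(sample))       # ['x', 'il', 'ti']
print([t.encode("unicode_escape") for t in mark_aware_tokens(sample)])
```

If the LKB tokeniser behaves like the first function, that would
explain the input being chopped at each diacritic, but that is only a
guess until someone traces it.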



-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University



