[developers] no utf-8 in parse browser.
Francis Bond
bond at ieee.org
Sat Jan 23 12:10:50 CET 2010
G'day,
2010/1/23 C.J. Adams-Collier <cjac at colliertech.org>:
> Aha... I think it might be the combining characters I use...
I also feared that might be the case.
> x̣ is x plus ̣
>
> I'm thinking that the lkb doesn't like U+0323, because:
>
> (lkb::do-parse-tty "x̣ʷil̕ ti č̓ač̓as")
>
> No analysis found corresponding to token 0-1 X
> No analysis found corresponding to token 1-2 IL
> No analysis found corresponding to token 2-3 TI
> No analysis found corresponding to token 3-4 Č
> No analysis found corresponding to token 4-5 AČ
> No analysis found corresponding to token 5-6 AS
> No parses found
>
> it doesn't mention anything about X̣ or x̣
>
> Any idea how we can fix this? Or perhaps I'm still wrong ;)
Can you trace do-parse-tty and see where the input is being split
incorrectly? There could be a character-type check somewhere that is
too strict.
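
To make the suspicion concrete, here is a minimal Python sketch of how an
overly strict character-type check could produce exactly the split reported
above. It is not LKB code (the LKB is written in Lisp) and the names
strict_tokenize and lenient_tokenize are hypothetical. The code points in
SENTENCE are an assumption read off the rendered text, in particular that č
is the precomposed U+010D.

```python
import unicodedata

# Test sentence from the thread, written with explicit escapes so the code
# points are unambiguous (assumed: precomposed U+010D for the c with caron).
SENTENCE = "x\u0323\u02b7il\u0315 ti \u010d\u0313a\u010d\u0313as"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


def tokenize(text, is_word_char):
    """Split text into maximal runs of characters accepted by is_word_char."""
    tokens, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def strict_tokenize(text):
    """Hypothetical over-strict check: only cased letters (Lu, Ll) count."""
    return tokenize(text, lambda ch: unicodedata.category(ch) in ("Lu", "Ll"))


def lenient_tokenize(text):
    """Accept every letter (L*) and every combining mark (M*)."""
    return tokenize(text, lambda ch: unicodedata.category(ch)[0] in "LM")


if __name__ == "__main__":
    for row in describe(SENTENCE):
        print(*row)
    print(strict_tokenize(SENTENCE))
    print(lenient_tokenize(SENTENCE))
```

Under these assumptions the strict version yields six tokens (x, il, ti, č,
ač, as), which matches the six upper-cased tokens in the error output, while
the lenient version keeps the three words intact. If the trace shows a
similar pattern, a check that rejects combining marks (category Mn) and
modifier letters (category Lm) would be the first place to look.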
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University