[developers] Arabic transliteration
Berthold Crysmann
berthold.crysmann at gmail.com
Thu Jan 15 18:34:33 CET 2015
Hi Nurit,
one solution is to write special tokeniser rules that map from
transliteration to Unicode or else from Buckwalter's transliteration to
some home-grown transliteration that does not use caps. Look at the rpp
files in the ERG (if they are still around). I remember that in the
past Dan used to map I to _I and did similar things to the two-letter
state abbreviations.
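
A minimal sketch of the caps-free option, written as a Python
preprocessing step rather than as actual tokeniser rules. The function
names decase and recase are hypothetical, and the underscore escape
simply copies the I to _I pattern. It assumes the input contains no
"_" (which Buckwalter uses for tatweel); otherwise pick another escape
character.

```python
def decase(text: str, escape: str = "_") -> str:
    """Replace each ASCII capital with escape + lowercase letter.

    Hypothetical helper: after this step, case carries no information,
    so a case-insensitive system sees distinct strings for H and h.
    """
    if escape in text:
        # Assumption: the escape character is otherwise unused in the input.
        raise ValueError(f"input already contains the escape {escape!r}")
    out = []
    for ch in text:
        if ch.isascii() and ch.isupper():
            out.append(escape + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def recase(text: str, escape: str = "_") -> str:
    """Invert decase: escape + letter becomes the capital letter again."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            out.append(next(chars, "").upper())
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(decase("AlktAb"))          # Buckwalter for "the book"
    print(recase(decase("AlktAb")))
```

The mapping is reversible, so grammar output can be converted back to
standard Buckwalter for people who prefer to read that.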
The only real problem with conversion to Unicode is the BiDi stuff ;-(
That could probably be done with token mapping rules in pet/ace (they
allow recursion), but for LKB things look less cheerful.
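
For the conversion to Unicode itself, here is a rough sketch in Python.
The table is deliberately partial and written from memory of the
Buckwalter scheme, so check it against the qamus.org page before
relying on it. The names BUCKWALTER_TO_UNICODE and to_unicode are
hypothetical. The output is in logical (typing) order; the sketch does
nothing about how the result is displayed, which is where the BiDi
question comes in.

```python
# Partial, illustrative Buckwalter-to-Unicode table (an assumption to be
# verified against the published scheme). Escapes are used instead of
# literal Arabic letters to keep this file free of right-to-left text.
BUCKWALTER_TO_UNICODE = {
    "'": "\u0621",  # hamza
    "|": "\u0622",  # alef with madda above
    ">": "\u0623",  # alef with hamza above
    "<": "\u0625",  # alef with hamza below
    "A": "\u0627",  # alef
    "b": "\u0628",  # beh
    "t": "\u062A",  # teh
    "H": "\u062D",  # hah
    "*": "\u0630",  # thal
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "h": "\u0647",  # heh
    "a": "\u064E",  # fatha
    "i": "\u0650",  # kasra
}

# str.translate wants code points as keys; str.maketrans builds that
# from a dict whose keys are single characters.
_TABLE = str.maketrans(BUCKWALTER_TO_UNICODE)


def to_unicode(text: str) -> str:
    """Map each known Buckwalter character to its Arabic code point.

    Characters missing from the partial table pass through unchanged.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    word = to_unicode("AlktAb")
    # Print code points rather than glyphs to sidestep display issues.
    print(" ".join(f"U+{ord(c):04X}" for c in word))
```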
Hope that helps.
On 14/01/15 16:04, Nurit Melnik wrote:
> Sorry to harp on this, but I have one follow-up question - is there a way to get LKB to be case sensitive?
>
> Thanks!
> Nurit
>
> -----Original Message-----
> From: Nurit Melnik
> Sent: Friday, January 09, 2015 11:50 AM
> To: 'Francis Bond'
> Cc: developers at delph-in.net
> Subject: RE: [developers] Arabic transliteration
>
> Yes, not all of the people working on the grammar know how to read Arabic script.
Wow. I always thought learning grammar and vocabulary were the real
issues...
Cheers,
B
> For the rest, typing in English is much easier.
>
> -----Original Message-----
> From: fcbond at gmail.com [mailto:fcbond at gmail.com] On Behalf Of Francis Bond
> Sent: Friday, January 09, 2015 2:57 AM
> To: Nurit Melnik
> Cc: developers at delph-in.net
> Subject: Re: [developers] Arabic transliteration
>
> G'day,
>
>> I'm about to start working on an Arabic grammar and I have a
>> transliteration problem.
>>
>> The most common Arabic transliteration schema is Buckwalter's
>> (http://www.qamus.org/transliteration.htm), which is case sensitive
>> and uses some special characters (|, >, <, %, }, {, *, ', `).
>>
>> Do you know if there's a way to use this transliteration with the LKB?
>>
>> If not – any suggestions for a different schema?
> Is there any reason not to just use Unicode (UTF-8) for the characters and write them in Arabic directly? I think this works pretty much everywhere it should (although I have not tested it with combining characters).
>
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies Nanyang Technological University
>
--
Berthold Crysmann <crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris