[developers] Arabic transliteration

Berthold Crysmann berthold.crysmann at gmail.com
Thu Jan 15 18:34:33 CET 2015


Hi Nurit,

one solution is to write special tokeniser rules that map from 
transliteration to Unicode or else from Buckwalter's transliteration to 
some home-grown transliteration that does not use caps. Look at the rpp 
files in the ERG (if they are still around). I remember that in the 
past Dan used to map I to _I and did similar things to the two-letter 
state abbreviations.
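
To make the second option concrete, here is a minimal sketch in Python (not actual tokeniser rules) of a caps-free re-encoding in the spirit of the I to _I trick: each upper-case letter becomes an escape character plus its lower-case form. The escape character "^" and the function names are my own assumptions for illustration; nothing here comes from the ERG, and you would need to check that the escape is not already used by your transliteration.

```python
import string

# Hypothetical escape marker: assumed NOT to occur in the transliteration.
ESCAPE = "^"


def to_caseless(text: str, escape: str = ESCAPE) -> str:
    """Re-encode text so that it contains no upper-case ASCII letters.

    Each upper-case letter X becomes escape + x; everything else is kept.
    """
    if escape in text:
        raise ValueError(f"input already contains the escape {escape!r}")
    return "".join(
        escape + ch.lower() if ch in string.ascii_uppercase else ch
        for ch in text
    )


def from_caseless(text: str, escape: str = ESCAPE) -> str:
    """Invert to_caseless: escape + x becomes X again."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            nxt = next(chars, None)
            if nxt is None or nxt not in string.ascii_lowercase:
                raise ValueError("escape not followed by a lower-case letter")
            out.append(nxt.upper())
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    word = "AlkitAb"  # a case-sensitive transliterated form
    encoded = to_caseless(word)
    print(encoded)
    print(from_caseless(encoded) == word)
```

Since the mapping is reversible, the grammar's lexicon could be stored in the caps-free form and converted back for display, so case-insensitive matching no longer conflates distinct letters.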

The only real problem with conversion to Unicode is the BiDi stuff;-( 
That could probably be done with token mapping rules in PET/ACE (there's 
recursion), but for the LKB things look less cheerful.

Hope that helps.

On 14/01/15 16:04, Nurit Melnik wrote:
> Sorry to harp on this, but I have one follow-up question - is there a way to get LKB to be case sensitive?
>
> Thanks!
> Nurit
>
> -----Original Message-----
> From: Nurit Melnik
> Sent: Friday, January 09, 2015 11:50 AM
> To: 'Francis Bond'
> Cc: developers at delph-in.net
> Subject: RE: [developers] Arabic transliteration
>
> Yes, not all of the people working on the grammar know how to read Arabic script.
Wow. I always thought learning grammar and vocabulary were the real 
issues...

Cheers,

B
> For the rest , typing in English is much easier.
>
> -----Original Message-----
> From: fcbond at gmail.com [mailto:fcbond at gmail.com] On Behalf Of Francis Bond
> Sent: Friday, January 09, 2015 2:57 AM
> To: Nurit Melnik
> Cc: developers at delph-in.net
> Subject: Re: [developers] Arabic transliteration
>
> G'day,
>
>> I'm about to start working on an Arabic grammar and I have a
>> transliteration problem.
>>
>> The most common Arabic transliteration schema is Buckwalter's
>> (http://www.qamus.org/transliteration.htm), which is case sensitive
>> and uses some special characters (|, >, <, %, }, {, *, ', `).
>>
>> Do you know if there's a way to use this transliteration with the LKB?
>>
>> If not – any suggestions for a different schema?
> Is there any reason not to just use Unicode (UTF-8) for the characters and write them in Arabic directly?  I think this works pretty much everywhere it should (although I have not tested it with composing characters).
>
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies Nanyang Technological University
>


-- 
Berthold Crysmann <crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris


