[developers] Arabic transliteration

Nurit Melnik nuritme at openu.ac.il
Sun Jan 18 08:12:56 CET 2015

Thanks, Berthold.
It looks like we'll be home-growing an LKB-compatible transliteration schema.


-----Original Message-----
From: Berthold Crysmann [mailto:berthold.crysmann at gmail.com] 
Sent: Thursday, January 15, 2015 7:35 PM
To: Nurit Melnik; 'developers at delph-in.net'
Subject: Re: [developers] Arabic transliteration

Hi Nurit,

one solution is to write special tokeniser rules that map from transliteration to Unicode, or else from Buckwalter's transliteration to some home-grown transliteration that does not use caps. Look at the rpp files in the ERG (if they are still around). I remember that in the past Dan used to map I to _I and did similar things to the two-letter state abbreviations.
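
To make the "no caps" idea concrete, here is a minimal sketch in Python of one possible reversible case-folding step, in the spirit of the I to _I mapping mentioned above. The escape character `_` and the function names `fold_case` and `unfold_case` are assumptions made for illustration, not part of the LKB, the ERG or Buckwalter's scheme. Since Buckwalter already uses `_` for tatweel, a literal `_` is doubled so the mapping stays invertible. The sketch only deals with case; the other special characters (|, >, <, *, etc.) are passed through unchanged and would need their own treatment.

```python
# Hypothetical escape character; assumed to be acceptable to the downstream tokeniser.
ESC = "_"


def fold_case(text: str) -> str:
    """Rewrite a case-sensitive transliteration so that it contains no capitals.

    Each ASCII capital X becomes ESC + x, and a literal ESC is doubled,
    so the result can be mapped back without loss.
    """
    out = []
    for ch in text:
        if ch == ESC:
            out.append(ESC + ESC)
        elif ch.isascii() and ch.isupper():
            out.append(ESC + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def unfold_case(text: str) -> str:
    """Invert fold_case: ESC ESC gives a literal ESC, ESC + x gives capital X."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESC and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(ESC if nxt == ESC else nxt.upper())
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With this, `fold_case("AlkitAb")` gives `_alkit_ab`, and `unfold_case` restores the original string. Whether the target system accepts `_` inside tokens would have to be checked; any other character unused by the transliteration could serve as the escape instead.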

The only real problem with conversion to Unicode is the BiDi stuff ;-( That could probably be done with token mapping rules in pet/ace (there's recursion), but for the LKB things look less cheerful.

Hope that helps.

On 14/01/15 16:04, Nurit Melnik wrote:
> Sorry to harp on this, but I have one follow-up question - is there a way to get LKB to be case sensitive?
> Thanks!
> Nurit
> -----Original Message-----
> From: Nurit Melnik
> Sent: Friday, January 09, 2015 11:50 AM
> To: 'Francis Bond'
> Cc: developers at delph-in.net
> Subject: RE: [developers] Arabic transliteration
> Yes, not all of the people working on the grammar know how to read Arabic script.
Wow. I always thought learning grammar and vocabulary were the real issues...


> For the rest, typing in English is much easier.
> -----Original Message-----
> From: fcbond at gmail.com [mailto:fcbond at gmail.com] On Behalf Of Francis 
> Bond
> Sent: Friday, January 09, 2015 2:57 AM
> To: Nurit Melnik
> Cc: developers at delph-in.net
> Subject: Re: [developers] Arabic transliteration
> G'day,
>> I'm about to start working on an Arabic grammar and I have a 
>> transliteration problem.
>> The most common Arabic transliteration schema is Buckwalter's 
>> (http://www.qamus.org/transliteration.htm), which is case sensitive 
>> and uses some special characters (|,>, <, %, }, {, *,', `).
>> Do you know if there's a way to use this transliteration with the LKB?
>> If not – any suggestions for a different schema?
> Is there any reason not to just use Unicode (UTF-8) for the characters and write them in Arabic directly?  I think this works pretty much everywhere it should (although I have not tested it with composing characters).
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies Nanyang Technological 
> University

Berthold Crysmann <crysmann at linguist.jussieu.fr> CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13 Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris
