[developers] Arabic transliteration

Nurit Melnik nuritme at openu.ac.il
Sun Jan 18 08:12:56 CET 2015


Thanks, Berthold.
It looks like we'll be home-growing an LKB-compatible transliteration schema.

Nurit

-----Original Message-----
From: Berthold Crysmann [mailto:berthold.crysmann at gmail.com] 
Sent: Thursday, January 15, 2015 7:35 PM
To: Nurit Melnik; 'developers at delph-in.net'
Subject: Re: [developers] Arabic transliteration

Hi Nurit,

one solution is to write special tokeniser rules that map from transliteration to Unicode, or else from Buckwalter's transliteration to some home-grown transliteration that does not use caps. Look at the rpp files in the ERG (if they are still around). I remember that in the past Dan used to map I to _I and did similar things to the two-letter state abbreviations.
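
As a rough illustration of the second option (a caps-free variant of Buckwalter), here is a minimal Python sketch. It is not how the ERG rpp rules work; it is a hypothetical preprocessing step run before the text reaches the LKB. The function names and the choice of "2" as a marker are assumptions made for the example, and the sketch assumes the marker never occurs in the input. Each upper-case ASCII letter is rewritten as its lower-case form followed by the marker, so the mapping stays reversible. The special characters (|, >, <, and so on) are left untouched.

```python
def to_caseless(buckwalter: str, marker: str = "2") -> str:
    """Rewrite upper-case ASCII letters as lower-case letter + marker.

    Hypothetical scheme: 'H' becomes 'h2', 'S' becomes 's2', etc.
    Assumes the marker character does not occur in the input.
    """
    if marker in buckwalter:
        raise ValueError("input already contains the marker character")
    out = []
    for ch in buckwalter:
        if ch.isascii() and ch.isupper():
            out.append(ch.lower() + marker)
        else:
            out.append(ch)
    return "".join(out)


def from_caseless(text: str, marker: str = "2") -> str:
    """Invert to_caseless: a marker upper-cases the letter before it."""
    out = []
    for ch in text:
        if ch == marker:
            if not out or not out[-1].islower():
                raise ValueError("marker does not follow a lower-case letter")
            out[-1] = out[-1].upper()
        else:
            out.append(ch)
    return "".join(out)
```

With this scheme, `to_caseless("kitAb")` gives `"kita2b"`, and `from_caseless` restores the original string, so lexicon entries and test items can be converted in both directions.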

The only real problem with conversion to Unicode is the BiDi stuff ;-( That could probably be done with token mapping rules in PET/ACE (there's recursion), but for the LKB things look less cheerful.
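
For the conversion to Unicode itself, the character mapping can be sketched in Python as below. This is a standalone script, not a set of PET/ACE token mapping rules. The table is the commonly cited Buckwalter inventory and should be checked against the qamus.org page before use; the names `BUCKWALTER_TO_ARABIC` and `buckwalter_to_unicode` are made up for the example. The sketch only maps characters one-to-one and does nothing about display direction.

```python
# Buckwalter symbols listed in code-point order of their Arabic counterparts.
_BASE = "'|>&<}AbptvjHxd*rzs$SDTZEg"  # U+0621 .. U+063A
_REST = "fqklmnhwYyFNKaui~o"          # U+0641 .. U+0652

BUCKWALTER_TO_ARABIC = {c: chr(0x0621 + i) for i, c in enumerate(_BASE)}
BUCKWALTER_TO_ARABIC.update({c: chr(0x0641 + i) for i, c in enumerate(_REST)})
BUCKWALTER_TO_ARABIC.update({
    "_": "\u0640",  # tatweel
    "`": "\u0670",  # superscript alef
    "{": "\u0671",  # alef wasla
})


def buckwalter_to_unicode(text: str) -> str:
    """Map each Buckwalter symbol to its Arabic character.

    Characters outside the table (spaces, digits, punctuation) pass through.
    """
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)
```

For example, `buckwalter_to_unicode("kitAb")` returns the five code points U+0643, U+0650, U+062A, U+0627, U+0628.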

Hope that helps.

On 14/01/15 16:04, Nurit Melnik wrote:
> Sorry to harp on this, but I have one follow-up question - is there a way to get LKB to be case sensitive?
>
> Thanks!
> Nurit
>
> -----Original Message-----
> From: Nurit Melnik
> Sent: Friday, January 09, 2015 11:50 AM
> To: 'Francis Bond'
> Cc: developers at delph-in.net
> Subject: RE: [developers] Arabic transliteration
>
> Yes, not all of the people working on the grammar know how to read Arabic script.
Wow. I always thought learning grammar and vocabulary were the real issues...

Cheers,

B
> For the rest, typing in English is much easier.
>
> -----Original Message-----
> From: fcbond at gmail.com [mailto:fcbond at gmail.com] On Behalf Of Francis 
> Bond
> Sent: Friday, January 09, 2015 2:57 AM
> To: Nurit Melnik
> Cc: developers at delph-in.net
> Subject: Re: [developers] Arabic transliteration
>
> G'day,
>
>> I'm about to start working on an Arabic grammar and I have a 
>> transliteration problem.
>>
>> The most common Arabic transliteration schema is Buckwalter's 
>> (http://www.qamus.org/transliteration.htm), which is case sensitive 
>> and uses some special characters (|, >, <, %, }, {, *, ', `).
>>
>> Do you know if there's a way to use this transliteration with the LKB?
>>
>> If not – any suggestions for a different schema?
> Is there any reason not just to use Unicode (UTF-8) for the characters and write them in Arabic directly?  I think this works pretty much everywhere it should (although I have not tested it with composing characters).
>
> --
> Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
> Division of Linguistics and Multilingual Studies Nanyang Technological 
> University
>


--
Berthold Crysmann <crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris