[developers] Divergence between LKB and Pet morphology

Wed May 27 13:33:27 CEST 2009

Dear all, 

I have spotted some differences in functionality and (space) efficiency
between the LKB morphology and its port in Pet. I am referring to the
version of cheap that is distributed with LOGON. 

My nascent grammar of Hausa has a fair bunch of morphological rules:
besides standard suffixation, there is also a good deal of root
consonant reduplication going on (e.g. ƙofa - ƙofofi). So far, the LKB
string unification can handle this quite well. 

Quite a number of these rules drop the final vowel and replace it with
the initial vowel of the suffix. In the example above, the final a is
dropped. The current LKB provides wildcards to match against classes of
characters on the LHS without carrying them over to the RHS. The
spelling part of the above-mentioned rule looks like this: 

;;; Letter sets

%(letter-set (!c bcdfghjklmnpqrstvwxyzɓɗƙ\'))

%(wild-card (?c bcdfghjklmnpqrstvwxyzɓɗƙ\'))
%(wild-card (?v aeiou))

noun_pl1_vow_ir :=
%suffix (!c?v !co!ci) (t?v toci) (s?v soshi) (w?v woyi) (ts?v tsotsi)
	...
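
To make the intended behaviour concrete, here is a minimal Python sketch of
what the first subrule, (!c?v !co!ci), is meant to do: the letter-set !c is
bound on the left and reused on the right, while the wild-card ?v only
matches and is then discarded. The function name is hypothetical, and the
sketch ignores the more specific subrules and how the LKB chooses among
them; it is not the LKB implementation.

```python
# Character classes copied from the grammar fragment above.
CONSONANTS = "bcdfghjklmnpqrstvwxyzɓɗƙ'"
VOWELS = "aeiou"


def apply_reduplicating_subrule(stem):
    """Illustrate (!c?v !co!ci) on a single stem.

    The stem must end in consonant + vowel. The consonant (!c) is kept
    and copied, the vowel (?v) is dropped, and "o" ... "i" are added.
    Returns None if the stem does not match.
    """
    if len(stem) >= 2 and stem[-2] in CONSONANTS and stem[-1] in VOWELS:
        c = stem[-2]                      # value bound to !c
        return stem[:-2] + c + "o" + c + "i"
    return None


print(apply_reduplicating_subrule("ƙofa"))  # ƙofofi
```

The point of the wild-card is visible in the last line of the function: the
matched vowel never appears on the right-hand side.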

Pet currently does not seem to support wild-cards (only letter-sets). 

As a workaround, I have a Perl script that unfolds each wild-card into
all the letters it can match, which works, sort of.
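
For reference, the unfolding step can be sketched in Python roughly as
follows. This is not the Perl script itself, only an illustration under two
assumptions: wild-card names are two characters long (?c, ?v), and several
wild-cards in one pattern expand independently of each other, since they do
not bind. The function name expand_wild_cards is made up for this example.

```python
from itertools import product

# Wild-card definitions copied from the grammar fragment above.
WILD_CARDS = {
    "?c": "bcdfghjklmnpqrstvwxyzɓɗƙ'",
    "?v": "aeiou",
}


def expand_wild_cards(lhs, rhs, wild_cards=WILD_CARDS):
    """Return (lhs, rhs) pairs with every wild-card in lhs spelled out.

    Letter-set markers such as !c are left untouched, because Pet
    handles letter-sets itself. The right-hand side is copied as is.
    """
    pieces = []  # one list of alternatives per position in lhs
    i = 0
    while i < len(lhs):
        token = lhs[i:i + 2]
        if token in wild_cards:
            pieces.append(list(wild_cards[token]))
            i += 2
        else:
            pieces.append([lhs[i]])
            i += 1
    return [("".join(combo), rhs) for combo in product(*pieces)]


for pair in expand_wild_cards("t?v", "toci"):
    print("(%s %s)" % pair)
```

With this scheme, each of the five subrules of noun_pl1_vow_ir turns into
five wild-card-free subrules, one per vowel, so the rule grows from 5 to 25
subrules.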

However, when I load the grammar with cheap, loading takes more than a
minute. The final process is 2.6 G in size (resident). The LKB process,
with the grammar loaded, only uses 390 M (resident) with a virtual size
of 1.8 G (which I assume does not really matter). 

Obviously, there is something wrong with Pet. 

Could someone have a look at this? I can provide a grammar for testing. 

Berthold

PS: 

The wild-card functionality would be something good to have in Pet, not
only for Hausa. In GG, wild-cards would greatly facilitate the treatment
of Umlaut, which, to be honest, is quite clunky at the moment.