[developers] Divergence between LKB and Pet morphology
crysmann at ifk.uni-bonn.de
Wed May 27 13:33:27 CEST 2009
I have spotted some differences in functionality and (space) efficiency
between the LKB morphology and its port in Pet. I am referring to the
version of cheap that is distributed with LOGON.
My nascent grammar of Hausa has a fair number of morphological rules:
besides standard suffixation, there is also a good deal of root
consonant reduplication going on (e.g. ƙofa - ƙofofi). So far, the LKB
string unification can handle this quite well.
Quite a number of these rules drop the final vowel and replace it with
the initial vowel of the suffix. In the example above, the final a is
dropped. The current LKB provides wildcards to match against classes of
characters on the LHS without carrying them over to the RHS. The
spelling part of the above-mentioned rule looks like this:
;;; Letter sets
%(letter-set (!c bcdfghjklmnpqrstvwxyzɓɗƙ\'))
%(wild-card (?c bcdfghjklmnpqrstvwxyzɓɗƙ\'))
%(wild-card (?v aeiou))
%suffix (!c?v !co!ci) (t?v toci) (s?v soshi) (w?v woyi) (ts?v
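To make the intended effect of the first subrule concrete, here is a minimal Python sketch of what (!c?v !co!ci) does to a stem. It is not the LKB implementation; the function name is hypothetical, the escaped \' in the letter set is assumed to stand for a plain apostrophe, and the other subrules (t?v, s?v, ...) are deliberately ignored.

```python
# Character classes copied from the letter-set / wild-card declarations.
# Assumption: the escaped \' in the grammar denotes a plain apostrophe.
CONSONANTS = "bcdfghjklmnpqrstvwxyzɓɗƙ'"
VOWELS = "aeiou"


def apply_first_subrule(stem):
    """Mimic (!c?v !co!ci): the final vowel (?v) is matched but dropped,
    the preceding consonant (!c) is kept and copied into the suffix."""
    if len(stem) >= 2 and stem[-1] in VOWELS and stem[-2] in CONSONANTS:
        c = stem[-2]
        # stem[:-1] keeps everything up to and including the consonant.
        return stem[:-1] + "o" + c + "i"
    return None  # pattern does not match this stem


print(apply_first_subrule("ƙofa"))
```

For the example above, the stem ƙofa ends in f + a, so the a is dropped and the result is ƙofofi.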
Pet currently does not seem to support wild-cards (only letter-sets).
As a remedy, I have a Perl script that unfolds the wild-cards into all
possible letters, which works, sort of.
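The unfolding step can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the Perl script itself: the names WILD_CARDS and unfold are hypothetical, wild-cards are assumed to occur only on the LHS, and the escaped \' is again read as a plain apostrophe.

```python
import re
from itertools import product

# Wild-card classes as declared in the grammar (assumed: \' is an apostrophe).
WILD_CARDS = {
    "?c": "bcdfghjklmnpqrstvwxyzɓɗƙ'",
    "?v": "aeiou",
}


def unfold(lhs, rhs, wild_cards=WILD_CARDS):
    """Expand one (lhs rhs) subrule into one subrule per combination of
    concrete letters. The rhs is copied unchanged, since wild-cards are
    not carried over to it."""
    # Split the lhs into wild-card tokens and literal chunks,
    # e.g. "!c?v" -> ["!c", "?v", ""].
    tokens = re.split(r"(\?[a-z])", lhs)
    choices = [list(wild_cards[t]) if t in wild_cards else [t] for t in tokens]
    return [("".join(combo), rhs) for combo in product(*choices)]


for pair in unfold("t?v", "toci"):
    print("(%s %s)" % pair)
```

Each wild-card multiplies the number of subrules by the size of its class, so a pattern with both ?c and ?v unfolds into 25 * 5 = 125 subrules.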
However, when I load the grammar with cheap, loading takes more than a
minute. The final process is 2.6 G in size (resident). The LKB process,
with the grammar loaded, only uses 390 M (resident) with a virtual size
of 1.8 G (which I assume does not really matter).
Obviously, there is something wrong with Pet.
Could someone have a look at this? I can provide a grammar for testing.
The wild-card functionality would be something good to have in Pet, not
only for Hausa. In GG, wild-cards would greatly facilitate the treatment
of Umlaut, which, to be honest, is quite clunky at the moment.