[developers] Reduplication in morphology
berthold.crysmann at gmail.com
Mon Sep 14 02:00:27 CEST 2015
Hi Francis, Emily, and whoever else might be listening in on this,
I have now got a version of my revised analysis working (thanks also to
Woodley for fixing a critical issue in ace for me).
Here's a brief description (for more details look at the HaG code
$LOGONROOT/llf/hag, or ask me):
Hausa has some 4 inflectional classes that use total reduplication.
While tonal patterns may differ between base and reduplicant, the
reduplicant does not undergo inflection different from the base. Thus,
on the string level (once we have dealt with suprasegmental markings),
the reduplicant is faithful to the base (lucky for me). We do get
further inflectional marking on the base, however.
E.g. in plural class 12, we get
nâs nâs `nurses'
sìkêt sìkêt `skirts'
jōjì jōjì `judges'
The latter can be morphologically possessed, yielding "jōjì jōjiìnmù"
So in this class, the reduplicant is just a copy of the base lexeme's
orth value (+ tone + length).
A more interesting case is that of augmentative adjectives. Here the base for
the productive formation is partially reduplicated in the masc and fem
singular, but one drops the partial reduplication in the plural, to have
total reduplication of a clipped base instead.
E.g. class 14:
mālàm bundumēmḕ -> mā̀làmai bundumā bùndùmā̀
Which can also undergo type raising, giving
bundumā bùndùmàn mā̀làmai
Note that the inflection -n on the base does not get copied to the
reduplicant.
As for the analysis, here is what I now do (in ace):
As a first step, I trigger an ersatz, in token mapping, on every item in
the chart that is followed by some other item.
The surface string I substitute is __REDUP__, which corresponds to
exactly one lexical item (redup_n).
I tried to filter already at this first step, on partial identity of form
(with regexps), but that did not seem to work. The ersatzing records the
original string in +CARG. The rule applies after processing of
suprasegmentals (tone and length), and copies that information over as
well. See tmr/redup.tdl.
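For readers who do not have the grammar at hand, the first step can be pictured with a rough Python sketch. This is not the TDL in tmr/redup.tdl; the Token class, its field names and the function add_redup_ersatz are hypothetical, and the sketch assumes that the ersatz is added as an alternative next to the original token rather than replacing it.

```python
from dataclasses import dataclass
from typing import List, Optional

REDUP = "__REDUP__"  # the ersatz surface string named in the text


@dataclass
class Token:
    form: str                    # string seen by lexical lookup
    carg: Optional[str] = None   # original string, recorded when ersatzed
    supra: Optional[str] = None  # tone/length information (toy encoding)


def add_redup_ersatz(tokens: List[Token]) -> List[List[Token]]:
    """Return one chart cell per position: the original token plus,
    if some other token follows, an ersatz alternative (assumption)."""
    cells = []
    for i, tok in enumerate(tokens):
        cell = [tok]
        if i + 1 < len(tokens):  # only items followed by some other item
            # keep the original string and the suprasegmentals on the ersatz
            cell.append(Token(form=REDUP, carg=tok.form, supra=tok.supra))
        cells.append(cell)
    return cells


cells = add_redup_ersatz([Token("joji"), Token("jojinmu"), Token("sun"), Token("zo")])
```

Every non-final position now carries a __REDUP__ alternative, which is why a later filtering step is needed to discard the unlicensed ones.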
In a second step, I shall use lexical filtering, which applies after
lookup and morphological processing, to get rid of unlicensed
reduplication entries in the chart. I'll probably implement that at the
beginning of next week.
During morphological processing, I introduce constraints for the
reduplicant as part of the morphological rules applying to the base. In
particular, this enables me to select precisely at which step in the
derivation I want to memoise the identity of the base. All the
constraints on the reduplicant are collected in MORPH.MCLASS.--REDUP.
See rule n_pl12_lr (irules.tdl), or, even better, n_pl14_lr.
In syntax, I use a binary rule to combine the reduplicant ersatz with
the base and impose all the constraints the base has for the shape of
the reduplicant (rule n-pl-reduplication in rules.tdl).
Different plural formation patterns require different levels of
identity: e.g. class 12 copies the segments and suprasegmentals of the
lexeme, whereas class 14 copies the segments of the derived plural and
imposes a fixed H+ pattern on the reduplicant, and a fixed L+ tone
pattern on the base. Both of them exempt further inflectional
markings from reduplication.
This is all done by the morphological rules applying to the base,
which also store the constraints on the reduplicant in MORPH.--REDUP,
having a value for orthography (--STEM string) and suprasegmentals
(--SUPRA supra). See the rules for class 12 and class 14 in irules.tdl.
The binary reduplication rule then imposes these constraints on the
reduplicant.
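The division of labour between the morphological rules and the binary rule can be sketched in Python as follows. This is only a toy model under stated assumptions: the class and function names are hypothetical, the fields stem and supra merely stand in for --STEM and --SUPRA, tone patterns are compared as plain strings, and suffixation is reduced to string concatenation.

```python
from dataclasses import dataclass


@dataclass
class RedupConstraint:
    stem: str   # stands in for --REDUP.--STEM
    supra: str  # stands in for --REDUP.--SUPRA (toy string encoding)


@dataclass
class Base:
    surface: str
    redup: RedupConstraint


def pl12(lexeme: str, supra: str) -> Base:
    # class 12: memoise the segments and suprasegmentals of the lexeme
    return Base(lexeme, RedupConstraint(lexeme, supra))


def pl14(derived_plural: str) -> Base:
    # class 14: memoise the segments of the derived plural and
    # require a fixed H+ pattern on the reduplicant
    return Base(derived_plural, RedupConstraint(derived_plural, "H+"))


def add_suffix(base: Base, suffix: str) -> Base:
    # later inflection changes the surface form, not the memoised constraint
    return Base(base.surface + suffix, base.redup)


def licenses(base: Base, carg: str, supra: str) -> bool:
    # what the binary rule checks: the ersatz must match the base's constraint
    return base.redup.stem == carg and base.redup.supra == supra


possessed = add_suffix(pl12("joji", "HL"), "nmu")
ok = licenses(possessed, "joji", "HL")
bad = licenses(possessed, "jojinmu", "HL")
```

Because the constraint is recorded before the suffix is attached, the possessed form "jojinmu" still asks for the bare reduplicant "joji".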
Some sample sentences for you to test:
joji jojinmu sun zo
joji jojin sun zo
malamai bunduma bunduma sun zo
In the hag directory, you can test with
ace -l -g ace/hausa.dat
ace -T -g ace/hausa.dat|ace -e -l --show-gen-chart -g ace/hausa.g.dat
Using a single generic entry for all reduplicants, I can now trigger the
one I need based on very general properties, i.e. plural in Hausa. Look
at reduplication.mtr for reference. Since there's only one rule left
now, it is soon going to move into the main trigger.mtr.
Using an ersatz, I have to replace the generic __REDUP__ phonology with
something sensible at some point: I use post-generation chart mapping to
copy over the relevant string from the base. The relevant rule is the
third or fourth one down in tmr/post-generation.tdl. Likewise, I copy
over the constraints on tone and vowel length (SUPRA) as well, which are
imposed in the --REDUP.--SUPRA features.
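The post-generation step amounts to something like the following Python sketch. It is not the rule in tmr/post-generation.tdl: the dictionary keys form and redup_stem and the function fill_in_reduplicants are hypothetical, and the sketch assumes that the base immediately follows its reduplicant.

```python
REDUP = "__REDUP__"  # the ersatz surface string named in the text


def fill_in_reduplicants(items):
    """Replace each ersatz with the string its base asks for."""
    out = []
    for i, item in enumerate(items):
        if item["form"] == REDUP:
            if i + 1 >= len(items):
                raise ValueError("ersatz without a following base")
            base = items[i + 1]            # assumption: base is the next item
            out.append(base["redup_stem"]) # copy the memoised string over
        else:
            out.append(item["form"])
    return " ".join(out)


generated = [
    {"form": REDUP},
    {"form": "jojinmu", "redup_stem": "joji"},
    {"form": "sun"},
    {"form": "zo"},
]
surface = fill_in_reduplicants(generated)
```

A fuller version would copy the tone and length constraints in the same way before the suprasegmentals are spelled out.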
To summarise, the new solution scales up to open classes, and it is
fully supported in ace. As for the LKB, you can test with the ersatz as
input, e.g. "__redup__ joji sun zo" parses and generates. Generation
will only use the ersatz. I shall look into refining my user function to
get proper generation output.
For the moment, I kept the old analysis alive for testing in the LKB: if
you type in "nas nas sun zo", you should still get an analysis, and you
can even generate from it (using the ersatz). In ace, this "native"
analysis is disabled by requiring an empty RELS on the reduplicant (as a
type addendum on the phrasal type), which will filter out the native
contentful entry, and keep the semantically empty one (the ersatz), thus
avoiding any spurious ambiguity. Since the LKB analyses are a superset
of the ace analyses, there will not be any issue w.r.t. treebanking.
As I see it, total reduplication is the killer argument for
having chart mapping in the LKB, or else for having a complete ace-based
4. Outlook: Chinese and Indonesian
As far as I can tell, my approach for Hausa should be straightforward to
port to these two languages. There is the question of the X: you would
probably treat that as a morphophonological effect of some rule
application (I guess). So, given our technologies, the most
straightforward thing to do is place all morphophonological changes on
the base. That leaves you free to impose simple total identity on the
reduplicant: just choose the right moment in the derivation (thanks to
Woodley, who fixed the recording of orth values in ace just an hour
or two ago). If that should not be viable (there are languages like
that), one needs to replicate some of the morphology in token mapping.
For reference, look at how the ERG deals with plurals of unknown words.
All the best,
On 08/09/15 16:26, Berthold Crysmann wrote:
> Hi Francis,
> I guess you are only worried about total reduplication. In principle,
> in Chinese you could get away with using string unification (thanks to
> the script, word length is limited), but specifying the letter set
> will not be much fun for either humans or machines....
> I am currently working on total reduplication in my Hausa grammar. I
> had a first solution that is non-compositional in the semantics. I.e.
> I just use a binary rule that glues the stuff together, conditioned on
> identity of predicates, but throws away the semantic contribution in
> the reduplicant. Works in parsing, but needs *item-specific* trigger
> rules in generation.
> Right now, I am experimenting with ersatzing. Things should work well as
> long as only one of the reduplicants undergoes additional morphology.
> Otherwise, you'll have to memoise parts of the original string, so you
> can apply regular morphophonological changes to the reduplicant. Seems
> to work in the LKB.
> I shall commit that new analysis very soon. I shall also send you a
> detailed description.
> On 08/09/15 15:28, Francis Bond wrote:
>> we are working with a couple of languages where we would like to be
>> able to write lexical rules that do things like:
>> Take a two character word in Chinese (AB)
>> AB -> ABAB
>> AB -> AABB
>> AB -> AAB
>> AB -> ABXAB (where X is fixed by the rule)
>> AB -> AXAB
>> AB -> AXB
>> Take a one character word (A):
>> A -> AA
>> A -> AXA
>> A -> AAX
>> In Indonesian we want to take an arbitrary word and produce a duplicate
>> w -> w-w (kasus -> kasus-kasus)
>> More examples here:
>> Is this (or some of this) possible with the DELPH-IN tools? If so,
>> can someone explain how to do it (or point to a paper or website that
>> tells us how to do it)?
>> Thanks in advance,
Berthold Crysmann<crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris