[developers] Reduplication in morphology

Mon Sep 14 02:00:27 CEST 2015

Hi Francis, Emily, and whoever else might be listening in on this,

I have now got a version of my revised analysis working (thanks also to 
Woodley for fixing a critical issue in ace for me).

Here's a brief description (for more details look at the HaG code 
$LOGONROOT/llf/hag, or ask me):

Hausa has some 4 inflectional classes that use total reduplication. 
While tonal patterns may differ between base and reduplicant, the 
reduplicant does not undergo inflection different from the base. Thus, 
on the string level (once we have dealt with suprasegmental markings), 
the reduplicant is faithful to the base (lucky for me). We do get 
further inflectional marking on the base, however.

E.g. in plural class 12, we get

nâs nâs `nurses'
sìkêt sìkêt `skirts'
jōjì jōjì `judges'

The latter can be morphologically possessed, yielding "jōjì jōjiìnmù" 
(our judges).

So in this class, the reduplicant is just a copy of the base lexeme's 
orth value (+ tone + length).

A more interesting case is that of augmentative adjectives. Here the 
base for the productive formation is partially reduplicated in the masc 
and fem singular, but the plural drops the partial reduplication and 
shows total reduplication of a clipped base instead.

E.g. class 14:

mālàm bundumēmḕ ->  mā̀làmai bundumā bùndùmā̀

Which can also undergo type raising, giving

bundumā bùndùmàn mā̀làmai

Note that the inflection -n on the base does not get copied to the 
reduplicant.

As for the analysis, here is what I now do (in ace):

1. Parsing

As a first step, I trigger an ersatz, in token mapping, on every item in 
the chart that is followed by some other item.
The surface string I substitute is __REDUP__, which corresponds to 
exactly one lexical item (redup_n).
I tried filtering at this first step already, on partial identity of 
form (with regexps), but that did not seem to work. The ersatzing records 
the original string in +CARG. The rule applies after processing of 
suprasegmentals (tone and length), and copies that information over as 
well. See tmr/redup.tdl.
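
For readers who do not want to dig into the TDL right away, the first step can be pictured with a small Python sketch. This is only a toy model of the idea, not the token mapping rules in tmr/redup.tdl: the Token class and add_redup_ersatz are hypothetical names, and keeping the plain token next to the ersatz as an alternative is an assumption of the sketch.

```python
from dataclasses import dataclass

REDUP = "__REDUP__"  # ersatz surface string named in the text


@dataclass(frozen=True)
class Token:
    start: int  # chart vertex where the token starts
    end: int    # chart vertex where the token ends
    form: str   # string seen by lexical lookup
    carg: str   # original string (stands in for +CARG)


def add_redup_ersatz(words):
    """Return the plain tokens plus an ersatz alternative for every
    token that is followed by another token."""
    plain = [Token(i, i + 1, w, w) for i, w in enumerate(words)]
    # The last token has no follower, so it gets no ersatz.
    ersatz = [Token(t.start, t.end, REDUP, t.carg) for t in plain[:-1]]
    return plain + ersatz


chart = add_redup_ersatz("joji jojinmu sun zo".split())
for tok in chart:
    print(tok)
```

Every non-final token gets a __REDUP__ alternative, so the chart overgenerates at this point; getting rid of the unlicensed entries is left to the later steps.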

In a second step, I shall use lexical filtering, which applies after 
lookup and morphological processing, to get rid of unlicensed 
reduplication entries in the chart. I'll probably implement that 
beginning of next week.

During morphological processing, I introduce constraints for the 
reduplicant as part of the morphological rules applying to the base. In 
particular, this enables me to select precisely at which step in the 
derivation I want to memoise the identity of the base. All the 
constraints on the reduplicant are collected in MORPH.MCLASS.--REDUP. 
See rule n_pl12_lr (irules.tdl), or, even better, n_pl14_lr.

In syntax, I use a binary rule to combine the reduplicant ersatz with 
the base and impose all the constraints the base has for the shape of 
the reduplicant (rule n-pl-reduplication in rules.tdl).

Different plural formation patterns require different levels of 
identity: e.g. class 12 copies the segments and suprasegmentals of the 
lexeme, whereas class 14 copies the segments of the derived plural and 
imposes a fixed H+ pattern on the reduplicant, and a fixed L+ tone 
pattern on the base. Both classes exempt further inflectional 
marking from reduplication.

This is all done by the morphological rules applying to the base, 
which also store the constraints on the reduplicant in MORPH.--REDUP, 
having a value for orthography (--STEM string) and suprasegmentals 
(--SUPRA supra). See the rules for class 12 and class 14 in irules.tdl. 
The binary reduplication rule then imposes these constraints on the 
reduplicant.
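
The division of labour between the morphological rules and the binary rule can be illustrated with the following Python sketch. It is a simplified model under stated assumptions, not the grammar code: Redup, Base, pl12, pl14, inflect and licenses are hypothetical names, tones are written as plain H/L strings with one letter per syllable, and the tone constraint is modelled as a regular expression. The fixed L+ pattern that class 14 imposes on the base is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Redup:
    stem: str   # required segments (stands in for --STEM)
    supra: str  # required tones, a regex over H/L (stands in for --SUPRA)


@dataclass(frozen=True)
class Base:
    orth: str
    redup: Optional[Redup] = None


def pl12(lexeme, tones):
    # Class 12: memoise segments and tones of the lexeme itself.
    return Base(lexeme, Redup(lexeme, tones))


def pl14(derived_plural):
    # Class 14: memoise the derived plural's segments, fixed H+ tones.
    return Base(derived_plural, Redup(derived_plural, "H+"))


def inflect(base, suffix):
    # Later inflection changes the base only; the constraints
    # memoised for the reduplicant stay as they were.
    return Base(base.orth + suffix, base.redup)


def licenses(base, carg, tones):
    """Binary rule check: does the base accept this reduplicant?"""
    r = base.redup
    return (r is not None and carg == r.stem
            and re.fullmatch(r.supra, tones) is not None)


jojinmu = inflect(pl12("joji", "HL"), "nmu")
print(licenses(jojinmu, "joji", "HL"))
print(licenses(jojinmu, "jojinmu", "HL"))
print(licenses(pl14("bunduma"), "bunduma", "HHH"))
```

Because the constraints are fixed when the plural rule applies, the possessive suffix added afterwards never reaches the reduplicant, which is the point of choosing the step in the derivation at which the base is memoised.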

Some sample sentences for you to test:

joji jojinmu sun zo
joji jojin sun zo
malamai bunduma bunduma sun zo

In the hag directory, you can test with

ace -l -g ace/hausa.dat

for parsing

or with

ace -T -g ace/hausa.dat | ace -e -l --show-gen-chart -g ace/hausa.g.dat

for generation.

2. Generation

Using a single generic entry for all reduplicants, I can now trigger the 
one I need based on very general properties, i.e. plural in Hausa. Look 
at reduplication.mtr for reference. Since there's only one rule left 
now, it is soon going to move into the main trigger.mtr.

Using an ersatz, I have to replace the generic __REDUP__ phonology with 
something sensible at some point: I use post-generation chart mapping to 
copy over the relevant string from the base. The relevant rule is the 
third or fourth one down in tmr/post-generation.tdl. Likewise, I copy 
over the constraints on tone and vowel length (SUPRA) as well, which are 
imposed in the --REDUP.--SUPRA features.
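
As a rough picture of what the post-generation step does, here is a minimal Python sketch. It assumes, for illustration only, that the generator output is a list of (form, memoised stem) pairs and that the ersatz sits immediately to the left of its base; the function name realise is hypothetical, and the copying of SUPRA is left out.

```python
REDUP = "__REDUP__"


def realise(items):
    """items: list of (form, redup_stem) pairs in surface order;
    redup_stem is the string memoised on a base, else None."""
    out = []
    for i, (form, _) in enumerate(items):
        if form != REDUP:
            out.append(form)
            continue
        # The ersatz takes its string from the base to its right.
        if i + 1 >= len(items) or items[i + 1][1] is None:
            raise ValueError("ersatz without a following base")
        out.append(items[i + 1][1])
    return " ".join(out)


generated = [(REDUP, None), ("jojinmu", "joji"), ("sun", None), ("zo", None)]
print(realise(generated))
```

The string copied is the one memoised on the base, not the base's full surface form, so the possessive marking stays on the base only.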

3. Conclusion

To summarise, the new solution scales up to open classes, and it is 
fully supported in ace. As for the LKB, you can test with the ersatz as 
input, e.g. "__redup__ joji sun zo" parses and generates. Generation 
will only use the ersatz. I shall look into refining my user function to 
get proper generation output.

For the moment, I kept the old analysis alive for testing in the LKB: if 
you type in "nas nas sun zo", you should still get an analysis, and you 
can even generate from it (using the ersatz). In ace, this "native" 
analysis is disabled by requiring an empty RELS on the reduplicant (as a 
type addendum on the phrasal type), which will filter out the native 
contentful entry, and keep the semantically empty one (the ersatz), thus 
avoiding any spurious ambiguity. Since the LKB analyses are a superset 
of the ace analyses, there will not be any issue w.r.t. treebanking.

As I see it, total reduplication is the killer argument for 
having chart mapping in the LKB, or else for having a complete ace-based 
development environment.

4. Outlook: Chinese and Indonesian

As far as I can tell, my approach for Hausa should be straightforward to 
port to these two languages. There is the question of the X: you would 
probably treat that as a morphophonological effect of some rule 
application (I guess). So, given our technologies, the most 
straightforward thing to do is place all morphophonological changes on 
the base. That leaves you free to impose simple total identity on the 
reduplicant: just choose the right moment in the derivation (thanks to 
Woodley, who fixed the recording of orth values in ace just an hour or 
two ago). If that should not be viable (there are languages like 
that), one needs to replicate some of the morphology in token mapping. 
For reference, look at how the ERG deals with plurals of unknown words.

All the best,

Berthold

On 08/09/15 16:26, Berthold Crysmann wrote:
> Hi Francis,
>
> I guess you only worry about total reduplication. In principle, you 
> could get away with using string unification in Chinese (thanks to 
> the script, word length is limited), but specifying the letter set 
> will not be much fun for either humans or machines....
>
> I am currently working on total reduplication in my Hausa grammar. I 
> had a first solution that is non-compositional in the semantics. I.e. 
> I just use a binary rule that glues the stuff together, conditioned on 
> identity of predicates, but throws away the semantic contribution in 
> the reduplicant. Works in parsing, but needs *item-specific* trigger 
> rules in generation.
>
> Right now, I am experimenting with ersatzing. Things should work well as 
> long as only one of the reduplicants undergoes additional morphology. 
> Otherwise, you'll have to memoise parts of the original string, so you 
> can apply regular morphophonological changes to the reduplicant. Seems 
> to work in the LKB.
>
> I shall commit that new analysis very soon. I shall also send you a 
> detailed description.
>
> Cheers,
>
> Berthold
>
> On 08/09/15 15:28, Francis Bond wrote:
>> G'day,
>>
>> we are working with a couple of languages where we would like to be
>> able to write lexical rules that do things like:
>>
>> Take a two character word in Chinese (AB)
>> AB -> ABAB
>> AB -> AABB
>> AB -> AAB
>> AB -> ABXAB (where X is fixed by the rule)
>> AB -> AXAB
>> AB -> AXB
>>
>> Take a one character word (A):
>> A -> AA
>> A -> AXA
>> A -> AAX
>>
>> In Indonesian we want to take an arbitrary word and produce a duplicate
>> w -> w-w   (kasus -> kasus-kasus)
>>
>> More examples here:
>> http://moin.delph-in.net/LADChineseReduplication
>> and
>> http://moin.delph-in.net/LADChineseAnotA
>>
>> Is this (or some of this) possible with the DELPH-IN tools?  If so,
>> can someone explain how to do it (or point to a paper or website that
>> tells us how to do it)?
>>
>> Thanks in advance,
>>
>>
>
>

-- 
Berthold Crysmann<crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris