[developers] Reduplication in morphology

Mon Sep 14 03:45:35 CEST 2015

Thanks Berthold (and Woodley).

This looks very promising, if a bit daunting (my mastery of the chart
mapping machinery is still shaky). We will try it out, and probably
pester you with more questions.

On Mon, Sep 14, 2015 at 8:00 AM, Berthold Crysmann
<berthold.crysmann at gmail.com> wrote:
> Hi Francis, Emily, and whoever else might be listening,
>
> I have now got a version of my revised analysis working (thanks also to
> Woodley for fixing a critical issue in ace for me).
>
> Here's a brief description (for more details look at the HaG code
> $LOGONROOT/llf/hag, or ask me):
>
> Hausa has some 4 inflectional classes that use total reduplication. While
> tonal patterns may differ between base and reduplicant, the reduplicant does
> not undergo inflection different from the base. Thus, on the string level
> (once we have dealt with suprasegmental markings), the reduplicant is
> faithful to the base (lucky for me). We do get further inflectional marking
> on the base, however.
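To make "once we have dealt with suprasegmental markings" concrete, here is a minimal Python sketch. It assumes tone and length are written as combining diacritics (grave, circumflex, macron) on plain vowels, as in the examples in this message. The helper names `segments` and `faithful` are hypothetical and not part of HaG or ace, and the prefix check is a simplification that lets the base carry extra inflection.

```python
import unicodedata


def segments(word: str) -> str:
    """Return the segmental string of a word, without tone or length marks.

    NFD decomposition turns e.g. 'â' into 'a' + U+0302, so dropping all
    combining characters leaves only the segments.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def faithful(reduplicant: str, base: str) -> bool:
    """Segmental faithfulness of the reduplicant to the base.

    The base may carry further inflection (e.g. a possessive suffix), so
    this simplified check only requires a prefix match.
    """
    return segments(base).startswith(segments(reduplicant))


print(segments("sìkêt"))             # siket
print(faithful("jōjì", "jōjiìnmù"))  # True
print(faithful("nâs", "jōjì"))       # False
```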
>
> E.g. in plural class 12, we get
>
> nâs nâs `nurses'
> sìkêt sìkêt `skirts'
> jōjì jōjì `judges'
>
> The latter can be morphologically possessed, yielding "jōjì jōjiìnmù" (our
> judges).
>
> So in this class, the reduplicant is just a copy of the base lexeme's orth
> value (+ tone + length).
>
> A more interesting case is that of augmentative adjectives. Here the base
> for the productive formation is partially reduplicated in the masculine and
> feminine singular, but the plural drops the partial reduplication and
> instead shows total reduplication of a clipped base.
>
> E.g. class 14:
>
> mālàm bundumēmḕ ->  mā̀làmai bundumā bùndùmā̀
>
> Which can also undergo type raising, giving
>
> bundumā bùndùmàn mā̀làmai
>
> Note that the inflection -n on the base does not get copied to the
> reduplicant.
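The class 14 pattern can be sketched at the string level as follows. This is an illustrative Python sketch, not the grammar's implementation. It assumes the clipped plural stem arrives without tone or length marks, renders H tone as unmarked and L tone as a grave accent, and ignores vowel length entirely (so it yields bùndùmà rather than bùndùmā̀). The function names are hypothetical.

```python
import unicodedata

LOW = "\u0300"  # combining grave accent: low tone
VOWELS = set("aeiou")


def all_low(stem: str) -> str:
    """Impose an L+ tone pattern by marking every vowel with a grave accent."""
    marked = "".join(c + LOW if c in VOWELS else c for c in stem)
    return unicodedata.normalize("NFC", marked)


def class14_plural(stem: str, suffix: str = "") -> tuple[str, str]:
    """Return (reduplicant, base) for an unmarked, clipped plural stem.

    The reduplicant is the bare stem with H tone (left unmarked); the base
    gets L tone on every vowel plus any further inflection, which is not
    copied onto the reduplicant.
    """
    return stem, all_low(stem) + suffix


print(class14_plural("bunduma", "n"))  # ('bunduma', 'bùndùmàn')
```

The point of the sketch is the asymmetry: the suffix is appended only to the base, so the reduplicant never sees it.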
>
>
> As for the analysis, here is what I now do (in ace):
>
> 1. Parsing
>
> As a first step, I trigger an ersatz, in token mapping, on every item in the
> chart that is followed by some other item.
> The surface string I substitute is __REDUP__, which corresponds to exactly
> one lexical item (redup_n).
> I tried filtering on partial identity of form (with regexps) already at
> this first step, but that did not seem to work. The ersatzing records the
> original string in +CARG. The rule applies after processing of
> suprasegmentals (tone and length), and copies that information over as well.
> See tmr/redup.tdl.
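The first token-mapping step can be pictured with a small Python model of the chart. This is a sketch under stated assumptions, not the rule in tmr/redup.tdl: the `Token` class and its fields are hypothetical stand-ins for chart items, `carg` stands in for +CARG, and `supra` is an abstract placeholder for the tone and length information that the real rule copies over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    start: int       # chart vertex where the item begins
    end: int         # chart vertex where it ends
    form: str        # string used for lexical lookup
    carg: str = ""   # original string, recorded by ersatzing
    supra: str = ""  # tone/length information, copied along


def add_redup_ersatz(chart: list[Token]) -> list[Token]:
    """Add a __REDUP__ ersatz for every item that is followed by another item.

    No filtering on form happens here, so the result over-generates and
    relies on a later filtering step.
    """
    starts = {t.start for t in chart}
    ersatz = [
        Token(t.start, t.end, "__REDUP__", carg=t.form, supra=t.supra)
        for t in chart
        if t.end in starts  # some other item starts where this one ends
    ]
    return chart + ersatz


words = "joji jojinmu sun zo".split()
chart = [Token(i, i + 1, w) for i, w in enumerate(words)]
for t in add_redup_ersatz(chart)[len(chart):]:
    print(t.start, t.form, t.carg)
# 0 __REDUP__ joji
# 1 __REDUP__ jojinmu
# 2 __REDUP__ sun
```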
>
> In a second step, I shall use lexical filtering, which applies after lookup
> and morphological processing, to get rid of unlicensed reduplication entries
> in the chart. I'll probably implement that at the beginning of next week.
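The exact licensing criterion for the filtering step is not spelled out here. One plausible candidate, shown as a hypothetical Python sketch, is to keep an ersatz only if its recorded string is segmentally a prefix of the word that follows it. That prefix test is an assumption chosen for illustration; the real filter operates on lexical items after morphological processing and could use richer information.

```python
import unicodedata


def segments(word: str) -> str:
    """Drop tone and length diacritics (combining marks after NFD)."""
    nfd = unicodedata.normalize("NFD", word)
    return "".join(c for c in nfd if not unicodedata.combining(c))


def licensed(carg: str, next_form: str) -> bool:
    """Hypothetical criterion: the string recorded on the ersatz must be
    segmentally a prefix of the word that follows it."""
    return segments(next_form).startswith(segments(carg))


def surviving_positions(words: list[str]) -> list[int]:
    """Positions whose __REDUP__ ersatz survives the filter."""
    return [
        i for i in range(len(words) - 1)
        if licensed(words[i], words[i + 1])
    ]


print(surviving_positions("malamai bunduma bunduma sun zo".split()))  # [1]
print(surviving_positions("joji jojinmu sun zo".split()))             # [0]
```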
>
> During morphological processing, I introduce constraints for the reduplicant
> as part of the morphological rules applying to the base. In particular, this
> enables me to select precisely at which step in the derivation I want to
> memoise the identity of the base. All the constraints on the reduplicant are
> collected in MORPH.MCLASS.--REDUP. See rule n_pl12_lr (irules.tdl), or, even
> better, n_pl14_lr.
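Memoising the base at a chosen step of the derivation can be modelled as a rule that snapshots the current orthography into a separate record, which later rules leave untouched. In this Python sketch, `pl14` and `possessive_n` are hypothetical stand-ins for n_pl14_lr and a possessive rule. `Redup` mimics the --REDUP feature, orthography is unmarked, and tone patterns are abstract strings.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Redup:
    stem: str   # plays the role of --REDUP.--STEM
    supra: str  # plays the role of --REDUP.--SUPRA


@dataclass(frozen=True)
class Sign:
    orth: str
    supra: str = ""
    redup: Optional[Redup] = None


def pl14(sign: Sign) -> Sign:
    """Plural rule: memoise the current orth as the reduplicant's stem,
    require H+ on the reduplicant and impose L+ on the base."""
    return replace(sign, supra="L+", redup=Redup(stem=sign.orth, supra="H+"))


def possessive_n(sign: Sign) -> Sign:
    """Later inflection: changes the base's orth, leaves redup untouched."""
    return replace(sign, orth=sign.orth + "n")


base = possessive_n(pl14(Sign("bunduma")))
print(base.orth, base.redup)
# bunduman Redup(stem='bunduma', supra='H+')
```

Because the snapshot is taken inside the plural rule, the -n added afterwards never reaches the reduplicant constraint.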
>
> In syntax, I use a binary rule to combine the reduplicant ersatz with the
> base and impose all the constraints the base has for the shape of the
> reduplicant (rule n-pl-reduplication in rules.tdl).
>
> Different plural formation patterns require different levels of identity:
> e.g. class 12 copies the segments and suprasegmentals of the lexeme, whereas
> class 14 copies the segments of the derived plural and imposes a fixed H+
> pattern on the reduplicant, and a fixed L+ tone pattern on the base. Both
> exempt further inflectional marking on the base from reduplication.
>
> This is all done by the morphological rules applying to the base, which
> also store the constraints on the reduplicant in MORPH.--REDUP, with a
> value for orthography (--STEM string) and one for suprasegmentals (--SUPRA supra).
> See the rules for class 12 and class 14 in irules.tdl. The binary
> reduplication rule then imposes these constraints on the reduplicant.
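The check performed by the binary rule can be sketched as a Python predicate over the base's stored constraints and the ersatz. The sketch also includes the empty-RELS requirement described further down for ace. All names are hypothetical. Tone is encoded as one H/L letter per syllable, and --SUPRA is modelled as a regular expression so that class 12 (copy of the lexeme's tone, e.g. "HL" for jōjì) and class 14 (fixed "H+") can share one check.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Redup:
    stem: str   # --REDUP.--STEM: required string
    supra: str  # --REDUP.--SUPRA: tone pattern, as a regex over H/L


@dataclass(frozen=True)
class Reduplicant:
    carg: str         # original string recorded by the ersatz
    supra: str        # tone pattern recorded by the ersatz, e.g. "HHH"
    rels: tuple = ()  # the ersatz is semantically empty


def n_pl_reduplication(constraint: Redup, red: Reduplicant) -> bool:
    """Succeed only if the reduplicant meets the constraints stored on the base."""
    if red.rels:  # contentful (native) entries are rejected
        return False
    if red.carg != constraint.stem:
        return False
    return re.fullmatch(constraint.supra, red.supra) is not None


class12 = Redup(stem="joji", supra="HL")     # copy of the lexeme's tone
class14 = Redup(stem="bunduma", supra="H+")  # fixed all-H pattern
print(n_pl_reduplication(class12, Reduplicant("joji", "HL")))      # True
print(n_pl_reduplication(class14, Reduplicant("bunduma", "HHH")))  # True
print(n_pl_reduplication(class14, Reduplicant("bunduma", "LLL")))  # False
```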
>
> Some sample sentences for you to test:
>
> joji jojinmu sun zo
> joji jojin sun zo
> malamai bunduma bunduma sun zo
>
> In the hag directory, you can test with
>
> ace -l -g ace/hausa.dat
>
> for parsing
>
> or with
>
> ace -T -g ace/hausa.dat|ace -e -l --show-gen-chart -g ace/hausa.g.dat
>
> for generation.
>
>
> 2. Generation
>
> Using a single generic entry for all reduplicants, I can now trigger the one
> I need based on very general properties, i.e. plural in Hausa. Look at
> reduplication.mtr for reference. Since there's only one rule left now, it is
> soon going to move into the main trigger.mtr.
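The difference from an item-specific setup is that one condition covers every lexeme. This is a rough Python sketch of such a trigger, not the content of reduplication.mtr. It assumes, hypothetically, that nominal predicates end in `_n_rel` and that the number property can be read directly off each EP (in a real MRS it sits on the ARG0 variable). The predicate names are invented for illustration.

```python
def triggered_entries(eps: list[dict]) -> list[str]:
    """Propose the generic, semantically empty reduplicant entry once for
    every plural nominal EP in the input semantics.

    Whether a candidate ends up in a realisation is left to the grammar:
    the base's --REDUP constraints decide.
    """
    return [
        "redup_n"
        for ep in eps
        if ep["pred"].endswith("_n_rel") and ep.get("NUM") == "pl"
    ]


mrs = [
    {"pred": "_joji_n_rel", "NUM": "pl"},
    {"pred": "_zo_v_rel"},
]
print(triggered_entries(mrs))  # ['redup_n']
```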
>
> Since I use an ersatz, I have to replace the generic __REDUP__ phonology
> with something sensible at some point: I use post-generation chart mapping
> to copy over the relevant string from the base. The relevant rule is the
> third or fourth one down in tmr/post-generation.tdl. Likewise, I copy over
> the constraints on tone and vowel length (SUPRA), which are imposed in the
> --REDUP.--SUPRA features.
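In linear terms, the post-generation step amounts to the following Python sketch. Unlike a chart-mapping rule, it assumes that the reduplicant immediately precedes its base in the realisation, which holds for the examples above. The `Edge` class and its fields are hypothetical, and tone patterns are abstract strings.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    surface: str
    supra: str = ""        # tone/length of this item
    redup_stem: str = ""   # --REDUP.--STEM stored on a base
    redup_supra: str = ""  # --REDUP.--SUPRA stored on a base


def realise_reduplicants(edges: list[Edge]) -> list[Edge]:
    """Replace each __REDUP__ placeholder with the string and the
    tone/length constraints memoised on the base that follows it."""
    out = []
    for i, edge in enumerate(edges):
        if edge.surface == "__REDUP__" and i + 1 < len(edges):
            base = edges[i + 1]
            out.append(Edge(base.redup_stem, supra=base.redup_supra))
        else:
            out.append(edge)
    return out


realisation = [
    Edge("__REDUP__"),
    Edge("bunduman", supra="L+", redup_stem="bunduma", redup_supra="H+"),
    Edge("malamai"),
]
print([e.surface for e in realise_reduplicants(realisation)])
# ['bunduma', 'bunduman', 'malamai']
```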
>
>
> 3. Conclusion
>
> To summarise, the new solution scales up to open classes, and it is fully
> supported in ace. As for the LKB, you can test with the ersatz as input,
> e.g. "__redup__ joji sun zo" parses and generates, but the generator output
> will only contain the ersatz. I shall look into refining my user function to
> get proper generation output.
>
> For the moment, I kept the old analysis alive for testing in the LKB: if you
> type in "nas nas sun zo", you should still get an analysis, and you can even
> generate from it (using the ersatz). In ace, this "native" analysis is
> disabled by requiring an empty RELS on the reduplicant (as a type addendum
> on the phrasal type), which will filter out the native contentful entry, and
> keep the semantically empty one (the ersatz), thus avoiding any spurious
> ambiguity. Since the LKB analyses are a superset of the ace analyses, there
> will not be any issue w.r.t. treebanking.
>
> As I see it, total reduplication is the killer argument for having
> chart mapping in the LKB, or else for having a complete ace-based
> development environment.
>
> 4. Outlook: Chinese and Indonesian
>
> As far as I can tell, my approach for Hausa should be straightforward to
> port to these two languages. There is the question of the fixed element X: you would
> probably treat that as a morphophonological effect of some rule application
> (I guess). So, given our technologies, the most straightforward thing to do
> is place all morphophonological changes on the base. That leaves you free to
> impose simple total identity on the reduplicant: just choose the right
> moment in the derivation (thanks to Woodley, who fixed the recording of orth
> values in ace just an hour or two ago). If that should not be viable
> (there are languages like that), one needs to replicate some of the
> morphology in token mapping. For reference, look at how the ERG deals with
> plurals of unknown words.
>
>
> All the best,
>
> Berthold
>
>
> On 08/09/15 16:26, Berthold Crysmann wrote:
>>
>> Hi Francis,
>>
>> I guess you only worry about total reduplication. In principle, in Chinese
>> you could get away with using string unification (thanks to the script,
>> word length is limited), but specifying the letter set will not be much fun
>> for either humans or machines....
>>
>> I am currently working on total reduplication in my Hausa grammar. I had a
>> first solution that is non-compositional in the semantics. I.e. I just use a
>> binary rule that glues the stuff together, conditioned on identity of
>> predicates, but throws away the semantic contribution in the reduplicant.
>> Works in parsing, but needs *item-specific* trigger rules in generation.
>>
>> Right now, I am experimenting with ersatzing. Things should work well as long
>> as only one of the reduplicants undergoes additional morphology. Otherwise,
>> you'll have to memoise parts of the original string, so you can apply
>> regular morphophonological changes to the reduplicant. Seems to work in the
>> LKB.
>>
>> I shall commit that new analysis very soon. I shall also send you a
>> detailed description.
>>
>> Cheers,
>>
>> Berthold
>>
>> On 08/09/15 15:28, Francis Bond wrote:
>>>
>>> G'day,
>>>
>>> we are working with a couple of languages where we would like to be
>>> able to write lexical rules that do things like:
>>>
>>> Take a two-character word in Chinese (AB):
>>> AB -> ABAB
>>> AB -> AABB
>>> AB -> AAB
>>> AB -> ABXAB (where X is fixed by the rule)
>>> AB -> AXAB
>>> AB -> AXB
>>>
>>> Take a one-character word (A):
>>> A -> AA
>>> A -> AXA
>>> A -> AAX
>>>
>>> In Indonesian we want to take an arbitrary word and produce a duplicate
>>> w -> w-w   (kasus -> kasus-kasus)
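The target patterns listed above can be stated as string templates. This Python sketch works purely at the string level and says nothing about how the DELPH-IN tools would implement them, which is the question being asked. The function names are hypothetical. The Mandarin words and the choice of X (不, 一) are illustrative examples added here; they do not come from the linked pages.

```python
def apply_template(word: str, template: str, x: str = "") -> str:
    """Fill a reduplication template such as "ABAB" or "AXA".

    A and B stand for the first and second character of `word`; X is the
    fixed material contributed by the rule.
    """
    if not 1 <= len(word) <= 2:
        raise ValueError("templates cover one- and two-character words only")
    slots = dict(zip("AB", word))
    slots["X"] = x
    missing = set(template) - set(slots)
    if missing:
        raise ValueError(f"template uses {sorted(missing)}, not available")
    return "".join(slots[s] for s in template)


def total(word: str, sep: str = "-") -> str:
    """Indonesian-style total reduplication: w -> w-w."""
    return f"{word}{sep}{word}"


print(apply_template("研究", "ABAB"))         # 研究研究
print(apply_template("高兴", "AABB"))         # 高高兴兴
print(apply_template("喜欢", "ABXAB", "不"))  # 喜欢不喜欢
print(apply_template("看", "AXA", "一"))      # 看一看
print(total("kasus"))                         # kasus-kasus
```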
>>>
>>> More examples here:
>>> http://moin.delph-in.net/LADChineseReduplication
>>> and
>>> http://moin.delph-in.net/LADChineseAnotA
>>>
>>> Is this (or some of this) possible with the DELPH-IN tools?  If so,
>>> can someone explain how to do it (or point to a paper or website that
>>> tells us how to do it)?
>>>
>>> Thanks in advance,
>>>
>>>
>>
>>
>
>
> --
> Berthold Crysmann<crysmann at linguist.jussieu.fr>
> CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
> Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
> Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris
>

-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University