[developers] Reduplication in morphology

Berthold Crysmann berthold.crysmann at gmail.com
Wed Sep 16 00:25:16 CEST 2015


Hi again,

On 15/09/15 16:40, Emily M. Bender wrote:
> Thanks, Berthold, for the explanation.  This may turn out to be useful
> for Lushootseed and Nuu-chah-nulth, though the situation there is somewhat
> different.  In particular, I'm intrigued that you are treating __REDUP__ and
> the base as separate lexical items.  Is this because of the whitespace
> conventions of the language?  Or because it was easier to do things this
> way than completely within the morphology?
>

Hausa does use space or hyphen to separate these guys. Remember: we do 
not yet have a solution for bog-standard German compounds.  So I am 
lucky there: decent language, possibly there've been some Brits around at 
some point...

But there is also tonal evidence suggesting that we are really dealing 
with two minimal morphological words: there is no tone copying in any 
morphological construction of the language, and there are no words other 
than total reduplications that involve two tonal spreading processes. 
Moreover, the string to be copied is easily defined in terms of the 
minimal word, but hard to define in terms of manageable prosodic 
structures (segment, syllable, foot). Hausa also has partial 
reduplication (pluractionals, some plurals), and I do that in the 
morphology: just try out "sun kakkaranta littafi" (they repeatedly read 
the book).

How complex is the material you need to copy? Things like C1V2C3 C1V2C3 
are probably best done in the morphology. Just create two identical 
letter sets for the consonants, and there you are. I am/was talking 
total reduplication in this thread so far (thinking of Indonesian).

E.g. for CVC reduplication:

!c = ptkbdgfsxvzX
!k = ptkbdgfsxvzX
!v = aeiou

%prefix (!c!v!k !c!v!k!c!v!k)
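
For illustration only, here is a minimal Python sketch (not LKB code) of 
what the rule above is meant to do, assuming it matches the beginning of 
the stem and that each letter-set variable binds exactly one character. 
The function name cvc_reduplicate is made up for this example.

```python
import re

# Letter sets copied from the rule above; the two consonant sets are
# identical so that the first and third segment can differ.
LETTER_SETS = {
    "c": "ptkbdgfsxvzX",
    "k": "ptkbdgfsxvzX",
    "v": "aeiou",
}


def cvc_reduplicate(stem):
    """Prefix a copy of the stem-initial CVC; return None if there is no CVC."""
    c, v, k = LETTER_SETS["c"], LETTER_SETS["v"], LETTER_SETS["k"]
    # Each bracketed class stands in for one letter-set variable.
    match = re.match(f"[{c}][{v}][{k}]", stem)
    if match is None:
        return None
    return match.group(0) + stem


print(cvc_reduplicate("takda"))
```

Stems that do not begin with a consonant-vowel-consonant sequence from 
these sets are simply not matched, which is how I read the rule.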

I guess there are rules similar to this in HaG. If you are interested, I 
can chase them up. I did notice, when doing umlaut in GG, that with 
an increasing number of characters to be memorised, the memory footprint 
of the LKB went up considerably. But that was years ago, when 2GB was a 
luxury.

One specific issue: the total reduplication phrasal rule is also a 
subtype of word-or-lexrule-min, so it can be a head daughter in 
constructions that operate on lexical signs only (e.g. to combine with 
bound demonstratives "joji jojin nan sun zo" ). I think of it as a 
branching lexical rule, somehow.

> Also, you said (in a separate exchange we had) that the older solution was
> non-compositional ... presumably because you were using the native lex entry
> for the redup form and then squashing its EPs.
Right.
> Am I understanding correctly
> that this current solution is compositional because the ersatz item is
> semantically empty?

Exactly! Actually, the old analysis still works in the LKB. I use a type 
addendum to impose an empty RELS list for ace and pet to get rid of the 
spurious ambiguity. There's an extra file ace-types.tdl now that is 
loaded by pet and ace.
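
As a rough illustration of the filtering effect (a Python sketch, not 
TDL and not the actual contents of ace-types.tdl): requiring an empty 
RELS list on the reduplicant keeps the semantically empty ersatz entry 
and discards the contentful native one. The class Entry, the function 
name, and the names native_nas_n and _nas_n_rel are hypothetical; only 
redup_n is the real entry name.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A candidate reduplicant with its list of elementary predications."""
    name: str
    rels: list = field(default_factory=list)


def licensed_reduplicants(candidates):
    """Keep only candidates whose RELS list is empty."""
    return [entry for entry in candidates if not entry.rels]


candidates = [
    Entry("native_nas_n", rels=["_nas_n_rel"]),  # hypothetical native entry
    Entry("redup_n"),                            # the semantically empty ersatz
]

print([entry.name for entry in licensed_reduplicants(candidates)])
```

With only one candidate surviving, there is a single analysis left per 
reduplicated form.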

I have not tested much of this in pet. It appears to work with the three 
telling examples I tried just now. I shall do more systematic testing 
once I have the time.

The issue Woodley fixed relates to the representation of orthographemic 
changes in the AVMs: now every change is recorded at every individual 
step in the derivation, as it should be, so you can surgically select 
the point where you want to grab that information for copying. That 
wasn't the case before, so you must *upgrade to the current svn 
version*! I am putting this here to avoid fruitless experimenting on 
old versions of our engines.
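
To make the point about per-step recording concrete, here is a small 
Python sketch (an illustration under simplifying assumptions, not how 
ace represents derivations): each step stores its own orth value, so a 
later rule can pick the value at whichever step it needs. The class 
Step, the function orth_at, and the labels "lexeme" and "poss_lr" are 
invented for the example; n_pl12_lr is the class 12 plural rule from 
irules.tdl.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One step of a unary derivation, with the orth value at that step."""
    rule: str
    orth: str
    daughter: Optional["Step"] = None


def orth_at(step, rule):
    """Walk down from the top step; return the orth recorded at `rule`."""
    node = step
    while node is not None:
        if node.rule == rule:
            return node.orth
        node = node.daughter
    return None


lexeme = Step("lexeme", "joji")                 # hypothetical label
plural = Step("n_pl12_lr", "joji", lexeme)      # class 12 plural
possessed = Step("poss_lr", "jojinmu", plural)  # hypothetical rule name

# The reduplicant copies the string at the plural step, not the top one.
print(orth_at(possessed, "n_pl12_lr"), possessed.orth)
```

This mirrors "joji jojinmu": the reduplicant is identical to the base 
before possessive marking is added.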

Cheers,

Berthold

> Emily
>
>
> On Sun, Sep 13, 2015 at 6:45 PM, Francis Bond <bond at ieee.org> wrote:
>
>     Thanks Berthold (and Woodley).
>
>     This looks very promising, if a bit daunting (my mastery of the chart
>     mapping machinery is still shaky).   We will try it out, and probably
>     pester
>      you with more questions.
>
>     On Mon, Sep 14, 2015 at 8:00 AM, Berthold Crysmann
>     <berthold.crysmann at gmail.com> wrote:
>     > Hi Francis, Emily, and who else might be listening on this,
>     >
>     > I have now got a version of my revised analysis working (thanks
>     also to
>     > Woodley for fixing a critical issue in ace for me).
>     >
>     > Here's a brief description (for more details look at the HaG code
>     > $LOGONROOT/llf/hag, or ask me):
>     >
>     > Hausa has some 4 inflectional classes that use total
>     reduplication. While
>     > tonal patterns may differ between base and reduplicant, the
>     reduplicant does
>     > not undergo inflection different from the base. Thus, on the
>     string level
>     > (once we have dealt with suprasegmental markings), the
>     reduplicant is
>     > faithful to the base (lucky for me). We do get further
>     inflectional marking
>     > on the base, however.
>     >
>     > E.g. in plural class 12, we get
>     >
>     > nâs nâs `nurses'
>     > sìkêt sìkêt `skirts'
>     > jōjì jōjì `judges'
>     >
>     > The latter can be morphologically possessed, yielding "jōjì
>     jōjiìnmù" (our
>     > judges).
>     >
>     > So in this class, the reduplicant is just a copy of the base
>     lexeme's orth
>     > value (+ tone + length).
>     >
>     > A more interesting case are augmentative adjectives. Here the
>     base for the
>     > productive formation is partially reduplicated in the masc and
>     fem singular,
>     > but one drops the partial reduplication in the plural, to have total
>     > reduplication of a clipped base instead.
>     >
>     > E.g. class 14:
>     >
>     > mālàm bundumēmḕ ->  mā̀làmai bundumā bùndùmā̀
>     >
>     > Which can also undergo type raising, giving
>     >
>     > bundumā bùndùmàn mā̀làmai
>     >
>     > Note that the inflection -n on the base does not get copied to the
>     > reduplicant.
>     >
>     >
>     > As for the analysis, here is what I now do (in ace):
>     >
>     > 1. Parsing
>     >
>     > As a first step, I trigger an ersatz, in token mapping, on every
>     item in the
>     > chart that is followed by some other item.
>     > The surface string I substitute is __REDUP__, which corresponds
>     to exactly
>     > one lexical item (redup_n).
>     > I tried filtering already on this first step on partial identity
>     of form
>     > (with regexps), but that did not seem to work. The ersatzing
>     records the
>     > original string in +CARG. The rule applies after processing of
>     > suprasegmentals (tone and length), and copies that information
>     over as well.
>     > See tmr/redup.tdl.
>     >
>     > In a second step, I shall use lexical filtering, which applies
>     after lookup
>     > and morphological processing, to get rid of unlicensed
>     reduplication entries
>     > in the chart. I'll probably implement that beginning of next week.
>     >
>     > During morphological processing, I introduce constraints for the
>     reduplicant
>     > as part of the morphological rules applying to the base. In
>     particular, this
>     > enables me to select precisely at which step in the derivation I
>     want to
>     > memoise the identity of the base. All the constraints on the
>     reduplicant are
>     > collected in MORPH.MCLASS.--REDUP. See rule n_pl12_lr
>     (irules.tdl), or, even
>     > better, n_pl14_lr.
>     >
>     > In syntax, I use a binary rule to combine the reduplicant ersatz
>     with the
>     > base and impose all the constraints the base has for the shape
>     of the
>     > reduplicant (rule n-pl-reduplication in rules.tdl).
>     >
>     > Different plural formation patterns require different levels of
>     identity:
>     > e.g. class 12 copies the segments and suprasegmentals of the
>     lexeme, whereas
>     > class 14 copies the segments of the derived plural and imposes a
>     fixed H+
>     > pattern on the reduplicant, and a fixed L+ tone pattern on the
>     base. Either
>     > of them exempts further inflectional markings from reduplication.
>     >
>     > This is all done by the  morphological rules applying to  the
>     base, which
>     > also store the constraints re the reduplicant in MORPH.--REDUP,
>     having a
>     > value for orthography (--STEM string) and suprasegmentals
>     (--SUPRA supra).
>     > See the rules for class 12 and class 14 in irules.tdl. The binary
>     > reduplication rule then imposes these constraints on the
>     reduplicant.
>     >
>     > Some sample sentences for you to test:
>     >
>     > joji jojinmu sun zo
>     > joji jojin sun zo
>     > malamai bunduma bunduma sun zo
>     >
>     > In the hag directory, you can test with
>     >
>     > ace -l -g ace/hausa.dat
>     >
>     > for parsing
>     >
>     > or with
>     >
>     > ace -T -g ace/hausa.dat|ace -e -l --show-gen-chart -g
>     ace/hausa.g.dat
>     >
>     > for generation.
>     >
>     >
>     > 2. Generation
>     >
>     > Using a single generic entry for all reduplicants, I can now
>     trigger the one
>     > I need based on very general properties, i.e. plural in Hausa.
>     Look at
>     > reduplication.mtr for reference. Since there's only one rule
>     left now, it is
>     > soon going to move into the main trigger.mtr.
>     >
>     > Using an ersatz, I have to replace the generic __REDUP__
>     phonology with
>     > something sensible at some point: I use post-generation chart
>     mapping to
>     > copy over the relevant string from the base. The relevant rule
>     is the third
>     > or fourth one down in tmr/post-generation.tdl. Likewise, I copy
>     over the
>     > constraints on tone and vowel length (SUPRA) as well, which are
>     imposed in
>     > the --REDUP.--SUPRA features.
>     >
>     >
>     > 3. Conclusion
>     >
>     > To summarise, the new solution scales up to open classes, and it
>     is fully
>     > supported in ace. As for the LKB, you can test with the ersatz
>     as input,
>     > e.g. "__redup__ joji sun zo" parses and generates. Generation
>     will only use
>     > the ersatz. I shall look into refining my user function to get
>     proper
>     > generation output.
>     >
>     > For the moment, I kept the old analysis alive for testing in the
>     LKB: if you
>     > type in "nas nas sun zo", you should still get an analysis, and
>     you can even
>     > generate from it (using the ersatz). In ace, this "native"
>     analysis is
>     > disabled by requiring an empty RELS on the reduplicant (as a
>     type addendum
>     > on the phrasal type), which will filter out the native
>     contentful entry, and
>     > keep the semantically empty one (the ersatz), thus avoiding any
>     spurious
>     > ambiguity. Since the LKB analyses are a superset of the ace
>     analyses, there
>     > will not be any issue w.r.t. treebanking.
>     >
>     > As I feel about it, total reduplication is the killer argument
>     for having
>     > chart mapping in the LKB, or else for having a complete ace-based
>     > development environment.
>     >
>     > 4. Outlook: Chinese and Indonesian
>     >
>     > As far as I can tell, my approach for Hausa should be
>     straightforward to
>     > port to these two languages. There is the question of the X: you
>     would
>     > probably treat that as a morphophonological effect of some rule
>     application
>     > (I guess). So, given our technologies, the most straightforward
>     thing to do
>     > is place all morphophonological changes on the base. That leaves
>     you free to
>     > impose simple total identity on the reduplicant: just choose the
>     right
>     > moment in the derivation (thanks to Woodley who has just fixed
>     the recording
>     > of orth values in ace an hour or two ago). If that should not be
>     viable
>     > (there are languages like that), one needs to replicate some of the
>     > morphology in token mapping. For reference, look at how the ERG
>     deals with
>     > plurals of unknown words.
>     >
>     >
>     > All the best,
>     >
>     > Berthold
>     >
>     >
>     > On 08/09/15 16:26, Berthold Crysmann wrote:
>     >>
>     >> Hi Francis,
>     >>
>     >> I guess you only worry about total reduplication. In principle, in
>     >> Chinese you could get away with using string unification (thanks to
>     >> the script, word length is limited), but specifying the letter set
>     >> will not be much fun for either humans or machines....
>     >>
>     >> I am currently working on total reduplication in my Hausa
>     grammar. I had a
>     >> first solution that is non-compositional in the semantics. I.e.
>     I just use a
>     >> binary rule that glues the stuff together, conditioned on
>     identity of
>     >> predicates, but throws away the semantic contribution in the
>     reduplicant.
>     >> Works in parsing, but needs *item-specific* trigger rules in
>     generation.
>     >>
>     >> Right now, I am exploring with ersatzing. Things should work
>     well as long
>     >> as only one of the reduplicants undergoes additional
>     morphology. Otherwise,
>     >> you'll have to memoise parts of the original string, so you can
>     apply
>     >> regular morphophonological changes to the reduplicant. Seems to
>     work in the
>     >> LKB.
>     >>
>     >> I shall commit that new analysis very soon. I shall also send you a
>     >> detailed description.
>     >>
>     >> Cheers,
>     >>
>     >> Berthold
>     >>
>     >> On 08/09/15 15:28, Francis Bond wrote:
>     >>>
>     >>> G'day,
>     >>>
>     >>> we are working with a couple of languages where we would like
>     to be
>     >>> able to write lexical rules that do things like:
>     >>>
>     >>> Take a two character word in Chinese (AB)
>     >>> AB -> ABAB
>     >>> AB -> AABB
>     >>> AB -> AAB
>     >>> AB -> ABXAB (where X is fixed by the rule)
>     >>> AB -> AXAB
>     >>> AB -> AXB
>     >>>
>     >>> Take a one character word (A):
>     >>> A -> AA
>     >>> A -> AXA
>     >>> A -> AAX
>     >>>
>     >>> In Indonesian we want to take an arbitrary word and produce a
>     duplicate
>     >>> w -> w-w   (kasus -> kasus-kasus)
>     >>>
>     >>> More examples here:
>     >>> http://moin.delph-in.net/LADChineseReduplication
>     >>> and
>     >>> http://moin.delph-in.net/LADChineseAnotA
>     >>>
>     >>> Is this (or some of this) possible with the DELPH-IN tools? 
>     If so,
>     >>> can someone explain how to do it (or point to a paper or
>     website that
>     >>> tells us how to do it)?
>     >>>
>     >>> Thanks in advance,
>     >>>
>     >>>
>     >>
>     >>
>     >
>     >
>     > --
>     > Berthold Crysmann <crysmann at linguist.jussieu.fr>
>     > CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris
>     Diderot
>     > Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
>     > Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein,
>     75013 Paris
>     >
>
>
>
>     --
>     Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
>     Division of Linguistics and Multilingual Studies
>     Nanyang Technological University
>
>
>
>
> -- 
> Emily M. Bender
> Professor, Department of Linguistics


-- 
Berthold Crysmann <crysmann at linguist.jussieu.fr>
CNRS, Laboratoire de linguistique formelle (UMR 7110), U Paris Diderot
Case 7031, 5 rue Thomas Mann, 75205 Paris cedex 13
Bureau 545, bâtiment Olympe de Gouges, rue Albert Einstein, 75013 Paris
