[developers] Hacking the DELPH-IN framework for a null morpheme: a semi-success

Emily M. Bender ebender at uw.edu
Fri May 10 16:07:18 CEST 2019

We just hit this same problem with the Nuuchahnulth grammar in 567 --- it
contains irules like this one:

noun-pc105_lrt1-suffix :=
%suffix (* =\!iˑ)

... where ace doesn't seem to be respecting the \ escaping the !. David,
did you ever find a resolution here?
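
For reference, here is a minimal Python sketch of how one side of a subrule could be tokenized if the backslash is meant to make the following character literal. It is not ACE's or the LKB's code. It assumes the LKB convention that an unescaped `!` plus the next character names a letter set (as in `!s`), and that `\` escapes exactly one character. The function name tokenize_pattern is made up for illustration.

```python
from typing import List, Tuple

# Hypothetical token type: ("lit", character) or ("set", letter-set name).
Token = Tuple[str, str]


def tokenize_pattern(pattern: str) -> List[Token]:
    """Split one side of a subrule into literals and letter-set references.

    Assumptions (not taken from any engine's source):
      * '!x' refers to the letter set named '!x' (one character after '!').
      * '\\c' means the character c taken literally, so '\\!' is a plain '!'.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "\\!" and i + 1 >= len(pattern):
            raise ValueError(f"dangling {ch!r} at end of {pattern!r}")
        if ch == "\\":
            # Escaped character: always literal, even if it is '!'.
            tokens.append(("lit", pattern[i + 1]))
            i += 2
        elif ch == "!":
            # Unescaped '!' plus one character names a letter set.
            tokens.append(("set", pattern[i : i + 2]))
            i += 2
        else:
            tokens.append(("lit", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize_pattern(r"=\!iˑ"))  # escape honored
    print(tokenize_pattern("=!iˑ"))    # escape missing
```

Under these assumptions, `=\!iˑ` comes out as four literal characters. If the backslash is not treated as an escape, the same pattern instead contains a reference to a letter set named `!i`, which would then have to be declared for the rule to load.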


On Fri, Apr 12, 2019 at 12:33 PM David Inman <davinman at uw.edu> wrote:

> As Emily indicated, the prefixes always attach first, followed by the
> suffixes. The two LKB parses are identical.
> I'm having difficulty parsing with Ace, apparently due to special symbols.
> Once I introduce clitics with =, I get the following:
> ERROR: morpho: no such letter set as `a'
> I had other problems in ace previously, so I just switched to the LKB.
> David Inman
> PhD Candidate
> University of Washington Linguistics
> On Fri, Apr 12, 2019 at 9:44 AM Woodley Packard <sweaglesw at sweaglesw.org>
> wrote:
>> Hi Stephan,
>> I agree that the ambiguity you reference (which David and Emily believe
>> they are experiencing, and which I also alluded to a few emails ago)
>> exists in principle and might be interesting in some situations.  I would
>> be surprised to see it result in two identical uninflected lexical edges in
>> the chart, however, at least in ACE’s current implementation.  The
>> ambiguity arises as a result of two ways to apply the first rule, not from
>> two different lexical starting places.  Furthermore I didn’t (?) think ACE
>> was set up to generate multiple output edges from a single rule/daughter
>> combination, as I maintain would be required for this ambiguity.
>> To the question of whether ace blocks that ambiguity: the only difference
>> would be in the ORTH feature of the two intermediate edges, I suppose.
>> Until recently ACE did not even write orthographemic changes to that
>> feature.  I suppose at the moment it arbitrarily picks one or the other.
>> Just now it seems that approach could theoretically result in inadvertently
>> taking an orthographemic dead-end, although I’m not aware of that issue
>> ever coming up in practice.  I think I could be persuaded that the right
>> thing to do would be to generate both variants (from just one source
>> lexical edge), but before implementing that I think an appropriate
>> improvement to UDF would be in order.
>> Enjoy the train ride!
>> Woodley
>> On Apr 12, 2019, at 9:15 AM, Stephan Oepen <oe at ifi.uio.no> wrote:
>> hi woodley, and all,
>> i would have thought our current engines are quite capable of generating
>> seemingly duplicate derivations, owing to incomplete information being
>> recorded about orthographemic segmentation.  in fact, i believe i recall at
>> least JaCY exposing this phenomenon in the LKB and PET.
>> assume two orthographemic rules, each with two subrules (which both can
>> apply to some stem), maybe something like:
>> one :=
>> %suffix (* s) (e ss)
>> [...].
>> two :=
>> %suffix (!s !sed) (s ssed)
>> [...].
>> not sure the above is quite right, but i want to allow two chains through
>> these rules that only differ in the internal string segmentation, e.g.
>> assuming a stem ‘fore’ and a final surface form ‘foressed’:
>> fore one fores two foressed
>> fore one foress two foressed
>> from what i recall about at least the PET implementation, i would expect
>> the above to result in two distinct lexical sub-trees whose derivations in
>> current UDX will look alike (and which are guaranteed to immediately pack).
>> would you expect to block such ambiguity (in ACE)?  if so, on what basis,
>> and which of the two variants should prevail?
>> —ever since yi and others started retracing (with some effort) the exact
>> string-level effects of orthographemic nodes in our derivations i have been
>> thinking we should extend UDX and probably both record which affixation
>> sub-rule applied, and what the strings at the ‘top’ and the ‘bottom’ looked
>> like.
>> would this extra information seem adequate and sufficient to you (and
>> others)?  if so, i would like to try and work out how to extend the UDX
>> syntax, while maintaining backwards compatibility.
>> greetings from the train to finse 1222!  oe
>> On Fri, 12 Apr 2019 at 06:51 Woodley Packard <sweaglesw at sweaglesw.org>
>> wrote:
>>> Hi David,
>>> If I understand correctly you have a lexical entry whose orthography in
>>> the lexicon is “=0” but which only ever appears in combination with the
>>> prefix or suffix or both, which lets you cover up the fact that the =0 was
>>> ever there.  Sounds reasonable to me.
>>> Forgive me if this is too obvious and not what’s going on, but:  getting
>>> two parses when both suffix and prefix are present seems likely to be
>>> caused by unconstrained order of application of those rules.  Have you
>>> checked whether the two parses you get have the rules applying in opposite
>>> orders?  If so, the solution is simply to constrain things so that one of
>>> them cannot consume the other’s output.
>>> If on the other hand the two parses have identical derivations then the
>>> result is unexpected — at least under the currently used definition of
>>> derivation trees.  There have been suggestions that derivations with
>>> different internal inflected string values resulting from different
>>> subrules of the %prefix and %suffix mechanisms should be considered
>>> distinct (and that those should be recorded as part of the derivation
>>> tree), but to my knowledge none of our systems supports that yet, nor do I
>>> believe a format has been decided upon.
>>> Best,
>>> Woodley
>>> On Apr 11, 2019, at 8:56 PM, David Inman <davinman at uw.edu> wrote:
>>> Hello developers,
>>> I am using the irules to define a null morpheme by having prefixes and
>>> suffixes overwrite a string (=0, 3rd person marking on a clitic complex)
>>> when they attach to it. The irules look like this:
>>> past-prefix-2 :=
>>> %prefix (* =int) (=0 =int)
>>> past-lex-rule.
>>> clitic-plural-suffix :=
>>> %suffix (* =ʔał) (=0 =ʔał)
>>> clitic-plural-lex-rule.
>>> This works and generates strings that lack the =0 morpheme, except that
>>> in the case where both a prefix and a suffix attach, the parser enters two
>>> =0 morphemes into the parse chart and parses the input twice.
>>> (This does not happen for contentful roots.) If the =0 has only "suffixes"
>>> after it, then I get one parse. If it has only "prefixes" then I also get
>>> one parse. I think the parser sees that =0 can be overwritten either by the
>>> prefix or the suffix so it hypothesizes it twice. I'm using the morph rules
>>> a bit differently than intended, but is this a case that should be
>>> supported? Is there any way around this, so that I can limit the parser's
>>> behavior and get one parse?
>>> David Inman
>>> PhD Candidate
>>> University of Washington Linguistics
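
A small Python sketch of the segmentation-only ambiguity discussed in this thread, using David's two rules. It assumes that every matching subrule may apply, that `*` stands for the empty string, and that the prefix rule applies before the suffix rule, as David describes. The Step record, with its subrule and before/after strings, is a hypothetical illustration of the extra information Stephan proposes recording in UDX, not an existing format; analyses, apply_prefix and apply_suffix are likewise made-up names.

```python
"""Enumerate derivations of a surface form through %prefix/%suffix rules.

Hypothetical sketch: '*' is the empty string, every matching subrule may
apply, and rules apply in the listed order (prefix first, then suffix).
"""
from typing import List, NamedTuple, Tuple

Subrule = Tuple[str, str]  # (string in the input, string in the output)


class Step(NamedTuple):
    rule: str          # rule name (what a derivation records today)
    subrule: Subrule   # which subrule applied (hypothetical extra field)
    before: str        # string at the 'bottom' of the rule
    after: str         # string at the 'top' of the rule


def _strings(sub: Subrule) -> Tuple[str, str]:
    # Map the '*' wildcard to the empty string.
    old, new = sub
    return ("" if old == "*" else old, "" if new == "*" else new)


def apply_prefix(form: str, sub: Subrule) -> List[str]:
    old, new = _strings(sub)
    return [new + form[len(old):]] if form.startswith(old) else []


def apply_suffix(form: str, sub: Subrule) -> List[str]:
    old, new = _strings(sub)
    if not form.endswith(old):
        return []
    return [form[: len(form) - len(old)] + new]


# David's rules, with the subrules copied from his message.
RULES = [
    ("past-prefix-2", apply_prefix, [("*", "=int"), ("=0", "=int")]),
    ("clitic-plural-suffix", apply_suffix, [("*", "=ʔał"), ("=0", "=ʔał")]),
]


def analyses(surface: str, stem: str, rules=RULES) -> List[List[Step]]:
    """Return every chain of steps that turns stem into surface."""
    chains: List[Tuple[str, List[Step]]] = [(stem, [])]
    for name, apply, subrules in rules:
        extended = []
        for form, steps in chains:
            for sub in subrules:
                for out in apply(form, sub):
                    extended.append((out, steps + [Step(name, sub, form, out)]))
        chains = extended
    return [steps for form, steps in chains if form == surface]


if __name__ == "__main__":
    for chain in analyses("=int=ʔał", "=0"):
        print(" -> ".join(f"{s.rule}{s.subrule}: {s.after}" for s in chain))
```

Running analyses on =int=ʔał from the stem =0 yields two chains. Both apply past-prefix-2 and then clitic-plural-suffix, so a derivation that records only rule names cannot tell them apart; they differ only in the intermediate string (=int versus =int=0) and in which subrule each rule used. A made-up contentful stem such as foo gets a single chain, consistent with David's observation that contentful roots are not affected.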

Emily M. Bender
Professor, Department of Linguistics
University of Washington
Twitter: @emilymbender