[developers] Hacking the DELPH-IN framework for a null morpheme: a semi-success

Woodley Packard sweaglesw at sweaglesw.org
Fri Apr 12 17:49:42 CEST 2019


I assume no token mapping or YY input is involved, either of which could lead to two underlying tokens.  Like you, I find it surprising that two raw instances of the same lexeme would enter the chart for the same token.  Have you checked whether other processors give the same result?

If using ACE, running with -vvv will yield data about tokens and orthographemic exploration near the top of the (generous) log output.  You will see lines that look like:

=int=ʔał -> =0 [1 ways]

If that line appears twice, that is likely at least an intermediate source of the problem. Also review the list of initial tokens printed before the orthographemic exploration log; two =int=ʔał tokens would be surprising but would explain the problem.
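If the log is long, a small script can do the counting. The following is a minimal sketch only: it assumes every orthographemic exploration entry has the same shape as the single sample line above (surface form, "->", stem, "[N ways]"), which is a generalization from that one line rather than a documented format. The function names are hypothetical, and the log text is assumed to have been saved to a string or file beforehand.

```python
import re
from collections import Counter

# Assumed entry shape, generalized from the one sample line in this message:
#   <surface form> -> <stem> [<n> ways]
LINE_RE = re.compile(r"^(?P<surface>\S+) -> (?P<stem>\S+) \[(?P<ways>\d+) ways\]$")


def count_exploration_entries(log_text):
    """Count how often each (surface, stem) pair appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["surface"], match["stem"])] += 1
    return counts


def duplicated_entries(log_text):
    """Return only the (surface, stem) pairs that appear more than once."""
    counts = count_exploration_entries(log_text)
    return {pair: n for pair, n in counts.items() if n > 1}
```

An empty result from duplicated_entries means no entry of that shape was repeated; lines that do not match the assumed shape are silently skipped, so an empty result is only as reliable as the pattern.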

If there is only one such orthographemic exploration entry, it becomes harder to guess how two identical lexical edges could be generated.
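For reference, here is a toy model of the affix-stripping step, using the two irules quoted further down. It is a sketch under stated assumptions, not a description of how ACE or any other processor actually explores orthographemic rules: "*" is read as "any stem, just add the affix", "(=0 X)" as "replace =0 at that edge of the stem with X", each rule is assumed to apply at most once, and no ordering constraints from the grammar are modeled. The names RULES, strip_once and stripping_paths are hypothetical.

```python
# Each rule: (name, kind, subrules). A subrule (stem_affix, surface_affix)
# mirrors one "(stem surface)" pair from the quoted irules.
RULES = [
    ("past-prefix-2", "prefix", [("*", "=int"), ("=0", "=int")]),
    ("clitic-plural-suffix", "suffix", [("*", "=ʔał"), ("=0", "=ʔał")]),
]


def strip_once(form, kind, stem_affix, surface_affix):
    """Undo one subrule on form; return None if the subrule does not apply."""
    restored = "" if stem_affix == "*" else stem_affix
    if kind == "prefix" and form.startswith(surface_affix):
        return restored + form[len(surface_affix):]
    if kind == "suffix" and form.endswith(surface_affix):
        return form[:len(form) - len(surface_affix)] + restored
    return None


def stripping_paths(form, target, steps=()):
    """List every sequence of (rule, subrule) steps reducing form to target.

    Steps are recorded outermost affix first; each rule is used at most once
    (an assumption of this toy model).
    """
    paths = [list(steps)] if form == target else []
    used = {name for name, _ in steps}
    for name, kind, subrules in RULES:
        if name in used:
            continue
        for stem_affix, surface_affix in subrules:
            stem = strip_once(form, kind, stem_affix, surface_affix)
            if stem is not None:
                paths += stripping_paths(stem, target, steps + ((name, stem_affix),))
    return paths
```

In this toy model, stripping_paths("=int=ʔał", "=0") finds four routes, two for each stripping order, while stripping_paths("=ʔał", "=0") finds one. So even with the rule order fixed, two routes remain that differ only in which subrule of each rule fires. Whether that corresponds to what the processor does is exactly what the log should show.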

Curious, Woodley

> On Apr 12, 2019, at 6:46 AM, Emily M. Bender <ebender at uw.edu> wrote:
> 
> From my understanding of looking at this with David, the full rules are constrained regarding their order of application. Instead, it seems to be the first step (stripping the affixes to find the roots) where the order is underspecified. The =0 lexical entry gives rise to two separate edges in the chart, which then each can take the two affixes, in the same fixed sequence. Is there a way to tell the processor not to do that?
> 
> Emily
> 
>> On Thu, Apr 11, 2019 at 9:51 PM Woodley Packard <sweaglesw at sweaglesw.org> wrote:
>> Hi David,
>> 
>> If I understand correctly you have a lexical entry whose orthography in the lexicon is “=0” but which only ever appears in combination with the prefix or suffix or both, which lets you cover up the fact that the =0 was ever there.  Sounds reasonable to me.
>> 
>> Forgive me if this is too obvious and not what’s going on, but:  getting two parses when both suffix and prefix are present seems likely to be caused by unconstrained order of application of those rules.  Have you checked whether the two parses you get have the rules applying in opposite orders?  If so, the solution is simply to constrain things so that one of them cannot consume the other’s output.
>> 
>> If on the other hand the two parses have identical derivations then the result is unexpected — at least under the currently used definition of derivation trees.  There have been suggestions that derivations with different internal inflected string values resulting from different subrules of the %prefix and %suffix mechanisms should be considered distinct (and that those should be recorded as part of the derivation tree), but to my knowledge none of our systems supports that yet, nor do I believe a format has been decided upon.
>> 
>> Best,
>> Woodley
>> 
>>> On Apr 11, 2019, at 8:56 PM, David Inman <davinman at uw.edu> wrote:
>>> 
>>> Hello developers,
>>> 
>>> I am using the irules to define a null morpheme by having prefixes and suffixes overwrite a string (=0, 3rd person marking on a clitic complex) when they attach to it. The irules look like this:
>>> 
>>> past-prefix-2 :=
>>> %prefix (* =int) (=0 =int)
>>> past-lex-rule.
>>> 
>>> clitic-plural-suffix :=
>>> %suffix (* =ʔał) (=0 =ʔał)
>>> clitic-plural-lex-rule.
>>> 
>>> This works and generates strings that are lacking the =0 morpheme. Except that in the case where both a prefix and a suffix attach, the parser enters two =0 morphemes into the parse chart and will parse it doubly. (This does not happen for contentful roots.) If the =0 has only "suffixes" after it, then I get one parse. If it has only "prefixes" then I also get one parse. I think the parser sees that =0 can be overwritten either by the prefix or the suffix so it hypothesizes it twice. I'm using the morph rules a bit differently than intended, but is this a case that should be supported? Is there any way around this so that I limit the parser's behavior and get one parse?
>>> 
>>> David Inman
>>> PhD Candidate
>>> University of Washington Linguistics
> 
> 
> -- 
> Emily M. Bender
> Professor, Department of Linguistics
> University of Washington
> Twitter: @emilymbender