[developers] Hacking the DELPH-IN framework for a null morpheme: a semi-success

Woodley Packard sweaglesw at sweaglesw.org
Fri Apr 12 18:43:27 CEST 2019


Hi Stephan,

I agree that the ambiguity you describe (which David and Emily believe they are experiencing, and which I alluded to a few emails ago) exists in principle and might be interesting in some situations.  However, I would be surprised to see it result in two identical uninflected lexical edges in the chart, at least in ACE’s current implementation.  The ambiguity arises from two ways of applying the first rule, not from two different lexical starting points.  Furthermore, I didn’t (?) think ACE was set up to generate multiple output edges from a single rule/daughter combination, which I maintain would be required for this ambiguity.

To the question of whether ACE blocks that ambiguity: the only difference would be in the ORTH feature of the two intermediate edges, I suppose.  Until recently ACE did not even write orthographemic changes to that feature, and I suppose at the moment it arbitrarily picks one variant or the other.  It seems that this approach could, in theory, lead into an orthographemic dead end, although I’m not aware of that ever coming up in practice.  I could be persuaded that the right thing to do would be to generate both variants (from just one source lexical edge), but before implementing that I think an appropriate improvement to UDF would be in order.
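A minimal Python sketch of the situation described above, using David’s irules quoted at the bottom of this thread: one lexical edge with ORTH “=0”, past-prefix-2 as the inner rule and clitic-plural-suffix as the outer one. It assumes each subrule pair reads (stem pattern, replacement), that “*” means plain concatenation, and that every matching subrule may apply; letter sets and any preference for a more specific subrule are ignored. The helper names and the `pick` switch are hypothetical illustrations, not ACE internals.

```python
from typing import List, Optional, Tuple

# Subrule pairs copied from David's irules, read as (stem pattern, replacement);
# "*" is assumed to stand for the empty pattern (plain concatenation).
Subrules = List[Tuple[str, str]]
PAST_PREFIX_2: Subrules = [("*", "=int"), ("=0", "=int")]
CLITIC_PLURAL_SUFFIX: Subrules = [("*", "=ʔał"), ("=0", "=ʔał")]
LEXICAL_ORTH = "=0"  # the single source lexical edge


def apply_prefix(orth: str, subrules: Subrules) -> List[str]:
    """Every ORTH value one %prefix rule can produce from `orth` (all matching subrules)."""
    out = []
    for match, repl in subrules:
        if match == "*":
            out.append(repl + orth)
        elif orth.startswith(match):
            out.append(repl + orth[len(match):])
    return out


def apply_suffix(orth: str, subrules: Subrules) -> List[str]:
    """Every ORTH value one %suffix rule can produce from `orth` (all matching subrules)."""
    out = []
    for match, repl in subrules:
        if match == "*":
            out.append(orth + repl)
        elif orth.endswith(match):
            out.append(orth[: len(orth) - len(match)] + repl)
    return out


def intermediate_edges(surface: str, suffix_rules: Subrules = CLITIC_PLURAL_SUFFIX,
                       pick: Optional[int] = None) -> List[str]:
    """ORTH values of intermediate edges (after the inner prefix rule) that still reach `surface`.

    pick=None keeps every variant; an integer keeps only that one, mimicking an
    engine that arbitrarily commits to a single variant.
    """
    variants = apply_prefix(LEXICAL_ORTH, PAST_PREFIX_2)
    if pick is not None:
        variants = variants[pick:pick + 1]
    return [mid for mid in variants if surface in apply_suffix(mid, suffix_rules)]


print(intermediate_edges("=int=ʔał"))  # ['=int=0', '=int']
```

Both intermediate edges come from the same lexical edge and differ only in ORTH. If the outer rule were cut down to the hypothetical single subrule (=0 =ʔał), then `intermediate_edges("=int=ʔał", [("=0", "=ʔał")], pick=1)` returns an empty list: committing to the “=int” variant is a dead end, while keeping both variants still finds the analysis through “=int=0”.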

Enjoy the train ride!
Woodley

> On Apr 12, 2019, at 9:15 AM, Stephan Oepen <oe at ifi.uio.no> wrote:
> 
> hi woodley, and all,
> 
> i would have thought our current engines are quite capable of generating seemingly duplicate derivations, owing to incomplete information being recorded about orthographemic segmentation.  in fact, i believe i recall at least JaCY exposing this phenomenon in the LKB and PET.
> 
> assume two orthographemic rules, each with two subrules (which both can apply to some stem), maybe something like:
> 
> one :=
> %suffix (* s) (e ss)
> [...].
> 
> 
> two :=
> %suffix (!s !sed) (s ssed)
> [...].
> 
> not sure the above is quite right, but i want to allow two chains through these rules that only differ in the internal string segmentation, e.g. assuming a stem ‘fore’ and a final surface form ‘foressed’:
> 
> fore one fores two foressed
> fore one foress two foressed
> 
> from what i recall of at least the PET implementation, i would expect the above to result in two distinct lexical sub-trees whose derivations in current UDX look identical (and which are guaranteed to pack immediately).
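The two chains can be reproduced with a small Python sketch that undoes suffix rules from the surface form inward, as a parser would. Since the rules above are explicitly tentative, the sketch swaps in hypothetical subrules, (* s)/(e ess) for `one` and (* ed)/(s ssed) for `two`, chosen only so that both chains are derivable. It assumes each pair reads (stem pattern, replacement) with “*” as the empty pattern, does not model letter sets such as !s, and lets every matching subrule apply.

```python
from typing import List, Tuple

Subrule = Tuple[str, str]  # (stem pattern, replacement); "*" = empty pattern (assumed)

# Hypothetical stand-ins for the rules `one` (inner) and `two` (outer).
ONE: List[Subrule] = [("*", "s"), ("e", "ess")]
TWO: List[Subrule] = [("*", "ed"), ("s", "ssed")]
LEXICON = {"fore"}


def unapply_suffix(surface: str, subrules: List[Subrule]) -> List[Tuple[Subrule, str]]:
    """Undo one %suffix rule: every (subrule, underlying string) that could yield `surface`."""
    out = []
    for match, repl in subrules:
        if surface.endswith(repl):
            base = surface[: len(surface) - len(repl)]
            out.append(((match, repl), base if match == "*" else base + match))
    return out


def analyse(surface: str) -> List[List[Tuple[str, Subrule, str, str]]]:
    """Chains, outermost step first, of (rule, subrule, bottom string, top string)."""
    chains = []
    for sub_two, mid in unapply_suffix(surface, TWO):   # undo the outer rule
        for sub_one, stem in unapply_suffix(mid, ONE):  # then the inner rule
            if stem in LEXICON:
                chains.append([("two", sub_two, mid, surface),
                               ("one", sub_one, stem, mid)])
    return chains


chains = analyse("foressed")
rule_only = {tuple(step[0] for step in chain) for chain in chains}
print([chain[0][2] for chain in chains])  # ['foress', 'fores']
print(rule_only)                          # {('two', 'one')}
```

Both chains start from the same stem and apply the same rules in the same order; only the intermediate string and the subrules differ, so a derivation that records rule names alone cannot tell them apart.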
> 
> would you expect to block such ambiguity (in ACE)?  if so, on what basis, and which of the two variants should prevail?
> 
> ever since yi and others started retracing (with some effort) the exact string-level effects of orthographemic nodes in our derivations, i have been thinking we should probably extend UDX to record both which affixation sub-rule applied and what the strings at the ‘top’ and the ‘bottom’ looked like.
> 
> would this extra information seem adequate and sufficient to you (and others)?  if so, i would like to try and work out how to extend the UDX syntax, while maintaining backwards compatibility.
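One way to picture the proposed extension is as an optional annotation on each orthographemic node that a legacy reader simply ignores. The Python sketch below is hypothetical throughout: the bracketed rendering is a simplified stand-in, not actual UDX syntax, and the subrule pairs (* s)/(s ssed) and (e ess)/(* ed) are made up so that both ‘foressed’ chains are derivable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Orth:
    """Hypothetical extra information for one orthographemic rule application."""
    subrule: Tuple[str, str]  # which (pattern, replacement) subrule applied
    bottom: str               # string before the rule applied
    top: str                  # string after the rule applied


@dataclass(frozen=True)
class Node:
    label: str                          # rule name, or the stem at the leaf
    daughter: Optional["Node"] = None
    orth: Optional[Orth] = None         # absent in legacy derivations


def legacy(node: Node) -> str:
    """Render rule names only, ignoring any Orth annotation (what old readers see)."""
    inner = "" if node.daughter is None else " " + legacy(node.daughter)
    return f"({node.label}{inner})"


def extended(node: Node) -> str:
    """Render with the optional annotation; identical to legacy() when it is absent."""
    inner = "" if node.daughter is None else " " + extended(node.daughter)
    note = ""
    if node.orth is not None:
        pattern, repl = node.orth.subrule
        note = f" [{pattern} {repl} : {node.orth.bottom} > {node.orth.top}]"
    return f"({node.label}{note}{inner})"


# fore one fores two foressed
a = Node("two", Node("one", Node("fore"), Orth(("*", "s"), "fore", "fores")),
         Orth(("s", "ssed"), "fores", "foressed"))
# fore one foress two foressed
b = Node("two", Node("one", Node("fore"), Orth(("e", "ess"), "fore", "foress")),
         Orth(("*", "ed"), "foress", "foressed"))

print(legacy(a) == legacy(b))      # True: the two chains collapse
print(extended(a) == extended(b))  # False: the annotation keeps them apart
```

Keeping the annotation optional is what gives backwards compatibility in this sketch: a node without it renders exactly as before, and a reader that drops it recovers the old, collapsed tree.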
> 
> greetings from the train to finse 1222!  oe
> 
> 
> 
> 
>> On Fri, 12 Apr 2019 at 06:51 Woodley Packard <sweaglesw at sweaglesw.org> wrote:
>> Hi David,
>> 
>> If I understand correctly you have a lexical entry whose orthography in the lexicon is “=0” but which only ever appears in combination with the prefix or suffix or both, which lets you cover up the fact that the =0 was ever there.  Sounds reasonable to me.
>> 
>> Forgive me if this is too obvious and not what’s going on, but:  getting two parses when both suffix and prefix are present seems likely to be caused by unconstrained order of application of those rules.  Have you checked whether the two parses you get have the rules applying in opposite orders?  If so, the solution is simply to constrain things so that one of them cannot consume the other’s output.
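The rule-order diagnosis can be checked with a small Python sketch that undoes David’s two rules (quoted below) from the surface string =int=ʔał back to the lexical entry =0. It assumes subrule pairs read (stem pattern, replacement) with “*” as the empty pattern, and that every matching subrule applies; the function names and the `orders` parameter are hypothetical.

```python
from typing import List, Tuple

Rule = Tuple[str, str, List[Tuple[str, str]]]  # (name, "prefix" or "suffix", subrules)

PREFIX: Rule = ("past-prefix-2", "prefix", [("*", "=int"), ("=0", "=int")])
SUFFIX: Rule = ("clitic-plural-suffix", "suffix", [("*", "=ʔał"), ("=0", "=ʔał")])
LEXICON = {"=0"}


def unapply(surface: str, rule: Rule) -> List[str]:
    """Undo one affixation rule: every string it could have applied to."""
    _, kind, subrules = rule
    out = []
    for match, repl in subrules:
        base = "" if match == "*" else match
        if kind == "suffix" and surface.endswith(repl):
            out.append(surface[: len(surface) - len(repl)] + base)
        elif kind == "prefix" and surface.startswith(repl):
            out.append(base + surface[len(repl):])
    return out


def parses(surface: str, orders: List[Tuple[Rule, Rule]]) -> List[Tuple[Tuple[str, str], str]]:
    """(rule-name derivation, intermediate string) for every analysis ending in the lexicon."""
    found = []
    for outer, inner in orders:
        for mid in unapply(surface, outer):
            for stem in unapply(mid, inner):
                if stem in LEXICON:
                    found.append(((outer[0], inner[0]), mid))
    return found


free = parses("=int=ʔał", [(SUFFIX, PREFIX), (PREFIX, SUFFIX)])  # either order
fixed = parses("=int=ʔał", [(SUFFIX, PREFIX)])                    # prefix must be inner
print(len(free), len({d for d, _ in free}))    # 4 2
print(len(fixed), len({d for d, _ in fixed}))  # 2 1
```

Fixing the order removes the second derivation. The two analyses that remain, via the intermediate strings =int and =int=0, still differ only in which subrules applied, so under rule-name-only derivations they look identical.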
>> 
>> If on the other hand the two parses have identical derivations then the result is unexpected — at least under the currently used definition of derivation trees.  There have been suggestions that derivations with different internal inflected string values resulting from different subrules of the %prefix and %suffix mechanisms should be considered distinct (and that those should be recorded as part of the derivation tree), but to my knowledge none of our systems supports that yet, nor do I believe a format has been decided upon.
>> 
>> Best,
>> Woodley
>> 
>>> On Apr 11, 2019, at 8:56 PM, David Inman <davinman at uw.edu> wrote:
>>> 
>>> Hello developers,
>>> 
>>> I am using the irules to define a null morpheme by having prefixes and suffixes overwrite a string (=0, 3rd person marking on a clitic complex) when they attach to it. The irules look like this:
>>> 
>>> past-prefix-2 :=
>>> %prefix (* =int) (=0 =int)
>>> past-lex-rule.
>>> 
>>> clitic-plural-suffix :=
>>> %suffix (* =ʔał) (=0 =ʔał)
>>> clitic-plural-lex-rule.
>>> 
>>> This works and generates strings that lack the =0 morpheme, except that when both a prefix and a suffix attach, the parser enters two =0 morphemes into the parse chart and parses the input twice. (This does not happen for contentful roots.) If the =0 has only "suffixes" after it, I get one parse; if it has only "prefixes", I also get one parse. I think the parser sees that =0 can be overwritten either by the prefix or by the suffix, so it hypothesizes it twice. I'm using the morph rules a bit differently than intended, but is this a case that should be supported? Is there any way to limit the parser's behavior so that I get only one parse?
>>> 
>>> David Inman
>>> PhD Candidate
>>> University of Washington Linguistics