[developers] Latest ERG not generating unknown names with ACE

Mon Apr 17 23:16:26 CEST 2017

The mechanism for those post-generation token mapping rules is enabled 
only when at least one rule is defined, so that makes sense.

During parsing, the link from the surface form of "Ubuntu" to the CARG 
of the named_rel is a bit circuitous, and involves token mapping rules.  
Since those are not generally available at generation time, ACE used an 
alternate strategy, and I assume the LKB does as well.  In ACE's 
case, I go to the trouble of manufacturing a new lexical entry structure 
for easier bookkeeping when instantiating a generic lexical entry during 
generation, and that structure has a slot separate from the feature 
structure in which the (uninflected) orthography is stored for ease of 
reference. The CARG was copied there when instantiating this type of 
generic, and that same field was referenced when reading out realization 
results.

Enter the post-generation token mapping setup (which I believe would be 
more correctly called post-generation lexeme mapping as currently 
implemented...).  The orthography read out in that mode comes from the 
ORTH list on the feature structure (or value of lex-stem-path).  I'm not 
able at the moment to reconstruct why I set it up differently, but just 
now it seems a perfectly reasonable place to go hunting for the surface 
value.  That value is stipulated as "_generic_proper_ne" or some such 
for proper names, hence the trouble.  I've committed a change to the ACE 
trunk so that, when a generic lexical entry is instantiated for 
generation, ACE now edits that ORTH list in addition to what it did 
previously (i.e. just editing its quick-reference copy of that list), 
which seems more consistent at least, if not fully above board.  The 
change seems to 
have the desired effect, i.e. "Ubuntu" comes out instead of 
"_generic_proper_ne".

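For readers who want the interaction spelled out, here is a rough 
illustrative sketch in Python.  It is not ACE's code (ACE is written in 
C), and every name in it (LexicalEntry, instantiate_generic, read_out, 
the patch_feature_structure flag) is hypothetical.  It only models the 
two places the orthography can live and the two read-out paths described 
above, assuming the placeholder string is exactly "_generic_proper_ne".

```python
from dataclasses import dataclass

# Placeholder orthography stipulated for generic proper names
# (the message says "or some such", so the exact string is an assumption).
GENERIC_ORTH = "_generic_proper_ne"


@dataclass
class LexicalEntry:
    """Hypothetical stand-in for the per-instantiation bookkeeping structure."""
    orth_list: list   # models the ORTH list inside the feature structure
    quick_orth: str   # models the separate quick-reference orthography slot


def instantiate_generic(carg, patch_feature_structure):
    """Instantiate a generic proper-name entry for generation."""
    entry = LexicalEntry(orth_list=[GENERIC_ORTH], quick_orth=GENERIC_ORTH)
    # Earlier behaviour: only the quick-reference copy receives the CARG.
    entry.quick_orth = carg
    # Behaviour after the change: the ORTH list is edited as well.
    if patch_feature_structure:
        entry.orth_list = [carg]
    return entry


def read_out(entry, post_generation_rules):
    """Return the surface string for one lexeme in a realization."""
    if post_generation_rules:
        # With at least one rule defined, the post-generation mapping
        # path is active and reads the ORTH list.
        return " ".join(entry.orth_list)
    # With no rules defined, the quick-reference slot is used.
    return entry.quick_orth
```

With no rules defined, read_out returns the CARG in either case.  With 
any rule defined, the unpatched entry yields the placeholder and the 
patched one yields the CARG, which matches what Dan observed when 
commenting rules in and out.
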
-Woodley

On 04/15/2017 04:28 PM, Dan Flickinger wrote:
> Woodley, your prediction was a good one.  If I comment out the loading of that file in `english.tdl' and recompile, then the unknown proper names work right again, and similarly if I comment out each of the rules in the file but load it, all is still well.  But if I uncomment any one of the rules, we lose the proper names again.  So it would seem that the very act of tampering with the orthography in post-generation interacts badly with whatever the clever step is that causes the CARG value of the unknown proper name to be realized as its surface orthography.
>
>
>   Dan
>
>
> ________________________________
> From: developers-bounces at emmtee.net <developers-bounces at emmtee.net> on behalf of Woodley Packard <sweaglesw at sweaglesw.org>
> Sent: Saturday, April 15, 2017 1:59 PM
> To: Stephan Oepen
> Cc: Michael Wayne Goodman; developers at delph-in.net
> Subject: Re: [developers] Latest ERG not generating unknown names with ACE
>
> I wonder whether something may be going on with the ACE-only post-generation token mapping rules?  I believe Dan has started toying with those, although I lack the proper internet connection to investigate this hypothesis currently.
>
> Woodley
>
>
>
> On Apr 15, 2017, at 1:38 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>
>>> I attach the relevant MRSs (same sentence; one created with the ERG trunk
>>> and the other with the 1214 version).
>> these MRSs appear equivalent in content; the ERG trunk has not yet
>> turned on predicate normalization by default (because that switch also
>> turns on SEM-I–based MRS processing, and finalizing the SEM-I prior to
>> a release currently is a non-trivial process, hence not applied to the
>> trunk yet), hence the spurious string vs. type distinctions and _rel
>> suffixes on predicates.  also, which engine did you use?  it still
>> outputs old-style LTOP (which should be TOP nowadays).
>>
>> anyway, the MRSs look fine and give the expected result in the LKB generator:
>>
>> LKB(42): (pprint
>>           (lkb::generate-from-mrs
>>            (mrs::read-mrs-from-file "~/Downloads/unity-logon.mrs")))
>>
>> ("Ubuntu is dropping unity.")
>> LKB(43): (pprint
>>           (lkb::generate-from-mrs
>>            (mrs::read-mrs-from-file "~/Downloads/unity-trunk.mrs")))
>>
>> ("Ubuntu is dropping unity.")
>>
>> —i suspect you might not have run (lkb::index-for-generator) after
>> loading the grammar?
>>
>> best, oe
>>
>