[developers] Latest ERG not generating unknown names with ACE
sweaglesw at sweaglesw.org
Mon Apr 17 23:16:26 CEST 2017
The mechanism for those post-generation token mapping rules is enabled
only when at least one rule is defined, so that makes sense.

During parsing, the link from the surface form of "Ubuntu" to the CARG
of the named_rel is a bit circuitous, and involves token mapping rules.
Those not generally being available at generation time, an alternate
strategy was in use in ACE, and I assume the LKB as well. In ACE's
case, I go to the trouble of manufacturing a new lexical entry structure
for easier bookkeeping when instantiating a generic lexical entry during
generation, and that structure has a slot separate from the feature
structure in which the (uninflected) orthography is stored for ease of
reference. The CARG was copied there when instantiating this type of
generic, and that same field was referenced when reading out realizations.

Enter the post-generation token mapping setup (which I believe would be
more correctly called post-generation lexeme mapping as currently
implemented...). The orthography read out in that mode comes from the
ORTH list on the feature structure (or value of lex-stem-path). I'm not
able at the moment to reconstruct why I set it up differently, but just
now it seems a perfectly reasonable place to go hunting for the surface
value. That value is stipulated as "_generic_proper_ne" or some such
for proper names, hence the trouble. I've committed a change to the ACE
trunk so that the mechanism which instantiates generic lexical entries
for generation now edits that ORTH list in addition to what it did
previously (i.e. just editing its quick-reference copy of that list), which
seems more consistent at least, if not fully above board. The change seems to
have the desired effect, i.e. "Ubuntu" comes out instead of the placeholder.
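
For readers who want the interaction spelled out, here is a toy Python model
of the behaviour described above. It is only a sketch and not ACE's actual
code: the names GenericLexeme, instantiate and read_out, the edit_orth_list
flag, and the exact placeholder string are all assumptions made for
illustration. It shows the two places an orthography can be read from, and
why defining even one mapping rule used to switch the read-out to the
placeholder.

```python
from dataclasses import dataclass

# Placeholder orthography stipulated for generic proper names
# (the exact string is an assumption; the message says "or some such").
GENERIC_PLACEHOLDER = "_generic_proper_ne"


@dataclass
class GenericLexeme:
    """Toy stand-in for a generic lexical entry instantiated for generation."""
    orth_list: list   # ORTH list on the feature structure
    quick_orth: str   # separate quick-reference copy of the orthography


def instantiate(carg, edit_orth_list):
    """Instantiate a generic proper-name entry from a CARG value."""
    lexeme = GenericLexeme(orth_list=[GENERIC_PLACEHOLDER], quick_orth=carg)
    if edit_orth_list:
        # Modelled behaviour after the change: the ORTH list is edited too.
        lexeme.orth_list = [carg]
    return lexeme


def read_out(lexeme, mapping_rules):
    """Return the surface form for one lexeme."""
    if not mapping_rules:
        # No rules defined: the mapping step is off, so the
        # quick-reference slot is used directly.
        return lexeme.quick_orth
    # At least one rule defined: start from the ORTH list, then apply rules.
    surface = " ".join(lexeme.orth_list)
    for rule in mapping_rules:
        surface = rule(surface)
    return surface


unrelated_rule = lambda s: s  # a rule that does not touch this token

old = instantiate("Ubuntu", edit_orth_list=False)
new = instantiate("Ubuntu", edit_orth_list=True)

print(read_out(old, []))                # Ubuntu
print(read_out(old, [unrelated_rule]))  # _generic_proper_ne
print(read_out(new, [unrelated_rule]))  # Ubuntu
```

In this model the rule itself never alters the name; merely having a rule
defined changes which copy of the orthography is consulted, which matches
Dan's observation that uncommenting any single rule loses the proper names.
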
On 04/15/2017 04:28 PM, Dan Flickinger wrote:
> Woodley, your prediction was a good one. If I comment out the loading of that file in `english.tdl' and recompile, then the unknown proper names work right again, and similarly if I comment out each of the rules in the file but load it, all is still well. But if I uncomment any one of the rules, we lose the proper names again. So it would seem that the very act of tampering with the orthography in post-generation interacts badly with whatever the clever step is that causes the CARG value of the unknown proper name to be realized as its surface orthography.
> From: developers-bounces at emmtee.net <developers-bounces at emmtee.net> on behalf of Woodley Packard <sweaglesw at sweaglesw.org>
> Sent: Saturday, April 15, 2017 1:59 PM
> To: Stephan Oepen
> Cc: Michael Wayne Goodman; developers at delph-in.net
> Subject: Re: [developers] Latest ERG not generating unknown names with ACE
> I wonder whether something may be going on with the ACE-only post-generation token mapping rules? I believe Dan has started toying with those, although I lack the proper internet connection to investigate this hypothesis currently.
> On Apr 15, 2017, at 1:38 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>>> I attach the relevant MRSs (same sentence; one created with the ERG trunk
>>> and the other with the 1214 version).
>> these MRSs appear equivalent in content; the ERG trunk has not yet
>> turned on predicate normalization by default (because that switch also
>> turns on SEM-I–based MRS processing, and finalizing the SEM-I prior to
>> a release currently is a non-trivial process, hence not applied to the
>> trunk yet), hence the spurious string vs. type distinctions and _rel
>> suffixes on predicates. also, which engine did you use? it still
>> outputs old-style LTOP (which should be TOP nowadays).
>> anyway, the MRSs look fine and give the expected result in the LKB generator:
>> LKB(42): (pprint
>> (mrs::read-mrs-from-file "~/Downloads/unity-logon.mrs")))
>> ("Ubuntu is dropping unity.")
>> LKB(43): (pprint
>> (mrs::read-mrs-from-file "~/Downloads/unity-trunk.mrs")))
>> ("Ubuntu is dropping unity.")
>> —i suspect you might not have run (lkb::index-for-generator) after
>> loading the grammar?
>> best, oe