[developers] Punctuation and "-default-les" type mapping in PET/ERG

Christopher Rupp Christopher.Rupp at cl.cam.ac.uk
Mon Apr 14 15:49:23 CEST 2008

Hi Richard,

I don't know if this is getting too parochial now, but I'll still address the
developers list too.

So you've got a rule for 'pos' annotation in SMAF:

pos.[] -> edgeType='morph' fallback='' pos=content.tag gMap.carg=deps.content

which refers to a 'carg' construct, and a 'carg' definition which sets a path:

define gMap.carg (synsem lkeys keyrel carg) STRING
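
To make my reading of that rule concrete, here is a rough Python sketch. It is
not PET code: the dict-based feature structure, the function names and the
'deps' value "t1" are all assumptions of mine, and it only illustrates the
idea that the surface string of the token a 'pos' edge depends on ends up at
the path given in the 'carg' definition.

```python
# Hypothetical sketch only: none of these helpers exist in PET.
# The path is copied from the 'define gMap.carg' line above.
CARG_PATH = ("synsem", "lkeys", "keyrel", "carg")


def set_path(fs, path, value):
    """Store value at a nested feature path in a dict-based structure."""
    node = fs
    for feature in path[:-1]:
        node = node.setdefault(feature, {})
    node[path[-1]] = value
    return fs


def apply_pos_rule(pos_edge, tokens):
    """Mimic: edgeType='morph' pos=content.tag gMap.carg=deps.content"""
    surface = tokens[pos_edge["deps"]]  # content of the token depended on
    return {
        "edgeType": "morph",
        "pos": pos_edge["tag"],
        "fs": set_path({}, CARG_PATH, surface),
    }


# Assumed input: a 'pos' edge that depends on token t1 ("Reptiles").
tokens = {"t1": "Reptiles"}
pos_edge = {"id": "p0", "tag": "NN2", "deps": "t1"}
result = apply_pos_rule(pos_edge, tokens)
```

If that reading is right, result["fs"] carries the token string at
synsem.lkeys.keyrel.carg, ready to be merged into a generic lexical entry.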

A SMAF containing 'pos' and 'token' annotations:

<?xml version='1.0' encoding='UTF-8'?>
   <lattice init="v0" final="v4">
     <edge type="token" id="t1" source="v0" target="v1" cfrom="0" cto="8">
     <edge type="token" id="t2" source="v1" target="v2" cfrom="9" cto="13">
     <edge type="token" id="t3" source="v2" target="v3" cfrom="14" cto="16">
     <edge type="token" id="t4" source="v3" target="v4" cfrom="17" cto="21">
     <edge type="pos" id="p0" source="v0" target="v1" cfrom="0" cto="8" 
       <slot name="tag">NN2</slot>
     <edge type="pos" id="p1" source="v1" target="v2" cfrom="9" cto="13" 
       <slot name="tag">VH0</slot>
     <edge type="pos" id="p2" source="v2" target="v3" cfrom="14" cto="16" 
       <slot name="tag">AT</slot>
     <edge type="pos" id="p3" source="v3" target="v4" cfrom="17" cto="21" 
       <slot name="tag">NN1</slot>

And a partial posmapping which defines types containing the 'carg' path for
two of the four tokens in the input. The issue is what happens when no
posmapping is defined for a 'pos' input. (Leaving aside for the moment the
"Reptiles"/Reptiles mismatch.)

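The partial-coverage situation can be sketched like this in Python. Again this
is only an illustration under assumptions: the choice of NN1 and NN2 as the
two mapped tags and the generic type names are made up by me, and the sketch
says nothing about what PET actually does with the unmapped tags, which is
exactly the open question.

```python
# Hypothetical partial posmapping: the tags chosen and the type names are
# invented for illustration and are not taken from the ERG or from PET.
POSMAPPING = {
    "NN1": "hypothetical_generic_noun_sg",
    "NN2": "hypothetical_generic_noun_pl",
}


def split_by_mapping(tags, posmapping):
    """Separate tags that have a generic entry from those that have none."""
    mapped = {}
    unmapped = []
    for tag in tags:
        if tag in posmapping:
            mapped[tag] = posmapping[tag]
        else:
            unmapped.append(tag)
    return mapped, unmapped


# The four tags from the 'pos' edges in the SMAF input above.
input_tags = ["NN2", "VH0", "AT", "NN1"]
mapped, unmapped = split_by_mapping(input_tags, POSMAPPING)
```

Under these assumptions, VH0 and AT fall through with no generic entry, so
whatever they contribute would have to come from the lexicon.
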
As the lexicon is probably the only other place information can come from, it
looks like there's a conflict between the SMAF rule and the lexicon content. In
this case, that's probably what you want, provided the lexical entry is able to
ignore the input from the SMAF rule. More generally, there may be cases where
you want to merge information into the lexicon entry, in the same way that you
want to merge the token-specific information into the generic lexical entry
spawned by the posmapping rules that are defined.

So does this input get an analysis in which at least one of the generic lexical
entries gets instantiated correctly? (Or is all the SMAF input superseded by
the lexicon?) I'm just speculating here and I'd much rather not have to. While
this is a reasonable abductive argument, it ought to be confounded by the fact
that my posmapping is a bit different but still partial, and yet I was able to
get rid of the warnings.

I really just want to understand what is going on here.


