[developers] Punctuation and "-default-les" type mapping in PET/ERG

Christopher Rupp Christopher.Rupp at cl.cam.ac.uk
Fri Mar 28 13:24:29 CET 2008


As I think I mentioned, I was getting WARNING messages pretty similar to the
ones Richard showed when using -default-les as a PET option with a smaf.conf file
and posmapping settings. I managed to suppress these messages by modifying the
definitions in the smaf.conf file, but I think most of the changes removed
information. (I did put some information back via gen-lex entries, but that's
not a direct fix.)

The biggest difference I got was from:

< pos.[tag='JJ'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JA'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JB'] -> gMap='aj_-_i-unk_le'
< pos.[tag='RR'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JBR'] -> gMap='aj_-_i-cmp-unk_le'
< pos.[tag='JBT'] -> gMap='aj_-_i-sup-unk_le'
< pos.[tag='JJT'] -> gMap='aj_-_i-sup-unk_le'
< pos.[tag='NN'] -> gMap='n_-_m-unk_le'
< pos.[tag='NN1'] -> gMap='n_-_c-sg-unk_le'
< pos.[tag='NN2'] -> gMap='n_-_c-pl-unk_le'
< pos.[tag='NP1'] -> gMap='n_-_pn-unk_le'
< pos.[tag='NP2'] -> gMap='n_-_pn-unk_le'
< pos.[tag='NNSB'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='NNSB1'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='NNSB2'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='VV0'] -> gMap='v_np*_bse-unk_le'
< ;pos.[tag='VV0'] -> gMap='v_np*_pr-n3s-unk_le'
< pos.[tag='VVD'] -> gMap='v_np*_pa-unk_le'
< ;pos.[tag='VVD'] -> gMap='v_np*_psp-unk_le'
< pos.[tag='VVN'] -> gMap='v_np*_psp-unk_le'
< ;pos.[tag='VVN'] -> gMap='v_np*_pa-unk_le'
< pos.[tag='VVG'] -> gMap='v_np*_prp-unk_le'
< pos.[tag='VVZ'] -> gMap='v_np*_pr-3s-unk_le'
> pos.[tag='JJ'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JA'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JB'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='RR'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JBR'] -> gMap.type='aj_-_i-cmp-unk_le'
> pos.[tag='JBT'] -> gMap.type='aj_-_i-sup-unk_le'
> pos.[tag='JJT'] -> gMap.type='aj_-_i-sup-unk_le'
> pos.[tag='NN'] -> gMap.type='n_-_m-unk_le'
> pos.[tag='NN1'] -> gMap.type='n_-_c-sg-unk_le'
> pos.[tag='NN2'] -> gMap.type='n_-_c-pl-unk_le'
> pos.[tag='NP1'] -> gMap.type='n_-_pn-unk_le'
> pos.[tag='NP2'] -> gMap.type='n_-_pn-unk_le'
> pos.[tag='NNSB'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='NNSB1'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='NNSB2'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='VV0'] -> gMap.type='v_np*_bse-unk_le'
> ;pos.[tag='VV0'] -> gMap.type='v_np*_pr-n3s-unk_le'
> pos.[tag='VVD'] -> gMap.type='v_np*_pa-unk_le'
> ;pos.[tag='VVD'] -> gMap.type='v_np*_psp-unk_le'
> pos.[tag='VVN'] -> gMap.type='v_np*_psp-unk_le'
> ;pos.[tag='VVN'] -> gMap.type='v_np*_pa-unk_le'
> pos.[tag='VVG'] -> gMap.type='v_np*_prp-unk_le'
> pos.[tag='VVZ'] -> gMap.type='v_np*_pr-3s-unk_le'

i.e. removing the ".type" in the smaf.conf rules for the tags. This caused
apparent conflicts despite the setting:

define gMap.type ()

in the smaf.conf file. The corresponding posmapping content is:

  JJ $generic_adj
  JA $generic_adj
  JB $generic_adj
  JBR $generic_adj_compar
  JBT $generic_adj_superl
  JJT $generic_adj_superl
  NN $generic_mass_noun
  NN1 $generic_sg_noun
  NN2 $generic_pl_noun
  NP1 $genericname
  NP2 $genericname
  NNSB $generic_title_noun
  NNSB1 $generic_title_noun
  NNSB2 $generic_title_noun
  RR $generic_adverb
  VV0 $generic_trans_verb_bse
  VV0 $generic_trans_verb_presn3sg
  VVD $generic_trans_verb_past
  VVD $generic_trans_verb_psp
  VVN $generic_trans_verb_psp
  VVN $generic_trans_verb_past
  VVG $generic_trans_verb_prp
  VVZ $generic_trans_verb_pres3sg

Which should be compatible with those types.
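
One way to check that claim mechanically is to compare the tags on both sides.
The following is a minimal Python sketch, not part of PET: it assumes only the
two line shapes quoted above, treats a leading ';' as a commented-out rule, and
accepts both the gMap= and gMap.type= spellings. The function names
(read_smaf_rules, read_posmapping, compare) are made up for illustration.

```python
import re
from collections import defaultdict

# Rule shape as quoted in this message: pos.[tag='JJ'] -> gMap='aj_-_i-unk_le'
# Assumption: a leading ';' marks a commented-out rule.
RULE_RE = re.compile(
    r"^(;?)\s*pos\.\[tag='([^']+)'\]\s*->\s*gMap(?:\.type)?='([^']+)'"
)


def read_smaf_rules(text):
    """Return two {tag: [types]} maps: active rules and commented-out rules."""
    active, commented = defaultdict(list), defaultdict(list)
    for line in text.splitlines():
        m = RULE_RE.match(line.strip())
        if m:
            target = commented if m.group(1) else active
            target[m.group(2)].append(m.group(3))
    return active, commented


def read_posmapping(text):
    """Return {tag: [generic entry names]} from 'TAG $generic_name' lines."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].startswith("$"):
            mapping[parts[0]].append(parts[1])
    return mapping


def compare(smaf_text, posmapping_text):
    """Report tags whose count of active rules differs from the posmapping."""
    active, _ = read_smaf_rules(smaf_text)
    posmap = read_posmapping(posmapping_text)
    report = {}
    for tag in sorted(set(active) | set(posmap)):
        n_smaf, n_pos = len(active.get(tag, [])), len(posmap.get(tag, []))
        if n_smaf != n_pos:
            report[tag] = (n_smaf, n_pos)
    return report


smaf_sample = """\
pos.[tag='NN1'] -> gMap='n_-_c-sg-unk_le'
pos.[tag='VV0'] -> gMap='v_np*_bse-unk_le'
;pos.[tag='VV0'] -> gMap='v_np*_pr-n3s-unk_le'
"""
posmapping_sample = """\
  NN1 $generic_sg_noun
  VV0 $generic_trans_verb_bse
  VV0 $generic_trans_verb_presn3sg
"""
print(compare(smaf_sample, posmapping_sample))
```

On this sample only VV0 is reported, because its second reading is commented
out in the rules but present in the posmapping. Whether such a difference
matters to PET is not something the sketch can tell; it only lists where the
two files disagree in the number of entries per tag.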

I also have rules for handling Oscar edges (Chemistry NER). These are now:

oscar.[] -> edgeType='tok+morph' pos=content.type tokenStr=content.surface

oscar.[type='CM'] -> gMap='n_-_pn-unk_le'
oscar.[type='CJ'] -> gMap='aj_-_i-unk_le'
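
Since the rule syntax is not documented step by step, it can help to pull a
rule apart and look at its pieces. The Python sketch below is based only on the
rule shapes quoted in this message, not on PET's own parser. The reading of
unquoted values such as content.type as references into the incoming edge is
an assumption, and parse_rule is a hypothetical helper name.

```python
import re

# Shapes taken only from the rules quoted in this message:
#   source.[key='value'] -> name=value name='value' ...
RULE_RE = re.compile(
    r"^(?P<source>\w+)\.\[(?P<cond>[^\]]*)\]\s*->\s*(?P<actions>.+)$"
)
PAIR_RE = re.compile(r"([\w.]+)=('[^']*'|\S+)")


def parse_rule(line):
    """Split one rule line into source, condition pairs and assignments.

    Returns None for blank lines, ';' comments and anything unrecognised.
    """
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    m = RULE_RE.match(line)
    if m is None:
        return None

    def pairs(text):
        result = {}
        for name, value in PAIR_RE.findall(text):
            quoted = value.startswith("'")
            # Quoted values are literals; bare ones (e.g. content.type) are
            # ASSUMED to be references to the incoming edge's content.
            result[name] = ("literal", value[1:-1]) if quoted else ("ref", value)
        return result

    return {
        "source": m.group("source"),
        "condition": pairs(m.group("cond")),
        "assign": pairs(m.group("actions")),
    }


rule = parse_rule(
    "oscar.[] -> edgeType='tok+morph' pos=content.type tokenStr=content.surface"
)
for name, (kind, value) in rule["assign"].items():
    print(f"{name:10} {kind:8} {value}")
```

Laid out this way, the rule assigns one quoted literal (edgeType) and two
unquoted values (pos and tokenStr). The gMap rules that follow assign only
literals.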

The original versions included setting the CARG values:

oscar.[] -> edgeType='tok+morph' pos=content.type gMap.carg=content.surface

Which appeared to produce warnings like this:

; WARNING: failed to create dag for new path-value ("2,2-dipyridylamine = 

Does that say the string isn't equal to itself or "you can't have the feature 
with the value 'foo'"?

Although I've found examples of smaf.conf rules on the Wiki, there's no stepwise
documentation of what the constructs mean. So I can't figure out what really 
but the original rules had been around for a while. It wasn't obvious that they
were failing before. Maybe there is a conflict in having both the smaf.conf 
and the posmapping. I think some of the warnings may have been genuine
conflicts between the rule content and current type definitions, but not all of
them were, and
some things appear to be no longer expressible in the rules, e.g. linking the
token string to the CARG value.

I hope this provides some information. I wouldn't like to regard my current fix
as a solution, because I need to know what we can currently say in smaf.conf 
to know if this is adequate for integration of information from external 
and NER component via the SMAF interface.


