[developers] Punctuation and "-default-les" type mapping in PET/ERG
Christopher Rupp
Christopher.Rupp at cl.cam.ac.uk
Fri Mar 28 13:24:29 CET 2008
Hi,
As I think I mentioned, I was getting WARNING messages very similar to the ones
Richard showed when using -default-les as a PET option with a smaf.conf file
and posmapping settings. I managed to suppress these messages by modifying the
definitions in the smaf.conf file, but I think most of the changes removed
information. (I did put some information back via gen-lex entries, but that's
not a direct fix.)
The biggest difference came from this change:
< pos.[tag='JJ'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JA'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JB'] -> gMap='aj_-_i-unk_le'
< pos.[tag='RR'] -> gMap='aj_-_i-unk_le'
< pos.[tag='JBR'] -> gMap='aj_-_i-cmp-unk_le'
< pos.[tag='JBT'] -> gMap='aj_-_i-sup-unk_le'
< pos.[tag='JJT'] -> gMap='aj_-_i-sup-unk_le'
< pos.[tag='NN'] -> gMap='n_-_m-unk_le'
< pos.[tag='NN1'] -> gMap='n_-_c-sg-unk_le'
< pos.[tag='NN2'] -> gMap='n_-_c-pl-unk_le'
< pos.[tag='NP1'] -> gMap='n_-_pn-unk_le'
< pos.[tag='NP2'] -> gMap='n_-_pn-unk_le'
< pos.[tag='NNSB'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='NNSB1'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='NNSB2'] -> gMap='n_-_c-tt-unk_le'
< pos.[tag='VV0'] -> gMap='v_np*_bse-unk_le'
< ;pos.[tag='VV0'] -> gMap='v_np*_pr-n3s-unk_le'
< pos.[tag='VVD'] -> gMap='v_np*_pa-unk_le'
< ;pos.[tag='VVD'] -> gMap='v_np*_psp-unk_le'
< pos.[tag='VVN'] -> gMap='v_np*_psp-unk_le'
< ;pos.[tag='VVN'] -> gMap='v_np*_pa-unk_le'
< pos.[tag='VVG'] -> gMap='v_np*_prp-unk_le'
< pos.[tag='VVZ'] -> gMap='v_np*_pr-3s-unk_le'
---
> pos.[tag='JJ'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JA'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JB'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='RR'] -> gMap.type='aj_-_i-unk_le'
> pos.[tag='JBR'] -> gMap.type='aj_-_i-cmp-unk_le'
> pos.[tag='JBT'] -> gMap.type='aj_-_i-sup-unk_le'
> pos.[tag='JJT'] -> gMap.type='aj_-_i-sup-unk_le'
> pos.[tag='NN'] -> gMap.type='n_-_m-unk_le'
> pos.[tag='NN1'] -> gMap.type='n_-_c-sg-unk_le'
> pos.[tag='NN2'] -> gMap.type='n_-_c-pl-unk_le'
> pos.[tag='NP1'] -> gMap.type='n_-_pn-unk_le'
> pos.[tag='NP2'] -> gMap.type='n_-_pn-unk_le'
> pos.[tag='NNSB'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='NNSB1'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='NNSB2'] -> gMap.type='n_-_c-tt-unk_le'
> pos.[tag='VV0'] -> gMap.type='v_np*_bse-unk_le'
> ;pos.[tag='VV0'] -> gMap.type='v_np*_pr-n3s-unk_le'
> pos.[tag='VVD'] -> gMap.type='v_np*_pa-unk_le'
> ;pos.[tag='VVD'] -> gMap.type='v_np*_psp-unk_le'
> pos.[tag='VVN'] -> gMap.type='v_np*_psp-unk_le'
> ;pos.[tag='VVN'] -> gMap.type='v_np*_pa-unk_le'
> pos.[tag='VVG'] -> gMap.type='v_np*_prp-unk_le'
> pos.[tag='VVZ'] -> gMap.type='v_np*_pr-3s-unk_le'
i.e. removing the ".type" from the smaf.conf rules for the tags. The ".type"
versions caused apparent conflicts despite the setting:
define gMap.type ()
in the smaf.conf file. The corresponding posmapping content is:
JJ $generic_adj
JA $generic_adj
JB $generic_adj
JBR $generic_adj_compar
JBT $generic_adj_superl
JJT $generic_adj_superl
NN $generic_mass_noun
NN1 $generic_sg_noun
NN2 $generic_pl_noun
NP1 $genericname
NP2 $genericname
NNSB $generic_title_noun
NNSB1 $generic_title_noun
NNSB2 $generic_title_noun
RR $generic_adverb
VV0 $generic_trans_verb_bse
VV0 $generic_trans_verb_presn3sg
VVD $generic_trans_verb_past
VVD $generic_trans_verb_psp
VVN $generic_trans_verb_psp
VVN $generic_trans_verb_past
VVG $generic_trans_verb_prp
VVZ $generic_trans_verb_pres3sg
These should be compatible with those types.
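Since the smaf.conf rules and the posmapping have to agree tag by tag, a quick cross-check can help spot mismatches. The Python below is only a hypothetical sketch (the function names are made up and are not part of PET or the ERG). It assumes that a leading ';' comments out a smaf.conf rule, that tag rules have the form pos.[tag='X'] -> gMap='T' or gMap.type='T', and that posmapping lines have the form TAG $entry. It just lines up the entries per tag; it cannot check type compatibility, which needs the grammar itself.

```python
import re
from collections import defaultdict

# Hypothetical cross-check, not part of PET or the ERG. Assumes:
#   - a smaf.conf line starting with ';' is a commented-out rule,
#   - tag rules look like  pos.[tag='X'] -> gMap='T'  or  gMap.type='T',
#   - posmapping lines look like  TAG $generic_entry
RULE_RE = re.compile(r"^pos\.\[tag='([^']+)'\]\s*->\s*gMap(?:\.type)?='([^']+)'")


def parse_smaf_rules(text):
    """Map each tag to the le types of its active (uncommented) smaf.conf rules."""
    rules = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and commented-out rules
        match = RULE_RE.match(line)
        if match:
            rules[match.group(1)].append(match.group(2))
    return dict(rules)


def parse_posmapping(text):
    """Map each tag to the generic lexical entries listed for it in posmapping."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].startswith("$"):
            mapping[parts[0]].append(parts[1])
    return dict(mapping)


def cross_reference(smaf, posmap):
    """Yield (tag, smaf_types, posmapping_entries) for every tag in either file."""
    for tag in sorted(set(smaf) | set(posmap)):
        yield tag, smaf.get(tag, []), posmap.get(tag, [])


# A few lines taken from the rules and posmapping above.
SMAF_EXAMPLE = """
pos.[tag='RR'] -> gMap.type='aj_-_i-unk_le'
pos.[tag='VV0'] -> gMap.type='v_np*_bse-unk_le'
;pos.[tag='VV0'] -> gMap.type='v_np*_pr-n3s-unk_le'
"""
POSMAP_EXAMPLE = """
RR $generic_adverb
VV0 $generic_trans_verb_bse
VV0 $generic_trans_verb_presn3sg
"""

if __name__ == "__main__":
    smaf = parse_smaf_rules(SMAF_EXAMPLE)
    posmap = parse_posmapping(POSMAP_EXAMPLE)
    for tag, types, entries in cross_reference(smaf, posmap):
        print(tag, types, entries)
```

Run over the full lists above, a check like this would pair RR with aj_-_i-unk_le in smaf.conf but $generic_adverb in the posmapping, and would show VV0, VVD and VVN each with two posmapping entries but only one active smaf.conf rule.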
I also have rules for handling Oscar edges (chemistry NER). These are now
reduced to:
oscar.[] -> edgeType='tok+morph' pos=content.type tokenStr=content.surface
oscar.[type='CM'] -> gMap='n_-_pn-unk_le'
oscar.[type='CJ'] -> gMap='aj_-_i-unk_le'
The original versions included setting the CARG values:
oscar.[] -> edgeType='tok+morph' pos=content.type gMap.carg=content.surface
This appeared to produce warnings such as:
; WARNING: failed to create dag for new path-value ("2,2-dipyridylamine = "2,2-dipyridylamine")
Does that mean the string isn't equal to itself, or "you can't have the feature
'foo' with the value 'foo'"?
Although I've found examples of smaf.conf rules on the Wiki, there's no stepwise
documentation of what the constructs mean, so I can't figure out what really
changed. The original rules had been around for a while, and it wasn't obvious
that they were failing before. Maybe there is a conflict in having both the
smaf.conf rules and the posmapping. I think some of the warnings may have been
genuine conflicts between the rule content and current type definitions, but
not all of them were, and some things appear to be no longer expressible in the
rules, e.g. linking the token string to the CARG value.
I hope this provides some useful information. I wouldn't regard my current fix
as a solution, because I need to know what we can currently say in smaf.conf
rules to know whether this is adequate for integrating information from
external taggers and NER components via the SMAF interface.
Cheers,
C.J.