[pet] xml_counts mode
Timothy Baldwin
tim at csse.unimelb.edu.au
Wed Feb 7 05:47:28 CET 2007
Hi all,
I have been playing around with -tok=xml_counts mode and lexical type
predictions, and have run into a slight problem with lexical rules (esp.
inflectional rules). What I want to be able to do is stipulate a set of lexical
types per token and have the full lexical rule machinery kick in, conditioned
on those lexical types, e.g. something like the following:
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<!DOCTYPE pet-input-chart
SYSTEM "/usr/share/lkb/src/preprocess/maf/pic.dtd">
<pet-input-chart>
<!-- The chance to laugh comes about -->
<w id="W1" cstart="1" cend="4">
<surface>The</surface>
</w>
<w id="W2" cstart="6" cend="11">
<surface>chance</surface>
<typeinfo id="W2S1" baseform="no" prio="1.0">
<stem>$generic_n_vp_c_le</stem>
</typeinfo>
</w>
<w id="W3" cstart="13" cend="14">
<surface>to</surface>
</w>
<w id="W4" cstart="16" cend="20">
<surface>laugh</surface>
</w>
<w id="W5" cstart="22" cend="25">
<surface>comes</surface>
<typeinfo id="W5S1" baseform="no" prio="1.0">
<stem>$generic_v_p_le</stem>
</typeinfo>
</w>
<w id="W6" cstart="27" cend="31">
<surface>about</surface>
<typeinfo id="W6S1" baseform="no" prio="1.0">
<stem>$generic_p_np_ptcl_le</stem>
</typeinfo>
</w>
</pet-input-chart>
What I find I need to do in practice is:
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<!DOCTYPE pet-input-chart
SYSTEM "/usr/share/lkb/src/preprocess/maf/pic.dtd">
<pet-input-chart>
<!-- The chance to laugh comes about -->
<w id="W1" cstart="1" cend="4">
<surface>The</surface>
</w>
<w id="W2" cstart="6" cend="11">
<surface>chance</surface>
<typeinfo id="W2S1" baseform="no" prio="1.0">
<stem>$generic_n_vp_c_le</stem>
</typeinfo>
</w>
<w id="W3" cstart="13" cend="14">
<surface>to</surface>
</w>
<w id="W4" cstart="16" cend="20">
<surface>laugh</surface>
</w>
<w id="W5" cstart="22" cend="25">
<surface>comes</surface>
<typeinfo id="W5S1" baseform="no" prio="1.0">
<stem>$generic_v_p_le</stem>
</typeinfo>
<typeinfo id="W5S2" baseform="no" prio="1.0">
<stem>$generic_v_p_le</stem>
<infl name="$third_sg_fin_verb_orule"/>
</typeinfo>
</w>
<w id="W6" cstart="27" cend="31">
<surface>about</surface>
<typeinfo id="W6S1" baseform="no" prio="1.0">
<stem>$generic_p_np_ptcl_le</stem>
</typeinfo>
</w>
</pet-input-chart>
That is, I have to add in every lexical rule that can apply to a given lexical
type (here a second <typeinfo> for "comes" naming $third_sg_fin_verb_orule), or
else work out for myself which lexical rules apply to each token, which is
something I would rather rely on the grammar to do for me. Have I perhaps
misunderstood the XML input formalism, or is there some magic trick on the PET
side of things that I need? For the record, this is the invocation of PET I am
using:
# cat input.xml | cheap -tok=xml_counts -packing /nlptools/erg/20060905-supertagging/english.grm
I am using PET 0.99.13 and the Jul-06 version of the ERG, with some naive
changes to gen-lex.tdl so that I can specify terminal lexical types.
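To make the workaround concrete, here is a minimal Python sketch of how the
extra <typeinfo> entries could be generated rather than written by hand. It is
only an illustration under some assumptions: the function name expand_typeinfos
and the rules_by_type table are hypothetical, the table has to be filled in by
hand from the grammar (the only rule name taken from my example is
$third_sg_fin_verb_orule), and new ids simply continue the W5S1, W5S2 pattern.
Note also that ElementTree does not keep the XML declaration or DOCTYPE, so
those lines would have to be written back before piping the result to cheap.

```python
import copy
import xml.etree.ElementTree as ET


def expand_typeinfos(xml_text, rules_by_type):
    """Add one <typeinfo> copy per candidate rule for each stipulated type.

    rules_by_type is a hypothetical, hand-built mapping from a lexical type
    name (the <stem> text) to the inflectional rules that may apply to it.
    """
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        # Only expand entries that do not already name an inflectional rule.
        originals = [t for t in w.findall("typeinfo") if t.find("infl") is None]
        count = len(w.findall("typeinfo"))
        for typeinfo in originals:
            stem = typeinfo.findtext("stem", default="").strip()
            for rule in rules_by_type.get(stem, []):
                count += 1
                variant = copy.deepcopy(typeinfo)
                # Assumed id scheme: token id + "S" + running index.
                variant.set("id", f"{w.get('id')}S{count}")
                ET.SubElement(variant, "infl", {"name": rule})
                w.append(variant)
    return ET.tostring(root, encoding="unicode")


# Small usage example built from the "comes" token above.
SAMPLE = """<pet-input-chart>
<w id="W5" cstart="22" cend="25">
<surface>comes</surface>
<typeinfo id="W5S1" baseform="no" prio="1.0">
<stem>$generic_v_p_le</stem>
</typeinfo>
</w>
</pet-input-chart>"""

# Assumption: this table is illustrative, not read from the ERG.
RULES = {"$generic_v_p_le": ["$third_sg_fin_verb_orule"]}

expanded = expand_typeinfos(SAMPLE, RULES)
print(expanded)
```

This keeps the original uninflected <typeinfo> and appends one variant per
listed rule, which is exactly the enumeration I would like the grammar to do
for me instead.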
Tim