[developers] RMRS characterization differences

Wed Jul 11 21:27:45 CEST 2007

Hello,

I am using cheap to extract RMRSs with characterization information,
and I get different results from the svn revision of cheap and from
the LKB. The following example illustrates the issue. Parsing the
sentence "I show my work" with cheap, I get the following character
spans:

<rmrs cfrom='-1' cto='-1'>
<label vid='1'/>
<ep cfrom='0' cto='14'><gpred>prop-or-ques_m_rel</gpred><label vid='1'/><var sort='e' vid='2'/></ep>
<ep cfrom='0' cto='14'><gpred>pron_rel</gpred><label vid='6'/><var sort='x' vid='7'/></ep>
<ep cfrom='0' cto='14'><gpred>pronoun_q_rel</gpred><label vid='8'/><var sort='x' vid='7'/></ep>
<ep cfrom='2' cto='6'><realpred lemma='show' pos='v' sense='1'/><label vid='11'/><var sort='e' vid='2'/></ep>
<ep cfrom='7' cto='14'><gpred>def_explicit_q_rel</gpred><label vid='13'/><var sort='x' vid='12'/></ep>
<ep cfrom='7' cto='14'><gpred>poss_rel</gpred><label vid='16'/><var sort='e' vid='18'/></ep>
<ep cfrom='7' cto='14'><gpred>pronoun_q_rel</gpred><label vid='19'/><var sort='x' vid='17'/></ep>
<ep cfrom='7' cto='14'><gpred>pron_rel</gpred><label vid='22'/><var sort='x' vid='17'/></ep>
<ep cfrom='10' cto='14'><realpred lemma='work' pos='n' sense='1'/><label vid='10001'/><var sort='x' vid='12'/></ep>
[...]

So the pron_rel, which should correspond to "I", spans the range from
0 to 14. In contrast, working within the LKB I get the following
result:

<rmrs cfrom='-1' cto='-1'>
<label vid='1'/>
<ep cfrom='0' cto='14'><gpred>prop-or-ques_m_rel</gpred><label vid='1'/><var sort='e' vid='2'/></ep>
<ep cfrom='0' cto='1'><gpred>pron_rel</gpred><label vid='6'/><var sort='x' vid='7'/></ep>
<ep cfrom='0' cto='1'><gpred>pronoun_q_rel</gpred><label vid='8'/><var sort='x' vid='7'/></ep>
<ep cfrom='2' cto='6'><realpred lemma='show' pos='v' sense='1'/><label vid='11'/><var sort='e' vid='2'/></ep>
<ep cfrom='7' cto='9'><gpred>def_explicit_q_rel</gpred><label vid='13'/><var sort='x' vid='12'/></ep>
<ep cfrom='7' cto='9'><gpred>poss_rel</gpred><label vid='16'/><var sort='e' vid='18'/></ep>
<ep cfrom='7' cto='9'><gpred>pronoun_q_rel</gpred><label vid='19'/><var sort='x' vid='17'/></ep>
<ep cfrom='7' cto='9'><gpred>pron_rel</gpred><label vid='22'/><var sort='x' vid='17'/></ep>
<ep cfrom='10' cto='14'><realpred lemma='work' pos='n' sense='1'/><label vid='10001'/><var sort='x' vid='12'/></ep>
[...]
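
To make the spans concrete, here is a small Python sketch that maps
cfrom/cto pairs back onto the input string. It assumes that cfrom and
cto are zero-based character offsets with an exclusive end, which is
what the spans for "show" (2 to 6) and "work" (10 to 14) suggest; the
helper name covered() is only for illustration.

```python
SENTENCE = "I show my work"

def covered(cfrom, cto, text=SENTENCE):
    """Return the substring a (cfrom, cto) pair covers.

    Assumption: offsets are zero-based and cto is exclusive.
    """
    return text[cfrom:cto]

# Spans reported for the first pron_rel and for poss_rel by each tool.
for tool, pron_span, poss_span in [
    ("cheap", (0, 14), (7, 14)),
    ("lkb", (0, 1), (7, 9)),
]:
    print(tool, repr(covered(*pron_span)), repr(covered(*poss_span)))
```

Under that assumption, the cheap spans cover the whole sentence and
"my work", while the lkb spans cover just "I" and "my".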

I think the latter is correct, because pron_rel corresponds to "I"
and should therefore span characters 0 to 1 only. The character spans
for the "poss_rel" relation also differ (7 to 14 in cheap versus 7 to
9 in the LKB). Might there be a bug in cheap? Relations like
"named_rel" show no difference, for example when parsing "John shows
the work", but there are still differences in the spans associated
with "the".
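
In case it helps to reproduce the comparison, this is roughly how the
two outputs can be diffed automatically. It is only a sketch: the two
XML strings are shortened copies of the output above with a closing
</rmrs> tag added so that they parse, and it assumes that the label
vids line up between the two tools (they do in this example). The
function names are my own.

```python
import xml.etree.ElementTree as ET

# Shortened samples; the closing </rmrs> is added here so they parse.
CHEAP_XML = """<rmrs cfrom='-1' cto='-1'>
<label vid='1'/>
<ep cfrom='0' cto='14'><gpred>pron_rel</gpred><label vid='6'/><var sort='x' vid='7'/></ep>
<ep cfrom='2' cto='6'><realpred lemma='show' pos='v' sense='1'/><label vid='11'/><var sort='e' vid='2'/></ep>
<ep cfrom='7' cto='14'><gpred>poss_rel</gpred><label vid='16'/><var sort='e' vid='18'/></ep>
</rmrs>"""

LKB_XML = """<rmrs cfrom='-1' cto='-1'>
<label vid='1'/>
<ep cfrom='0' cto='1'><gpred>pron_rel</gpred><label vid='6'/><var sort='x' vid='7'/></ep>
<ep cfrom='2' cto='6'><realpred lemma='show' pos='v' sense='1'/><label vid='11'/><var sort='e' vid='2'/></ep>
<ep cfrom='7' cto='9'><gpred>poss_rel</gpred><label vid='16'/><var sort='e' vid='18'/></ep>
</rmrs>"""

def ep_spans(xml_text):
    """Map (predicate name, label vid) to (cfrom, cto) for every <ep>."""
    spans = {}
    for ep in ET.fromstring(xml_text).iter("ep"):
        gpred = ep.find("gpred")
        if gpred is not None:
            name = gpred.text
        else:
            # Assumption: an <ep> without <gpred> carries a <realpred>.
            real = ep.find("realpred")
            name = "_".join(real.get(k) for k in ("lemma", "pos", "sense"))
        vid = ep.find("label").get("vid")
        spans[(name, vid)] = (int(ep.get("cfrom")), int(ep.get("cto")))
    return spans

def span_differences(xml_a, xml_b):
    """Return the EPs present in both inputs whose spans differ."""
    a, b = ep_spans(xml_a), ep_spans(xml_b)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}

for key, (cheap_span, lkb_span) in sorted(span_differences(CHEAP_XML, LKB_XML).items()):
    print(key, "cheap:", cheap_span, "lkb:", lkb_span)
```

On these samples only pron_rel and poss_rel are reported; the span
for "show" is the same in both.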

Thanks,

Sergio.