[developers] RMRS characterization differences

Christopher Rupp Christopher.Rupp at cl.cam.ac.uk
Fri Jul 13 13:33:03 CEST 2007


Hi,

You raised the question of whether the character offsets in PET RMRS results
were being calculated correctly. As far as I can see, there was no response
to this; maybe the relevant people are not available at the moment. Since
I also have results from both PET and LKB, I thought I'd look for discrepancies.
The context is slightly different, in that I am processing files of marked-up
XML text and feeding in the actual character positions in the file at the
lexical level, so the numbers are true for the file and not for the input
string. However, I quite easily found an example that may be indicative.
(I have a lot of outputs and I've only looked at a few for this question.)

LKB:

<ep cfrom='31561' cto='31562'><realpred lemma='a' pos='q' /><label vid='8' />
<var sort='x' vid='4' /></ep>
<ep cfrom='31563' cto='31575'><realpred lemma='quantitative' pos='a' sense='1' 
/><label vid='10001' /><var sort='e' vid='11' /></ep>
<ep cfrom='31576' cto='31587'><realpred lemma='description' pos='n' sense='of' 
/><label vid='10002' /><var sort='x' vid='4' /></ep>

PET:

<ep cfrom='31561' cto='31605'><realpred lemma='a' pos='q'/><label vid='6'/>
<var sort='x' vid='5'/></ep>
<ep cfrom='31563' cto='31605'><realpred lemma='quantitative' pos='a' 
sense='1'/><label vid='9'/><var sort='e' vid='10'/></ep>
<ep cfrom='31576' cto='31587'><realpred lemma='description' pos='n' 
sense='of'/><label vid='10001'/><var sort='x' vid='5'/></ep>

Input:

 A quantitative description of the nucleation process as a function of 
temperature and Pd flux suggests that heterogeneous nucleation is the dominant 
process.

(In the PET output the first two EPs end at 31605 rather than at the end of
their own word. 31605 is the last 'n' in nucleation, but I can't tell you what
the phrasal analysis is.)
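
For anyone who wants to check more outputs than the few I looked at, here is a
minimal Python sketch of the comparison. It is only an illustration and makes
several assumptions: the EP fragments are wrapped in a made-up <rmrs> root so
that they parse, EPs are matched by lemma (which only works while lemmas are
unique within a fragment), and the names ep_spans, span_differences, BASE and
TEXT are hypothetical. The offset check at the end further assumes that no
markup intervenes between these words in the file.

```python
import xml.etree.ElementTree as ET

# EP fragments copied from the LKB and PET outputs quoted above.
LKB = """
<ep cfrom='31561' cto='31562'><realpred lemma='a' pos='q' /><label vid='8' />
<var sort='x' vid='4' /></ep>
<ep cfrom='31563' cto='31575'><realpred lemma='quantitative' pos='a' sense='1'
/><label vid='10001' /><var sort='e' vid='11' /></ep>
<ep cfrom='31576' cto='31587'><realpred lemma='description' pos='n' sense='of'
/><label vid='10002' /><var sort='x' vid='4' /></ep>
"""

PET = """
<ep cfrom='31561' cto='31605'><realpred lemma='a' pos='q'/><label vid='6'/>
<var sort='x' vid='5'/></ep>
<ep cfrom='31563' cto='31605'><realpred lemma='quantitative' pos='a'
sense='1'/><label vid='9'/><var sort='e' vid='10'/></ep>
<ep cfrom='31576' cto='31587'><realpred lemma='description' pos='n'
sense='of'/><label vid='10001'/><var sort='x' vid='5'/></ep>
"""


def ep_spans(fragment):
    """Map each realpred lemma to its (cfrom, cto) pair."""
    # Wrap the fragment in a dummy root so it is a well-formed document.
    root = ET.fromstring("<rmrs>" + fragment + "</rmrs>")
    spans = {}
    for ep in root.iter("ep"):
        pred = ep.find("realpred")
        if pred is None:
            continue  # skip EPs without a realpred (e.g. grammar predicates)
        spans[pred.get("lemma")] = (int(ep.get("cfrom")), int(ep.get("cto")))
    return spans


def span_differences(first, second):
    """Return the lemmas whose spans differ, with both spans."""
    return {
        lemma: (first[lemma], second[lemma])
        for lemma in first
        if lemma in second and first[lemma] != second[lemma]
    }


diffs = span_differences(ep_spans(LKB), ep_spans(PET))
for lemma, (lkb_span, pet_span) in sorted(diffs.items()):
    print(lemma, "LKB", lkb_span, "PET", pet_span)

# Offset check: assuming 'A' starts at 31561 and plain text follows,
# the span ending at 31605 should close the word "nucleation".
BASE = 31561
TEXT = "A quantitative description of the nucleation process"
print(repr(TEXT[31595 - BASE:31605 - BASE]))
```

On the fragments above this reports 'a' and 'quantitative' as differing and
leaves 'description' out, which matches what can be read off by eye.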

I won't claim that all of my versions are up to date. The aim is to provoke a
response, because I have an interest in the result. I hope that this will be
helpful in the end.

Cheers,

C.J.

-- 
Dr. C.J. Rupp

University of Cambridge Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD, UK

Tel: +44 1223 767025 
Fax: +44 1223 334678
Mobile: +44 795-8496 916
(Home): +44 1223 721621
Email: cr351 at cl.cam.ac.uk




