[developers] PET XML input and [incr tsdb()]

Berthold Crysmann crysmann at dfki.de
Wed Feb 14 14:40:55 CET 2007


On Wed, 2007-02-14 at 11:54 +0100, Bernd Kiefer wrote:
> Hi,
> 
> Uli just motivated me to read all these mails about xml-count. Sorry
> for my late reaction. He also noticed that your xml data is missing a
> pointer to the DTD, which is fatal since the parser always wants to do
> validation. So you have to put the DTD at some fixed location in your
> file system and add something like this to your XML header:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE pet-input-chart
>   SYSTEM "/home/cl-home/kiefer/src/pet/main/cheap/sax/pic.dtd">
> etc.
> 
> I attach what I hope is the current DTD to this mail.
> 
> Greetings,
>                 Bernd
> 
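
For reference, a minimal sketch of the header step Bernd describes, as a
shell function. The function name add_pic_doctype and the DTD path in the
usage line are made up for illustration; the root element name and the
encoding are taken from the header quoted above. It assumes that any
existing XML declaration sits alone on the first line of the file.

```shell
# Sketch only: write a PIC file to stdout with an XML declaration and a
# DOCTYPE pointing at a local copy of the DTD.
# add_pic_doctype is a hypothetical helper, not part of PET.
add_pic_doctype() {
    dtd=$1   # absolute path to your copy of pic.dtd
    pic=$2   # PIC file to read

    # If the file already names a DTD, pass it through unchanged.
    if grep -q '<!DOCTYPE' "$pic"; then
        cat "$pic"
        return 0
    fi

    # Header as in the quoted mail; the encoding is assumed to match the data.
    printf '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    printf '<!DOCTYPE pet-input-chart\n  SYSTEM "%s">\n' "$dtd"

    # Drop an existing declaration on line 1 so it is not duplicated.
    sed '1{/^<?xml/d;}' "$pic"
}
```

Usage, with a hypothetical DTD location:

add_pic_doctype /path/to/pic.dtd my_beautiful_pic > my_beautiful_pic.xml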

There's yet another way to do the batch parsing with xml_counts:

Pipe the PIC to cheap and dump a profile using -tsdbdump. The items in
the dump are the readable items, and you also get trees and MRSs,
whatever you want.

E.g. 

cat my_beautiful_pic | cheap -limit=100000 -tok=xml_counts -default-les \
  -packing=15 -mrs -tsdbdump=. GRMFILE

Then create a profile in [incr tsdb()] and copy the item, parse, result,
and relations files over.
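
The copy step can be sketched roughly as follows. This assumes the dump
directory holds plain files named item, parse, result, and relations, and
that the profile is an existing directory storing them uncompressed; the
function name and both paths are hypothetical.

```shell
# Sketch only: copy the four dumped files into an existing profile directory.
# copy_dump_to_profile is a hypothetical helper, not part of PET or
# [incr tsdb()].
copy_dump_to_profile() {
    dump=$1      # directory given to -tsdbdump
    profile=$2   # profile directory created in [incr tsdb()]

    if [ ! -d "$profile" ]; then
        echo "no such profile directory: $profile" >&2
        return 1
    fi

    # Check everything is there first, so a partial dump copies nothing.
    for f in item parse result relations; do
        if [ ! -f "$dump/$f" ]; then
            echo "missing $dump/$f" >&2
            return 1
        fi
    done

    for f in item parse result relations; do
        cp "$dump/$f" "$profile/$f" || return 1
    done
}
```

Usage, with made-up paths:

copy_dump_to_profile . /path/to/tsdb/home/my_profile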

Works for me. 

Make sure you've got a recent PET from SVN; otherwise you won't have a
matching relations file.

Cheers,

B  


> -- 
> In a world without walls and fences, who needs Windows or Gates?
> 
> **********************************************************************
> Bernd Kiefer                                            Am Blauberg 16
> kiefer at dfki.de                                      66119 Saarbruecken
> +49-681/302-5301 (office)                      +49-681/3904507  (home)
> **********************************************************************




