[developers] PET XML input and [incr tsdb()]
Berthold Crysmann
crysmann at dfki.de
Wed Feb 14 14:40:55 CET 2007
On Wed, 2007-02-14 at 11:54 +0100, Bernd Kiefer wrote:
> Hi,
>
> Uli just motivated me to read all these mails about xml-count. Sorry
> for my late reaction. He also noticed that your xml data is missing a
> pointer to the DTD, which is fatal since the parser always wants to do
> validation. So you have to put the DTD at some fixed location in your
> file system and add something like this to your XML header:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE pet-input-chart
> SYSTEM "/home/cl-home/kiefer/src/pet/main/cheap/sax/pic.dtd">
> etc.
>
> I attach what I hope is the current DTD to this mail.
>
> Greetings,
> Bernd
>
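For the DOCTYPE fix Bernd describes, here is a minimal shell sketch (not
part of PET). It prints a PIC file with a DOCTYPE line added right after
the XML declaration. The function name add_pic_doctype and both arguments
are hypothetical, and it assumes the file does not already contain a
DOCTYPE and that the XML declaration, if present, is on the first line.

```shell
#!/bin/sh
# Sketch only: add_pic_doctype is a hypothetical helper, not a PET tool.
# Prints the PIC file to stdout with a DOCTYPE line pointing at the DTD.
add_pic_doctype() {
    dtd_path=$1   # absolute path to your local copy of pic.dtd
    pic_file=$2   # PIC XML file that lacks a DOCTYPE
    awk -v dtd="$dtd_path" '
        BEGIN { doctype = "<!DOCTYPE pet-input-chart SYSTEM \"" dtd "\">" }
        # XML declaration on line 1: keep it first, DOCTYPE goes second.
        NR == 1 && /^<\?xml/ { print; print doctype; next }
        # No XML declaration: DOCTYPE goes before the first line.
        NR == 1 { print doctype }
        { print }
    ' "$pic_file"
}
```

Usage would be along the lines of
add_pic_doctype /path/to/pic.dtd my_beautiful_pic > my_beautiful_pic.dtd
with the path replaced by wherever you keep the DTD.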
There's yet another way to do batch parsing with xml_counts:
pipe the PIC to cheap and dump a profile using -tsdbdump. The dumped
items are the readable ones, and you get trees and MRSs, whatever you want.
E.g.
  cat my_beautiful_pic | cheap -limit=100000 -tok=xml_counts -default-les \
      -packing=15 -mrs -tsdbdump=. GRMFILE
Then create a profile in [incr tsdb()] and copy the dumped item, parse,
result, and relations files over into it.
Works for me.
Make sure you've got a recent SVN version of PET; otherwise you won't
have a matching relations file.
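
The copy step can be sketched as a small shell function. This is only an
illustration: copy_tsdb_dump and both directory arguments are hypothetical,
and it assumes the dump directory holds plain files named exactly item,
parse, result, and relations.

```shell
#!/bin/sh
# Sketch only: copy_tsdb_dump is a hypothetical helper, not part of PET
# or [incr tsdb()]. It copies the four dumped files into a profile.
copy_tsdb_dump() {
    dump_dir=$1      # directory that was passed to -tsdbdump
    profile_dir=$2   # directory of the profile created in [incr tsdb()]
    # Check everything is there first, so a partial copy never happens.
    for f in item parse result relations; do
        if [ ! -f "$dump_dir/$f" ]; then
            echo "missing $dump_dir/$f" >&2
            return 1
        fi
    done
    for f in item parse result relations; do
        cp "$dump_dir/$f" "$profile_dir/$f" || return 1
    done
}
```

Called as, say, copy_tsdb_dump . /path/to/your/profile, where the second
argument is wherever your [incr tsdb()] profile lives.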
Cheers,
B
> --
> In a world without walls and fences, who needs Windows or Gates?
>
> **********************************************************************
> Bernd Kiefer Am Blauberg 16
> kiefer at dfki.de 66119 Saarbruecken
> +49-681/302-5301 (office) +49-681/3904507 (home)
> **********************************************************************