[developers] [pet] xml_counts mode

Francis Bond fcbond at gmail.com
Wed Feb 14 07:54:13 CET 2007


> A related question, hopefully with a simpler answer: is there a batch input
> mode in xml_counts? The wiki suggests there is a way of specifying a file
> containing a list of file names, each of which in turn contains a single
> PIC. I couldn't find any documentation of such a facility or find any obvious
> sign of such a thing in the source code, but would dearly like to get away
> from my current mode of operation, of firing up PET each time I want to parse
> a single PIC.

We used the batch input mode at NTT.  If you switch on -tok=xml_counts
and the first string of the input is not a valid XML header, then PET
assumes that the input is a list of file names, and processes those
files in order.  Each file must be valid XML, with two newlines at the
end.
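
As a rough sketch of preparing such a batch (this is not taken from the
PET sources): the helper below pads each PIC file so that it ends in
exactly two newlines and writes the file names to a list file.  The
function name prepare_batch, the list file name, and the
one-name-per-line layout are assumptions made for illustration; how the
list is then handed to PET is not shown here.

```python
from pathlib import Path


def prepare_batch(xml_files, list_path):
    """Pad each file to end in two newlines and write the file list.

    Hypothetical helper: it does not check that the files are valid XML.
    """
    names = []
    for name in xml_files:
        path = Path(name)
        text = path.read_text(encoding="utf-8")
        # Drop any trailing newlines already present, then add exactly two.
        path.write_text(text.rstrip("\n") + "\n\n", encoding="utf-8")
        names.append(str(path))
    # Assumed layout: one file name per line.
    Path(list_path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

For example, prepare_batch(["s1.xml", "s2.xml"], "batch.txt") rewrites
the two files in place and leaves their names in batch.txt.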

Francis Bond <http://www2.nict.go.jp/x/x161/en/member/bond/>
NICT Computational Linguistics Group
