[developers] PET XML input and [incr tsdb()]

Stephan Oepen oe at csli.Stanford.EDU
Sun Feb 11 20:36:40 CET 2007


hi rebecca and francis,

> I think if you create the xml without any end-of-line markers and
> whack it into the right field of the item file, instead of the plain
> sentence, then call a tsdb cpu with the right options to cheap it will
> just work [...]

to [incr tsdb()], the `i-input' field is just a string that is usually
sent to the client processor (PET in your case) verbatim.  hence, i too
think the procedure suggested by francis should work.  however, a small
number of characters in the `i-input' field require escaping:

  @         --> \s
  newline   --> \n
  backslash --> \\
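for anyone preparing the item file outside of [incr tsdb()], here is a minimal python sketch of the three substitutions above. the function names are made up for illustration and are not part of [incr tsdb()]. the backslash has to be escaped first. presumably `@' and newline matter because they separate fields and records, respectively, in the tsdb(1) data files.

```python
def escape_i_input(text: str) -> str:
    """Escape a raw i-input string (e.g. an XML document) for a tsdb(1) data file.

    Backslash goes first; otherwise the backslashes introduced by the
    other two substitutions would themselves be doubled.
    """
    return (text.replace("\\", "\\\\")
                .replace("@", "\\s")
                .replace("\n", "\\n"))


def unescape_i_input(field: str) -> str:
    """Inverse of escape_i_input, scanning the field left to right."""
    table = {"\\": "\\", "s": "@", "n": "\n"}
    out = []
    i = 0
    while i < len(field):
        c = field[i]
        if c == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # unknown escapes are kept verbatim
            out.append(table.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    raw = '<w form="a@b">x\\y</w>\n'        # made-up XML, not real PIC
    field = escape_i_input(raw)
    print(field)                            # <w form="a\sb">x\\y</w>\n
    assert unescape_i_input(field) == raw
```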

also, it will be important to have everyone agree on encodings, viz.

  - the raw tsdb(1) data files;
  - excl:*locale* in the [incr tsdb()] universe;
  - pvm:*pvm-encoding* (`nil' means default to *locale* value); and
  - the `encoding' parameter in the grammar configuration for PET.
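one cheap way to check the first item on that list is to confirm that every line of a raw data file decodes in the agreed encoding before running a test. a small python sketch follows, assuming UTF-8 is the encoding everyone settles on (the helper names are hypothetical).

```python
from pathlib import Path


def undecodable_lines(data: bytes, encoding: str = "utf-8") -> list[int]:
    """Return 1-based numbers of lines in `data` that fail to decode in `encoding`.

    Splitting on b"\\n" before decoding is only safe for ASCII-compatible
    encodings such as UTF-8 or ISO-8859-1, where byte 0x0A is always a newline.
    """
    bad = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def check_data_file(path: str, encoding: str = "utf-8") -> list[int]:
    """Convenience wrapper for a raw data file, e.g. a profile's `item' file."""
    return undecodable_lines(Path(path).read_bytes(), encoding)
```

an empty result only says the bytes are valid in that encoding; it cannot tell whether the file was actually written in it (ISO-8859-1, for example, accepts any byte sequence).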

but given the sample item you sent, and assuming you are using the ERG,
i suspect neither of the above is part of the problem you report.

to debug further, could you please do the following:

  :trace tsdb::retrieve tsdb::create-runs tsdb::process-item
  (setf tsdb:*pvm-debug-p* t)

and then post the complete contents of the Lisp console output?

> (although tsdb's word counting will get confused).

not really, i would think.  assuming one creates the `item' relation in
a process external to [incr tsdb()], it seems reasonable to expect that
process to put the correct value into the `i-length' field.
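as a sketch of what such an external process might do (python; the helper names are hypothetical, and counting whitespace-separated tokens is only an assumption about what `i-length' should hold for a given grammar): take the count from the plain sentence, not from the XML that ends up in `i-input'.

```python
import xml.etree.ElementTree as ET


def i_length(sentence: str) -> int:
    """Hypothetical helper: number of whitespace-separated tokens in the plain sentence."""
    return len(sentence.split())


def words_in_xml(xml_text: str) -> int:
    """Heuristic fallback when only the XML is at hand: count tokens in the
    element text, ignoring tags and attributes.  Text nodes are joined with
    spaces, so a word split across inline elements would be over-counted."""
    root = ET.fromstring(xml_text)
    return len(" ".join(root.itertext()).split())


if __name__ == "__main__":
    sentence = "Kim sleeps."
    xml = '<s><w id="W1">Kim</w><w id="W2">sleeps.</w></s>'  # made-up XML
    print(i_length(sentence))   # 2
    print(len(xml.split()))     # naive count over the markup: 3
    print(words_in_xml(xml))    # 2
```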

finally, out of curiosity: what flexibility does the PIC XML format give
you that you lack in the YY format?  the latter is better tested in
[incr tsdb()], and beyond passing `i-input' through verbatim, [incr tsdb()]
can interpret YY format but not PIC.  hence, i tend to recommend YY
format, unless of course there are things you cannot do there?

                                                      all best  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


