[developers] PET XML input and [incr tsdb()]
Stephan Oepen
oe at csli.Stanford.EDU
Sun Feb 11 20:36:40 CET 2007
hi rebecca and francis,
> I think if you create the xml without any end-of-line markers and
> whack it into the right field of the item file, instead of the plain
> sentence, then call a tsdb cpu with the right options to cheap it will
> just work [...]
to [incr tsdb()], the `i-input' field is just a string that is usually
sent to the client processor (PET in your case) verbatim. hence, i too
think the procedure suggested by francis should work. however, a small
number of characters in the `i-input' field require escaping:
@ --> \s
newline --> \n
backslash --> \\
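
as a minimal sketch of the three rules above, assuming the `item' file is written by an external script (python here, purely for illustration), the escaping could look as follows. the function names `escape_i_input' and `unescape_i_input' are hypothetical and not part of [incr tsdb()] or PET. note that the backslash has to be handled first, or the backslashes introduced for `@' and newline would be doubled.

```python
import re


def escape_i_input(text: str) -> str:
    """Escape a string for the `i-input' field (hypothetical helper).

    Applies the three rules from the message: backslash --> \\\\,
    @ --> \\s, newline --> \\n. Backslashes are escaped first so that
    the ones added by the later replacements are left alone.
    """
    text = text.replace("\\", "\\\\")
    text = text.replace("@", "\\s")
    text = text.replace("\n", "\\n")
    return text


def unescape_i_input(text: str) -> str:
    """Inverse of escape_i_input (hypothetical helper).

    Scans left to right so that an escaped backslash followed by a
    literal `s' or `n' is not misread as an escape sequence.
    """
    table = {"s": "@", "n": "\n", "\\": "\\"}
    return re.sub(r"\\(.)", lambda m: table.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    raw = 'mail me @ home\nor use c:\\temp'
    escaped = escape_i_input(raw)
    print(escaped)
    assert unescape_i_input(escaped) == raw
```

an XML string that has already been flattened to a single line only needs the `@' and backslash rules; the newline rule matters when line breaks are kept in the input.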
also, it will be important to have everyone agree on encodings, viz.
- the raw tsdb(1) data files;
- excl:*locale* in the [incr tsdb()] universe;
- pvm:*pvm-encoding* (`nil' means default to *locale* value); and
- the `encoding' parameter in the grammar configuration for PET.
but given the sample item you sent, and assuming you are using the ERG,
i suspect neither of the above is part of the problem you report.
to debug further, could you please do the following:
:trace tsdb::retrieve tsdb::create-runs tsdb::process-item
(setf tsdb:*pvm-debug-p* t)
and then post the complete contents of the Lisp console output?
> (although tsdb's word counting will get confused).
not really, i would think. assuming one creates the `item' relation in
a process external to [incr tsdb()], it would seem reasonable to expect
that the `i-length' field provides the correct value.
finally, out of curiosity: what is the XML flexibility in PIC that you
lack in the YY format? the latter is more tested in [incr tsdb()], and
beyond the level of passing through `i-input' verbatim there is support
in [incr tsdb()] for interpreting YY format but not PIC. hence, i tend
to recommend YY format, unless of course there are things you cannot do
there?
all best - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++