[developers] results after edge limit reached?
Rebecca Dridan
bec.dridan at gmail.com
Sat Aug 15 19:48:47 CEST 2009
Stephan Oepen wrote:
> hi again,
>
> are you quite sure of that observed asymmetry in PET behavior, rebecca?
I'm sure that the parse file I get when I run

  cmcheap -tok=fsc -cm -default-les=all -nsolution -packing -timeout=60 -mrs -tsdbdump=. $LOGONROOT/lingo/terg

has -1 in the readings field for every item that has "timed out (60 s)"
in the error field. That's with the chart-mapping cheap, svn rev598. I
wouldn't swear to anything else right now, and this isn't a good time
for me to chase down the exact conditions, as long as I can work around
them :)
> even though i'm no fan of the -tsdbdump mode of operation (as it
> duplicates existing [incr tsdb()] internals in the PET code base), i was
> under the impression that it shared enough of the result reporting code
> with standard [incr tsdb()] client mode to make it unlikely that one
> would see different outcomes.
I had hoped so. I use -tsdbdump because I want fsc input; I believe
that is currently the only way to input fsc?
> in an application-oriented perspective, however, it might be tempting to
> count those partial solutions as coverage (in principle they should be
> qualified as owed to a robustness heuristic), and certainly substantial
> cpu time was expended on these items (just as our parsers tend to reject
> some inputs in zero time, e.g. ones exposing lexical gaps). in this
> respect, averaging over all inputs gives a more practical estimate of
> processing cost. for your thesis, i believe i would recommend you use
> this latter approach.
Actually, I think

  (time spent on all items, including failed and errored items)
  / (number of items that get an analysis)

is the most meaningful figure in batch processing for an application,
but averaging over all input items is also reasonable. The current
[incr tsdb()] uses "number of input items - items with lexical gaps" as
the denominator (I think), which falls somewhere between the two. For
the moment, I'm going to pick a definition and stick with it, but if
people are going to report times and memory use from [incr tsdb()], we
should agree on what those figures measure.
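To make the difference between the three denominators concrete, here is a minimal sketch in Python. It assumes a hypothetical per-item record (the field names are illustrative, not the real [incr tsdb()] schema), treats readings > 0 as "got an analysis" and -1 as an error such as a timeout (as in the parse files described above), and uses the same numerator, total time over all items, for every variant; whether [incr tsdb()] itself sums time that way is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical per-item record; field names are illustrative only,
    # not the actual [incr tsdb()] relation schema.
    cpu_seconds: float  # time spent on the item, including failures and timeouts
    readings: int       # > 0: analysed; 0: no parse; -1: error (e.g. timeout)
    lexical_gap: bool   # rejected early because of a lexical gap


def average_times(items):
    """Average time per item under three candidate denominators."""
    total = sum(i.cpu_seconds for i in items)  # time on *all* items
    n = len(items)
    analysed = sum(1 for i in items if i.readings > 0)
    gaps = sum(1 for i in items if i.lexical_gap)

    def per(denominator):
        # Guard against empty denominators (e.g. nothing was analysed).
        return total / denominator if denominator else float("nan")

    return {
        "per_analysed_item": per(analysed),       # cost per useful result
        "per_input_item": per(n),                  # cost per input
        "per_item_without_gap": per(n - gaps),     # assumed [incr tsdb()] style
    }
```

For example, with one analysed item, one 60-second timeout, one lexical gap rejected in zero time and another analysed item, the per-analysed-item figure is twice the per-input figure, because the timeout's cost is charged to the two successful parses only.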
Rebecca