[developers] tsql query to retrieve gold trees?

Woodley Packard sweaglesw at sweaglesw.org
Fri Jan 20 17:02:27 CET 2017


What Stephan says is right of course: there is no single tsql query if the profile was manually annotated, since multiple annotations may have occurred per item.  If the profile was freshly parsed and then automatically updated, I imagine this contingency should not arise, so selecting just trees where t-active > 0 ought to work (or perhaps Stephan will correct me?).

For the fully general case where multiple annotations per item may exist, it may be possible to use one tsql query plus a clever combination of sort(1) and uniq(1), where the latter can be instructed to ignore some fields and will deterministically print just the first of a cohort.  This, however, gets a bit fiddly.
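
To make the sort-then-take-first idea concrete, here is a rough Python sketch of what sort(1) plus uniq(1) would be doing. The row layout (parse-id, t-version, result-id) and the function name are assumptions made for illustration, not the actual tsdb output format.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_item(rows):
    """Keep one row per parse-id: the one with the highest t-version.

    rows: iterable of (parse_id, t_version, result_id) tuples.
    This tuple layout is an assumption, not the real tsdb output format.
    """
    # Sort by parse-id ascending, then t-version descending, so the most
    # recent annotation comes first within each cohort.
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    picked = []
    for _parse_id, cohort in groupby(ordered, key=itemgetter(0)):
        # Like uniq ignoring the later fields: take only the first row.
        picked.append(next(cohort))
    return picked
```

Note that this keeps exactly one row per item, so it would need adjusting if several results can be marked for the same version.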

-Woodley

> On Jan 20, 2017, at 3:54 AM, Stephan Oepen <oe at ifi.uio.no> wrote:
> 
> hi ned,
> 
>> Is there a way to limit query to the actual trees that were selected as
>> being gold, or do I have to thin the profile first (or thin the tsdb output
>> in a post-processing step)?
> 
> i am afraid there is no single TSQL query that will achieve your goal,
> in part because of limitations in the query language, in part because,
> prior to thinning (which includes the process that [incr tsdb()] calls
> normalization), the profiles (can) also contain versioning information
> of the annotation, i.e. re-annotating an item will just put in an
> additional layer of tuples (in the ‘tree’, ‘decision’, and
> ‘preference’ relations, if memory serves me right) and mark these as
> current.
> 
> practically, i would always recommend you work on thinned profiles,
> also because they are so much faster to turn around, i.e. one might go
> from a profile containing hundreds of thousands of derivations (e.g.
> 500 times 500) to one with less than a thousand.
> 
> if, for some reason, you feel strongly compelled to work on
> unnormalized profiles, you would have to replicate the nested queries
> used in the [incr tsdb()] annotation back-end (see browse-tree() in
> ‘redwoods.lisp’): for a given item, determine the current ‘t-version’
> value; then determine the ‘gold’ parse and result identifiers for this
> version in the ‘preference’ relation; finally, pull out the
> corresponding derivations.
> 
> all best, oe
> 
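
The three steps Stephan describes could be sketched roughly as follows in Python, over rows already read from the profile. This is not the code in redwoods.lisp: the in-memory tuple layouts, the function name, keying on parse-id, and treating the highest t-version as the current one are all assumptions for illustration.

```python
def gold_derivations(parse_id, tree_rows, preference_rows, result_rows):
    """Return the derivations marked gold for the current annotation version.

    Assumed (hypothetical) row layouts:
      tree_rows:       (parse_id, t_version)
      preference_rows: (parse_id, t_version, result_id)
      result_rows:     (parse_id, result_id, derivation)
    """
    # Step 1: determine the current t-version for this item.
    # Assumption: "current" means the highest t-version recorded.
    versions = [v for p, v in tree_rows if p == parse_id]
    if not versions:
        return []
    current = max(versions)

    # Step 2: collect the result identifiers preferred in that version.
    gold_ids = {
        r for p, v, r in preference_rows if p == parse_id and v == current
    }

    # Step 3: pull out the corresponding derivations.
    return [
        d for p, r, d in result_rows if p == parse_id and r in gold_ids
    ]
```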