MobilityDB 1.1

◆ span_joinsel_scalar()

static double span_joinsel_scalar ( const SpanBound hist1,
int  nhist1,
const SpanBound hist2,
int  nhist2,
bool  equal 
)
static

Given two histograms of span bounds, estimate the fraction of values in the first histogram that are less than (or equal to, if 'equal' argument is true) a value in the second histogram.

The join selectivity estimation for all span operators is expressed using this function.

The general idea is to iteratively decompose (var op var) into a summation of (var op const) using the span bounds present in first histogram as const. Then, (var op const) is calculated using the function span_sel_scalar, which estimates the restriction selectivity.

Consider two variables Var1 and Var2 whose distributions are given by hist1 and hist2, respectively. To estimate:

P(Var1 < Var2)

we need to compute for each bin in the first histogram

P(Var1 < Var2 | Var2 is in bin) * P(Var2 is in bin)

The second probability is the difference between the selectivity of the upper and lower bounds of the bin, which can be directly computed by calling span_sel_scalar.

The first probability, however, cannot be directly computed because we do not have a concrete value for Var2. Instead, we can under and over estimate it by respectively setting the value of Var2 to the lower and upper bound of the bin, then compute the average.

P(Var1 < lower bound of bin) <= P(Var1 < Var2 | Var2 is in bin) <= P(Var1 < upper bound of bin)

Therefore, we need to add the average selectivity of every bin, given by (val1 + val2) / 2 + (val2 + val3) / 2 + ... + (val_n-1 + val_n) / 2 which is equal to val1 / 2 + val2 + val3 + val4 + ... + val_n-1 + val_n / 2 The first and last terms above are computed out of the loop. The rest is computed in the loop.