statistics: fix repetitive selectivity accounting and stabilize the result (#15536) #16050
cherry-pick #15536 to release-2.1
What problem does this PR solve?
Problem Summary:
- In `Selectivity`, the order of indexes in `coll.Indices` is non-deterministic, so the greedy search algorithm may return different results in different runs. This would confuse users, since the stats have not changed at all;
- Suppose the filters are `t.a = 1 and t.b > 1 and t.c > 1` and there are 2 indexes, `idx1(a,b)` and `idx2(a,c)`. The greedy algorithm would choose both indexes and multiply the selectivities computed for each of them. This is wrong, because the selectivity of `t.a = 1` is accounted for twice.

What is changed and how it works?
What's Changed:
- Sort the `StatsNode` slice before the greedy search;

How it Works:
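A minimal sketch of the sorting step, assuming a hypothetical `statsNode` that keeps only a type and an ID, and assuming the type constants are declared via `iota` in the order `IndexType`, `PkType`, `ColType` (so sorting by their raw values would not give the column, index, PK order). The `compareType` below is an illustrative version, not the PR's code. Breaking ties by ID makes the order total, so the result no longer depends on the incoming order of `coll.Indices`.

```go
package main

import (
	"fmt"
	"sort"
)

// Assumed constant values; they are left untouched because (per the PR)
// feedback encoding depends on them.
const (
	IndexType = iota
	PkType
	ColType
)

// statsNode is a hypothetical stand-in for StatsNode with only the sort keys.
type statsNode struct {
	Tp int
	ID int64
}

// rank maps a type to its position in the desired order:
// columns first, then indexes, then the primary key last.
func rank(tp int) int {
	switch tp {
	case ColType:
		return 0
	case IndexType:
		return 1
	default: // PkType
		return 2
	}
}

// compareType returns a negative, zero or positive value when l sorts before,
// together with, or after r.
func compareType(l, r int) int {
	return rank(l) - rank(r)
}

// sortNodes orders nodes by type rank, then by ID, giving a deterministic order.
func sortNodes(nodes []statsNode) {
	sort.Slice(nodes, func(i, j int) bool {
		if c := compareType(nodes[i].Tp, nodes[j].Tp); c != 0 {
			return c < 0
		}
		return nodes[i].ID < nodes[j].ID
	})
}

func main() {
	nodes := []statsNode{{PkType, 1}, {IndexType, 3}, {ColType, 2}, {IndexType, 2}, {ColType, 1}}
	sortNodes(nodes)
	fmt.Println(nodes)
}
```

Under the assumed constant values this prints `[{2 1} {2 2} {0 2} {0 3} {1 1}]`: both columns, then both indexes, then the PK. `sort.Slice` is not stable, but because no two nodes share the same (type, ID) pair, any input permutation sorts to the same result.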
Note that how we sort the `StatsNode` slice affects the greedy search result. I put the PK at the end of the slice, indexes in the middle and columns at the front, to enforce the heuristic rule that the PK is preferred over indexes in estimation, and indexes are preferred over columns.

Related changes
Check List
Tests
Side effects
- Add a `compareType` function for the sort order, instead of changing the values of `IndexType`/`PkType`/`ColType`, because feedback encoding uses these constants.

Release note