Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: fix repetitive selectivity accounting and stabilify the result #15536

Merged
merged 2 commits into from
Apr 3, 2020

Conversation

eurekaka
Copy link
Contributor

What problem does this PR solve?

Problem Summary:

  • in Selectivity, index order in coll.Indices is non-deterministic, so the greedy search algorithm may return different results in different runs, that would confuse users since the stats is not changed at all;
  • in the greedy search algorithm, there are repetitive selectivity accounting sometimes. For example, if the filter is like t.a = 1 and t.b > 1 and t.c > 1, and there are 2 indexes idx1(a,b) and idx2(a,c), the greedy algorithm would choose both indexes and multiply their selectivity computed respectively. Obviously, this is wrong, because selectivity of t.a = 1 is accounted twice.

What is changed and how it works?

What's Changed:

  • do not choose indexes whose filters covered are overlapped;
  • sort the StatsNode slice before greedy search;

How it Works:

Note that, how we sort the StatsNode slice impacts the greedy search result. I put the PK in the end of the slice, indexes in the middle and columns in the front, to enforce the heuristic rule that, PK is preferred over indexes in estimation, and indexes are preferred over columns.

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

Side effects

  • Performance regression: the change of the selectivity result may change plan generated.
  • Breaking backward compatibility: nope, in order to keep compatibility, I introduces the compareType function, instead of changing the values of IndexType / PkType / ColType, because feedback encoding uses these constants.

Release note

@eurekaka eurekaka requested a review from a team as a code owner March 20, 2020 14:30
@ghost ghost requested review from francis0407 and winoros and removed request for a team March 20, 2020 14:30
// getUsableSetsByGreedy will select the indices and pk used for calculate selectivity by greedy algorithm.
func getUsableSetsByGreedy(nodes []*StatsNode) (newBlocks []*StatsNode) {
// GetUsableSetsByGreedy will select the indices and pk used for calculate selectivity by greedy algorithm.
func GetUsableSetsByGreedy(nodes []*StatsNode) (newBlocks []*StatsNode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it impact the performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, because the input order is changed, so the result of greedy search may change.

statistics/selectivity.go Outdated Show resolved Hide resolved
statistics/selectivity.go Show resolved Hide resolved
Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@lzmhhh123 lzmhhh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lzmhhh123 lzmhhh123 added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. labels Apr 3, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Apr 3, 2020

/run-all-tests

@zz-jason
Copy link
Member

zz-jason commented Apr 3, 2020

do we have this problem in release 3.0 and release 3.1?

@sre-bot sre-bot merged commit cc24101 into pingcap:master Apr 3, 2020
@eurekaka eurekaka deleted the selectivity branch April 3, 2020 07:50
@eurekaka
Copy link
Contributor Author

eurekaka commented Apr 3, 2020

@zz-jason Yes

@sre-bot
Copy link
Contributor

sre-bot commented Apr 3, 2020

cherry pick to release-2.1 in PR #16050

@sre-bot
Copy link
Contributor

sre-bot commented Apr 3, 2020

cherry pick to release-3.0 in PR #16051

@sre-bot
Copy link
Contributor

sre-bot commented Apr 3, 2020

cherry pick to release-3.1 in PR #16052

eurekaka added a commit to sre-bot/tidb that referenced this pull request Apr 3, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
eurekaka added a commit to eurekaka/tidb that referenced this pull request Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants