plan, stats: fix inconsistent row count estimation #7233
statistics/selectivity.go
// (1): The stats type, always prefer the primary key or index.
// (2): The number of expression that it covers, the more the better.
// (3): The number of columns that it contains, the less the better.
if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) { |
This PR requires it: after the change in logical_plans.go, TestIndexRead would fail because it would sometimes choose index (b,c).
executor/analyze.go
@@ -41,11 +41,13 @@ type AnalyzeExec struct {
tasks []*analyzeTask
}

// MaxBucketSize is the maximum number of bucket that a histogram could contain.
var MaxBucketSize = int64(256) |
This is not a good design. Do not export a variable.
executor/analyze.go
@@ -357,3 +358,13 @@ func (e *AnalyzeColumnsExec) buildStats() (hists []*statistics.Histogram, cms []
}
return hists, cms, nil
}

// SetMaxBucketSize sets the `maxBucketSize`.
func SetMaxBucketSize(size int64) { |
Should this only be used in tests?
lgtm
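The two comments above (do not export the variable; the setter should only serve tests) can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the `restore` return value is a hypothetical addition, and the `main` function only stands in for a test body.

```go
package main

import "fmt"

// maxBucketSize stays unexported so other packages cannot assign to it directly.
var maxBucketSize = int64(256)

// SetMaxBucketSize overrides maxBucketSize and returns a function that
// restores the previous value. Returning a restore function is an assumption
// of this sketch; the setter in the diff above returns nothing.
func SetMaxBucketSize(size int64) (restore func()) {
	old := maxBucketSize
	maxBucketSize = size
	return func() { maxBucketSize = old }
}

func main() {
	// A test would typically write: defer SetMaxBucketSize(6)()
	restore := SetMaxBucketSize(6)
	fmt.Println(maxBucketSize) // overridden value
	restore()
	fmt.Println(maxBucketSize) // back to the default
}
```

If the tests that need the override lived in the same package, the setter could instead sit in a file whose name ends in `_test.go`, which Go compiles only under `go test`, so it would never be part of the production build.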
statistics/selectivity.go
// (1): The stats type, always prefer the primary key or index.
// (2): The number of expression that it covers, the more the better.
// (3): The number of columns that it contains, the less the better.
if (bestTp == colType && set.tp < colType) || bestCount < bits || (bestCount == bits && bestNumCols > set.numCols) { |
I think set.tp != colType is better, because the type is not a scalar.
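To make the three criteria and the suggested `!=` comparison concrete, here is a small sketch of the same condition pulled out into a helper. The `candidate` type, the `better` function, and the constant values are hypothetical stand-ins for illustration; only the shape of the boolean expression is taken from the diff above.

```go
package main

import "fmt"

// Hypothetical stand-ins for the stats types referenced in the diff.
// Their numeric order is an assumption and is deliberately not relied on.
const (
	pkType = iota
	indexType
	colType
)

// candidate carries only the fields the comparison needs.
type candidate struct {
	tp      int // stats type: pkType, indexType or colType
	bits    int // number of expressions this stats object covers
	numCols int // number of columns in the index, or 1 for a column
}

// better reports whether c should replace the current best choice.
// It mirrors the condition in the diff, with != in place of <.
func better(best, c candidate) bool {
	// (1) A primary key or index is preferred over a plain column.
	if best.tp == colType && c.tp != colType {
		return true
	}
	// (2) Covering more expressions is better.
	if best.bits < c.bits {
		return true
	}
	// (3) With equal coverage, fewer columns is better.
	return best.bits == c.bits && best.numCols > c.numCols
}

func main() {
	col := candidate{tp: colType, bits: 2, numCols: 1}
	idxAB := candidate{tp: indexType, bits: 2, numCols: 2}
	idxABC := candidate{tp: indexType, bits: 2, numCols: 3}

	fmt.Println(better(col, idxAB))    // an index replaces a column
	fmt.Println(better(idxABC, idxAB)) // same coverage, fewer columns wins
	fmt.Println(better(idxAB, idxABC)) // the wider index does not replace the narrower one
}
```

Writing the type check as an inequality avoids depending on how the type constants happen to be ordered, which is the point of the review comment.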
@@ -35,6 +35,8 @@ type exprSet struct {
mask int64
// ranges contains all the ranges we got.
ranges []*ranger.Range
// numCols is the number of columns contained in the index or column(which is always 1). |
If it is always 1, why not use a const?
It is always 1 for a column, but it can be greater than 1 for an index.
gotcha.
LGTM
/run-all-tests
LGTM
What have you changed? (mandatory)
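Sometimes the estimated row count for a smaller set of expressions can be smaller than the estimate for a superset of those expressions. That is inconsistent, because adding filter conditions can never increase the number of matching rows. This PR fixes it by preferring the estimate of the superset, because that estimate can use more stats info.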
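As an illustration of the inconsistency, suppose a single condition is estimated at 10 rows from one stats object while that condition plus a second one is estimated at 20 rows from another. The sketch below shows one way to reconcile the two numbers by trusting the superset's estimate. The function name, its parameters, and the cap at the table row count are assumptions of this example and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"math"
)

// reconcile returns a row estimate for the smaller set of expressions that is
// never below the estimate for the superset, and never above the table size.
// Hypothetical helper: the real change lives in logical_plans.go and may differ.
func reconcile(subsetRows, supersetRows, tableRows float64) float64 {
	if subsetRows < supersetRows {
		// Inconsistent stats: fewer filters cannot match fewer rows,
		// so fall back to the superset's estimate.
		subsetRows = supersetRows
	}
	return math.Min(subsetRows, tableRows)
}

func main() {
	fmt.Println(reconcile(10, 20, 1000)) // inconsistent pair, raised to the superset estimate
	fmt.Println(reconcile(50, 20, 1000)) // already consistent, left unchanged
}
```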
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
Unit test.
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
No.
Does this PR affect tidb-ansible update? (mandatory)
No.
Does this PR need to be added to the release notes? (mandatory)
No.
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
Add a few positive/negative examples (optional)
PTAL @coocood @zz-jason @winoros