executor: support building stats for fast analyze. #10258
@lzmhhh123 Don't forget to add a label to the pull request.
Please fix CI.
…hhh123/tidb into dev/fast_analyze_build_stats
executor/analyze.go
Outdated
rowCount := mathutil.MinInt64(domain.GetDomain(e.ctx).StatsHandle().GetTableStats(e.tblInfo).Count, int64(e.rowCount))
// build CMSketch
var ndv, scaleRatio uint64
collector.CMSketch, ndv, scaleRatio = statistics.NewCMSketchWithTopN(defaultCMSketchDepth, defaultCMSketchWidth, data, uint32(len(data)), uint64(rowCount))
Why do we use len(data) as topN?
I have no idea what the number of topN should be, so I just kept len(data) here.
@erjiaqing PTAL
I think we can use a relatively small number, maybe 20 or 40.
Finding an element in TopN is not cheap, so we should keep the size of TopN relatively small.
Is a constant OK? Or can we calculate it from the sample size?
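To make the two options in this thread concrete, here is a minimal sketch of a TopN size that is derived from the sample size but capped by a small constant. Everything in it is an assumption for illustration: `maxTopN`, `topNSize`, and the "about 1% of the sample" rule are hypothetical and are not part of TiDB or of this PR; only the cap of 20 comes from the suggestion above.

```go
package main

import "fmt"

// maxTopN is a hypothetical cap on the TopN size.
// 20 is the smaller of the two values suggested in the review.
const maxTopN = 20

// topNSize returns how many top values to track for a sample of the
// given size. The rule (about 1% of the sample, at least 1, at most
// maxTopN) is an illustrative assumption, not TiDB's behavior.
func topNSize(sampleSize int) uint32 {
	if sampleSize <= 0 {
		return 0
	}
	n := sampleSize / 100
	if n < 1 {
		n = 1
	}
	if n > maxTopN {
		n = maxTopN
	}
	return uint32(n)
}

func main() {
	for _, size := range []int{50, 500, 10000} {
		fmt.Printf("sample=%d topN=%d\n", size, topNSize(size))
	}
}
```

With a cap in place, lookups in TopN stay cheap regardless of how large the sample grows, which is the concern raised above; passing `uint32(len(data))` instead lets TopN grow with the sample.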
/rebuild
…hhh123/tidb into dev/fast_analyze_build_stats
Codecov Report
@@ Coverage Diff @@
## master #10258 +/- ##
===========================================
Coverage ? 77.8228%
===========================================
Files ? 410
Lines ? 84578
Branches ? 0
===========================================
Hits ? 65821
Misses ? 13842
Partials ? 4915
LGTM
/rebuild
}
// build CMSketch
var ndv, scaleRatio uint64
collector.CMSketch, ndv, scaleRatio = statistics.NewCMSketchWithTopN(defaultCMSketchDepth, defaultCMSketchWidth, data, 20, uint64(rowCount))
Make 20 a named constant?
Yeah. We'll keep it constant for now and modify it in the future.
LGTM
/run-all-tests
What problem does this PR solve?
After the fast sample, we need to build stats based on the samples.
What is changed and how it works?
Calculate stats from the samples.
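As a rough illustration of what "calculating stats from samples" can look like, the sketch below counts value frequencies in a sample, keeps the `n` most frequent values, and scales each count up to an estimated table row count. It is not the implementation of `statistics.NewCMSketchWithTopN`; the names `valueCount` and `topNFromSample` and the simple linear scaling by `rowCount / len(sample)` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// valueCount pairs a sampled value with its (possibly scaled) count.
// Hypothetical type, for illustration only.
type valueCount struct {
	Value string
	Count uint64
}

// topNFromSample keeps the n most frequent values in sample and scales
// each count linearly by rowCount/len(sample). Ties are broken by value
// so the result is deterministic.
func topNFromSample(sample []string, n int, rowCount uint64) []valueCount {
	if len(sample) == 0 || n <= 0 {
		return nil
	}
	// Count how often each value occurs in the sample.
	freq := make(map[string]uint64)
	for _, v := range sample {
		freq[v]++
	}
	counts := make([]valueCount, 0, len(freq))
	for v, c := range freq {
		counts = append(counts, valueCount{Value: v, Count: c})
	}
	// Most frequent first.
	sort.Slice(counts, func(i, j int) bool {
		if counts[i].Count != counts[j].Count {
			return counts[i].Count > counts[j].Count
		}
		return counts[i].Value < counts[j].Value
	})
	if len(counts) > n {
		counts = counts[:n]
	}
	// Scale sample counts to the estimated table size.
	for i := range counts {
		counts[i].Count = counts[i].Count * rowCount / uint64(len(sample))
	}
	return counts
}

func main() {
	sample := []string{"a", "a", "a", "b", "b", "c"}
	for _, vc := range topNFromSample(sample, 2, 600) {
		fmt.Printf("%s: %d\n", vc.Value, vc.Count)
	}
}
```

Linear scaling is the simplest choice and tends to be least reliable for rare values, which is one reason to keep only a small number of frequent values in TopN and leave the rest to the sketch.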
Check List
Tests
Code changes
Side effects
Related changes