-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
statistics: merge the partition-level histograms to a global-level histogram #22603
Conversation
/label status/WIP |
/unlabel status/WIP |
/label sig/planner |
statistics/histogram.go
Outdated
|
||
// mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`. | ||
// Before merging, you need to make sure that when using (upper, lower) as the comparison key, `b` is greater than `left` | ||
func (b *bucket4Merging) mergeBucketNDV(sc *stmtctx.StatementContext, left *bucket4Merging) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing this method to a statistics function like mergeBucketNDV(sc, left, right) (ndv, err)
, which is more easy-to-use and test?
Co-authored-by: Yuanjia Zhang <qw4990@163.com>
statistics/histogram.go
Outdated
// if the buckets have the same upper, we merge them into the same new buckets. | ||
for ; i > 0; i-- { | ||
res, err := buckets[i-1].upper.CompareDatum(sc, buckets[i].upper) | ||
if err == nil || res != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to handle this error?
{ | ||
lower: 6, | ||
upper: 9, | ||
count: 5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the meaning of the count
in h1Buckets
and h2Buckets
is different from the meaning of the count
in expHist
. The former is the cumulative value. Is this expected?
statistics/histogram.go
Outdated
} | ||
} | ||
|
||
// mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does tow bucket
mean?
statistics/histogram.go
Outdated
return nil, err | ||
} | ||
// __right__| | ||
// -------left----| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the meaning of the dotted line(---) the same as the solid line(___)?
statistics/histogram.go
Outdated
// repeat = sum of buckets[i] (buckets[i].upper == global bucket.upper && i in [l...r)) | ||
// ndv = merge bucket ndv from r-1 to l by mergeBucketNDV | ||
// Notice: lower is not calculated here. | ||
func mergePartitionBuckets(sc *stmtctx.StatementContext, buckets []*bucket4Merging) *bucket4Merging { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return the error message?
statistics/histogram.go
Outdated
// Notice: If expBucketNumber == 0, we will let expBucketNumber = max(hists.Len()) | ||
func MergePartitionHist2GlobalHist(sc *stmtctx.StatementContext, hists []*Histogram, expBucketNumber int64) (*Histogram, error) { | ||
var totCount, totNull, bucketNumber, totColSize int64 | ||
// minValue is used to calc the bucket lower. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to the place where minValue is defined.
statistics/histogram.go
Outdated
if !needBucketNumber { | ||
continue | ||
} | ||
if int64(hist.Len()) > expBucketNumber { | ||
expBucketNumber = int64(hist.Len()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge here to
if needBucketNumber && int64(hist.Len()) > expBucketNumber {
expBucketNumber = int64(hist.Len())
}
statistics/histogram.go
Outdated
continue | ||
} | ||
res, err := hist.GetLower(0).CompareDatum(sc, minValue) | ||
if err != nil && res < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to handle this error?
totCount += hist.Buckets[hist.Len()-1].Count | ||
if minValue == nil { | ||
minValue = hist.GetLower(0).Clone() | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the continue
occurs, the expBucketNumber
can not be updated.
} | ||
globalBuckets = append(globalBuckets, merged) | ||
r = i | ||
bucketCount++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set sum = 0
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to set sum = 0
, because we use sum >= totCount*bucketCount/expBucketNumber
as the judgment condition.
I addressed all your comments. @Reminiscent PTAL |
if len(buckets) == 0 { | ||
return nil | ||
return nil, errors.Errorf("not enough buckets to merge") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return the error here? Or check whether bucket4Merging
is nil in the calling function.
@@ -389,7 +389,13 @@ func (h *Handle) MergePartitionStats2GlobalStats(is infoschema.InfoSchema, physi | |||
} | |||
|
|||
// Merge histogram | |||
err = errors.Errorf("TODO: The merge function of the histogram structure has not been implemented yet") | |||
globalStats.Hg[i], err = statistics.MergePartitionHist2GlobalHist(sc, allHg[i], 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitly set the number of histogram buckets (the third parameter of the MergePartitionHist2GlobalHist function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/merge |
/run-all-tests |
@rebelice merge failed. |
Please update handler.go to let our stats-collection mechanism can use your algorithm. @rebelice |
/run-e2e-test |
What problem does this PR solve?
sub-PR for the PR #22472
Problem Summary:
We merge the partition-level histograms to a global-level histogram.
design doc: link
Related changes
pingcap/docs
/pingcap/docs-cn
:Check List
Tests
Side effects
Release note