Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

statistics: merge the partition-level histograms to a global-level histogram #22603

Merged
merged 12 commits into from
Feb 19, 2021

Conversation

rebelice
Copy link
Contributor

@rebelice rebelice commented Jan 28, 2021

What problem does this PR solve?

sub-PR for the PR #22472

Problem Summary:

We merge the partition-level histograms to a global-level histogram.

design doc: link

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM
  • Breaking backward compatibility

Release note

  • No release note

@rebelice rebelice requested a review from a team as a code owner January 28, 2021 10:27
@rebelice rebelice requested review from winoros and removed request for a team January 28, 2021 10:27
@rebelice
Copy link
Contributor Author

/label status/WIP

@rebelice
Copy link
Contributor Author

rebelice commented Feb 3, 2021

/unlabel status/WIP

@rebelice
Copy link
Contributor Author

rebelice commented Feb 3, 2021

/label sig/planner

@ti-srebot ti-srebot added the sig/planner SIG: Planner label Feb 3, 2021
@qw4990 qw4990 self-requested a review February 4, 2021 07:43
statistics/histogram.go Outdated Show resolved Hide resolved
statistics/histogram.go Outdated Show resolved Hide resolved
statistics/histogram.go Outdated Show resolved Hide resolved
statistics/histogram.go Outdated Show resolved Hide resolved
statistics/histogram.go Outdated Show resolved Hide resolved

// mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`.
// Before merging, you need to make sure that when using (upper, lower) as the comparison key, `b` is greater than `left`
func (b *bucket4Merging) mergeBucketNDV(sc *stmtctx.StatementContext, left *bucket4Merging) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing this method to a statistics function like mergeBucketNDV(sc, left, right) (ndv, err), which is more easy-to-use and test?

Co-authored-by: Yuanjia Zhang <qw4990@163.com>
// if the buckets have the same upper, we merge them into the same new buckets.
for ; i > 0; i-- {
res, err := buckets[i-1].upper.CompareDatum(sc, buckets[i].upper)
if err == nil || res != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle this error?

@rebelice rebelice requested a review from a team as a code owner February 5, 2021 09:44
@github-actions github-actions bot added the sig/execution SIG execution label Feb 5, 2021
{
lower: 6,
upper: 9,
count: 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the meaning of the count in h1Buckets and h2Buckets is different from the meaning of the count in expHist. The former is the cumulative value. Is this expected?

}
}

// mergeBucketNDV merges bucket NDV from tow bucket `b` & `left`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does tow bucket mean?

return nil, err
}
// __right__|
// -------left----|
Copy link
Contributor

@Reminiscent Reminiscent Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the meaning of the dotted line(---) the same as the solid line(___)?

@pingcap pingcap deleted a comment from rebelice Feb 18, 2021
// repeat = sum of buckets[i] (buckets[i].upper == global bucket.upper && i in [l...r))
// ndv = merge bucket ndv from r-1 to l by mergeBucketNDV
// Notice: lower is not calculated here.
func mergePartitionBuckets(sc *stmtctx.StatementContext, buckets []*bucket4Merging) *bucket4Merging {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return the error message?

// Notice: If expBucketNumber == 0, we will let expBucketNumber = max(hists.Len())
func MergePartitionHist2GlobalHist(sc *stmtctx.StatementContext, hists []*Histogram, expBucketNumber int64) (*Histogram, error) {
var totCount, totNull, bucketNumber, totColSize int64
// minValue is used to calc the bucket lower.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to the place where minValue is defined.

Comment on lines 1720 to 1725
if !needBucketNumber {
continue
}
if int64(hist.Len()) > expBucketNumber {
expBucketNumber = int64(hist.Len())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge here to

if needBucketNumber && int64(hist.Len()) > expBucketNumber {
    expBucketNumber = int64(hist.Len())
}

continue
}
res, err := hist.GetLower(0).CompareDatum(sc, minValue)
if err != nil && res < 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle this error?

totCount += hist.Buckets[hist.Len()-1].Count
if minValue == nil {
minValue = hist.GetLower(0).Clone()
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the continue occurs, the expBucketNumber can not be updated.

}
globalBuckets = append(globalBuckets, merged)
r = i
bucketCount++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set sum = 0 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to set sum = 0, because we use sum >= totCount*bucketCount/expBucketNumber as the judgment condition.

@rebelice
Copy link
Contributor Author

I addressed all your comments. @Reminiscent PTAL

if len(buckets) == 0 {
return nil
return nil, errors.Errorf("not enough buckets to merge")
Copy link
Contributor

@Reminiscent Reminiscent Feb 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return the error here? Or check whether bucket4Merging is nil in the calling function.

@@ -389,7 +389,13 @@ func (h *Handle) MergePartitionStats2GlobalStats(is infoschema.InfoSchema, physi
}

// Merge histogram
err = errors.Errorf("TODO: The merge function of the histogram structure has not been implemented yet")
globalStats.Hg[i], err = statistics.MergePartitionHist2GlobalHist(sc, allHg[i], 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explicitly set the number of histogram buckets (the third parameter of the MergePartitionHist2GlobalHist function)?

Copy link
Contributor

@Reminiscent Reminiscent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 19, 2021
qw4990
qw4990 previously approved these changes Feb 19, 2021
Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Feb 19, 2021
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Feb 19, 2021
@qw4990
Copy link
Contributor

qw4990 commented Feb 19, 2021

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 19, 2021
@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot
Copy link
Contributor

@rebelice merge failed.

@qw4990 qw4990 dismissed their stale review February 19, 2021 02:56

need to update handler.go

@qw4990
Copy link
Contributor

qw4990 commented Feb 19, 2021

Please update handler.go to let our stats-collection mechanism can use your algorithm. @rebelice

@qw4990
Copy link
Contributor

qw4990 commented Feb 19, 2021

/run-e2e-test

@qw4990 qw4990 merged commit 59ccb29 into pingcap:master Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics sig/execution SIG execution sig/planner SIG: Planner status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants