Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: do not split excluded lower value ranges #12009

Merged
merged 3 commits into from
Sep 11, 2019
Merged

Conversation

alivxxx
Copy link
Contributor

@alivxxx alivxxx commented Sep 3, 2019

What problem does this PR solve?

Fix #11907

This bug would occur on indices that have multiply columns like index idx(a,b), and the bug happens when:

  • Queries that only request ranges on prefix like where a >= 10 are issued and collected by feedback, which result in one of the bucket upper bound becomes single encoded value (10), and there is a chance that next buckets's upper bound is (10, 11).

  • Then another queries comes like where a >= 9, so SplitRange is called with [9, +inf), so we will split by (10) and (10, 11), which result in ranges [9, 10], (10,(10,11)],..., and every thing looks fine now.

  • The caller, which is IndexRangesToKVRanges, will process the splited ranges (10,(10,11)]. Since lower is excluded, (10,(10,11)] will be transformed to [11, (10,12)) by using PrefixNext, so the invalid ranges happens.

What is changed and how it works?

When we split ranges, do not generate execluded lower ranges. This PR does it by split the ranges by lower bound and always generate included lower ranges and excluded upper ranges.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • None

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Write release note for bug-fix or new feature.

@codecov
Copy link

codecov bot commented Sep 3, 2019

Codecov Report

Merging #12009 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12009   +/-   ##
===========================================
  Coverage   81.5929%   81.5929%           
===========================================
  Files           452        452           
  Lines         98060      98060           
===========================================
  Hits          80010      80010           
  Misses        12401      12401           
  Partials       5649       5649

Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we append MaxValueDatum/MinValueDatum when we found that the length of the upper and lower is not the same?

@alivxxx
Copy link
Contributor Author

alivxxx commented Sep 4, 2019

@winoros Main reason that I did not choose this solution is that there are old histograms and it is not easy to determine the original number of columns, because it may already been PrefixNext many times and it may not decodeable. To use this approach we need to increase the stats version and only use feedback on newly created histograms.

Copy link
Contributor

@lzmhhh123 lzmhhh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lzmhhh123 lzmhhh123 added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2019
statistics/histogram.go Outdated Show resolved Hide resolved
@@ -766,11 +766,11 @@ func formatBuckets(hg *statistics.Histogram, lowBkt, highBkt, idxCols int) strin
return hg.BucketToString(lowBkt, idxCols)
}
if lowBkt+1 == highBkt {
return fmt.Sprintf("%s, %s", hg.BucketToString(lowBkt, 0), hg.BucketToString(highBkt, 0))
return fmt.Sprintf("%s, %s", hg.BucketToString(lowBkt, idxCols), hg.BucketToString(highBkt, idxCols))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, the result for index is unreadable.

Co-Authored-By: Kenan Yao <cauchy1992@gmail.com>
Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka eurekaka added status/LGT2 Indicates that a PR has LGTM 2. status/can-merge Indicates a PR has been approved by a committer. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 11, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 11, 2019

/run-all-tests

@sre-bot sre-bot merged commit 440bb74 into pingcap:master Sep 11, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 11, 2019

cherry pick to release-2.1 failed

@sre-bot
Copy link
Contributor

sre-bot commented Sep 11, 2019

cherry pick to release-3.0 failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
5 participants