-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv/kvserver: use LAI from before split when setting initialMaxClosed #46085
Merged
craig
merged 2 commits into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/closedTSSplit
Mar 13, 2020
Merged
kv/kvserver: use LAI from before split when setting initialMaxClosed #46085
craig
merged 2 commits into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/closedTSSplit
Mar 13, 2020
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ajwerner
approved these changes
Mar 13, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @tbg)
We used both of these interchangeably, which made working with this code a little difficult.
Addresses 1 of 2 bugs in cockroachdb#44878. Before this change, `TestKVNemesisMultiNode` with splits enabled and run with 3x increased frequency failed every ~2 minutes under stress. After this change, I have yet to see it fail after 20 minutes. The initialMaxClosed is assigned to the RHS replica to ensure that follower reads do not regress following the split. After the split occurs there will be no information in the closedts subsystem about the newly minted RHS range from its leaseholder's store. Furthermore, the RHS will have a lease start time equal to that of the LHS which might be quite old. This means that timestamps which follow the least StartTime for the LHS part are below the current closed timestamp for the LHS would no longer be readable on the RHS after the split. It is necessary for correctness that the call to maxClosed used to determine the current closed timestamp happens during the splitPreApply so that it uses a LAI that is _before_ the index at which this split is applied. If it were to refer to a LAI equal to or after the split then the value of initialMaxClosed might be unsafe. Concretely, any closed timestamp based on an LAI that is equal to or above the split index might be larger than the initial closed timestamp assigned to the RHS range's initial leaseholder. This is because the LHS range's leaseholder could continue closing out timestamps at the split's LAI after applying the split. Slow followers in that range could hear about these closed timestamp notifications before applying the split themselves. If these slow followers were allowed to pass these closed timestamps created after the split to the RHS replicas they create during the application of the split then these RHS replicas might end up with initialMaxClosed values above their current range's official closed timestamp. The leaseholder of the RHS range could then propose a write at a timestamp below this initialMaxClosed, violating the closed timestamp systems most important property. Using an LAI from before the index at which this split is applied avoids the hazard and ensures that no replica on the RHS is created with an initialMaxClosed that could be violated by a proposal on the RHS's initial leaseholder. See cockroachdb#44878. Release justification: fixes a high-severity bug in existing functionality which could cause CDC to violate its resolved timestamp guarantee. To some observers, this could look like data loss in CDC.
nvanbenschoten
force-pushed
the
nvanbenschoten/closedTSSplit
branch
from
March 13, 2020 21:27
701712e
to
9ce18bc
Compare
TFTR! I'll need to backport this to v19.2. bors r+ |
Build failed (retrying...) |
Build succeeded |
aayushshah15
added a commit
to aayushshah15/cockroach
that referenced
this pull request
Aug 4, 2020
We had merges disabled because of the bugs tracked in cockroachdb#44878, but those have since been fixed by cockroachdb#46085 and cockroachdb#50265. Release note: None
craig bot
pushed a commit
that referenced
this pull request
Aug 4, 2020
51305: geo,geoindex: use bounding box for geography coverings that are bad r=sumeerbhola a=sumeerbhola This change uses a covererInterface implementation for geography that notices when a covering is using top-level cells of all faces and in that case uses the bounding box to compute the covering. Also changed the bounding box calculation for geography shapes to use only the points and not first construct s2.Regions. The latter causes marginally bad shapes to continue to have bad coverings since the bounding box also covers the whole earth. Release note: None 51882: roachpb: panic when comparing a Lease to a non-lease r=andreimatei a=andreimatei Release note: None 52146: sql: remove local execution of projectSetNode and implement ConstructProjectSet in the new factory r=yuzefovich a=yuzefovich Depends on #52108. **sql: remove local execution of projectSetNode** We have project set processor which is always planned for `projectSetNode`, so this commit removes the dead code of its local execution. Additionally, it removes some unused fields and cleans up cancellation check of the processor. Release note: None **sql: implement ConstructProjectSet in the new factory** Addresses: #47473. Release note: None 52320: kvserver: enable merges in kvnemesis r=aayushshah15 a=aayushshah15 We had merges disabled because of the bugs tracked in #44878, but those have since been fixed by #46085 and #50265. Release note: None Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Addresses 1 of 2 bugs in #44878.
Skews with #46017, so I'll let that go in first.
Before this change,
TestKVNemesisMultiNode
with splits enabled and run with 3x increased frequency failed every ~2 minutes under roachprod-stress. After this change, I have yet to see it fail after2040 minutes.The initialMaxClosed is assigned to the RHS replica to ensure that follower reads do not regress following the split. After the split occurs there will be no information in the closedts subsystem about the newly minted RHS range from its leaseholder's store. Furthermore, the RHS will have a lease start time equal to that of the LHS which might be quite old. This means that timestamps which follow the least StartTime for the LHS part are below the current closed timestamp for the LHS would no longer be readable on the RHS after the split.
It is necessary for correctness that the call to maxClosed used to determine the current closed timestamp happens during the splitPreApply so that it uses a LAI that is before the index at which this split is applied. If it were to refer to a LAI equal to or after the split then the value of initialMaxClosed might be unsafe.
Concretely, any closed timestamp based on an LAI that is equal to or above the split index might be larger than the initial closed timestamp assigned to the RHS range's initial leaseholder. This is because the LHS range's leaseholder could continue closing out timestamps at the split's LAI after applying the split. Slow followers in that range could hear about these closed timestamp notifications before applying the split themselves. If these slow followers were allowed to pass these closed timestamps created after the split to the RHS replicas they create during the application of the split then these RHS replicas might end up with initialMaxClosed values above their current range's official closed timestamp. The leaseholder of the RHS range could then propose a write at a timestamp below this initialMaxClosed, violating the closed timestamp systems most important property.
Using an LAI from before the index at which this split is applied avoids the hazard and ensures that no replica on the RHS is created with an initialMaxClosed that could be violated by a proposal on the RHS's initial leaseholder. See #44878.
Release justification: fixes a high-severity bug in existing functionality which could cause CDC to violate its resolved timestamp guarantee. To some observers, this could look like data loss in CDC.