release-22.1.0: sql,migration: ensure cluster version never regresses #80712
Merged
ajwerner merged 1 commit into cockroachdb:release-22.1.0 on Apr 28, 2022
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
knz (Member) approved these changes on Apr 28, 2022.
Contributor (Author): TFTR!
Backport 1/1 commits from #78705.
/cc @cockroachdb/release
In #68074 (which is in 21.2), we added logic to bump the version stored in the
system.settings table to intermediate versions as we run migrations. This was
critical to providing any sort of invariant when upgrading secondary tenants.
The logic to do this bumping works through a callback plumbed into the
migrationmanager from the sql package. Unfortunately, this callback did not
ensure that the version being written was greater than the existing version;
it only checked that it was different. This was previously made safe by some
transactional properties of the version upgrade.
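To illustrate why "different" is not enough, here is a minimal Go sketch that replays a sequence of version writes in which a stale write arrives late. `Version`, `applyDifferent`, `applyMonotonic`, and `replay` are hypothetical simplifications, not the actual sql or migrationmanager code, and the sketch assumes a non-increasing write is simply skipped, which may not match exactly how the real fix handles it.

```go
package main

import "fmt"

// Version is a hypothetical, simplified stand-in for a cluster version.
// CockroachDB's real version type carries more fields than this.
type Version struct {
	Major, Minor, Internal int32
}

// Less reports whether v orders strictly before o, comparing fields
// lexicographically (Major, then Minor, then Internal).
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Internal < o.Internal
}

func (v Version) String() string {
	return fmt.Sprintf("%d.%d-%d", v.Major, v.Minor, v.Internal)
}

// applyDifferent models the old check: write whenever the proposed version
// differs from the stored one. A delayed, older write can move it backwards.
func applyDifferent(stored, proposed Version) Version {
	if proposed != stored {
		return proposed
	}
	return stored
}

// applyMonotonic models the fixed check: write only if the proposed version
// is strictly greater, so the stored version never regresses.
func applyMonotonic(stored, proposed Version) Version {
	if stored.Less(proposed) {
		return proposed
	}
	return stored
}

// replay applies a sequence of (possibly reordered) writes using apply.
func replay(apply func(stored, proposed Version) Version, start Version, writes []Version) Version {
	stored := start
	for _, w := range writes {
		stored = apply(stored, w)
	}
	return stored
}

func main() {
	start := Version{21, 2, 0}
	// Hypothetical interleaving: a write of an older intermediate version
	// lands after a newer one has already been written.
	writes := []Version{{21, 2, 2}, {21, 2, 4}, {21, 2, 2}}

	fmt.Println("different:", replay(applyDifferent, start, writes)) // different: 21.2-2
	fmt.Println("monotonic:", replay(applyMonotonic, start, writes)) // monotonic: 21.2-4
}
```

Because the monotonic check turns a stale write into a no-op, the stored value ends at the highest version seen regardless of the order in which the writes arrive.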
Fixing the check to ensure that the version does indeed go up solves the flake
decisively. The question that remains is: why did the flake start on January 8th?
It appears that it also flaked earlier, on December 4th, in #73468, which we
never resolved. I hypothesize that the race becomes more likely the more
versions we put into play. Right after we cut the release branch for 22.1, the
flake was less common. I think this explains why it got worse over time.
The release note is also imprecise because I don't fully know the
repercussions.
Fixes #74599.
Release note (bug fix): Fixed a bug whereby the cluster version could regress
due to a race condition.
Release justification: Small fix which corrects a potentially serious bug.