-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: persist correct cluster version on join #27639
Conversation
Interestingly we'll have to do something about the mixed version tests still, since that test involves nodes that don't have the code. My preference would be to add an extra stop-restart cycle in the test which will update the cluster version in-place even on the old binaries. |
Still playing with the roachtest -- got it failing immediately with a
which is odd because the cluster version is supposedly 2.0. Looking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/stores.go, line 447 at r1 (raw file):
} // If the update downgrades the minimum version, ignore it. Must be // a reordering. Note that we do carry out the upgrade if the MinVersion
Can you elaborate on the reordering that is possible here?
6a670b0
to
272be82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFAL, the new commits take care of the roachtest failures.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/stores.go, line 447 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Can you elaborate on the reordering that is possible here?
Done. (Along with a small refactor).
Prior to this commit, when a node joined an existing cluster, it would likely persist its MinSupportedVersion as the cluster version (even though it would operate at the correct version). This is problematic since this node's data directories wouldn't be regarded compatible with the successor version of the real version, even though they are. The problem was that the callback which receives the real cluster version would fire before the stores were bootstrapped, and bootstrap would erroneously pick the minimum supported version. We now "backfill" the correct cluster version after having bootstrapped all stores. Even though the fixtures will need to be regenerated: Fixes cockroachdb#27525. Optimistically: Fixes cockroachdb#27162. Release note: None
Change the test slightly so that it works around cockroachdb#27162 until we backport cockroachdb#27639 to release-2.0. Release note: None
272be82
to
fa7677f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/cmd/roachtest/upgrade.go, line 55 at r3 (raw file):
// TODO(tschottdorf): this is a hack similar to the one in the mixed version // test. Remove it when we have a 2.0.x binary that has #27639 fixed.
Might want to file an issue so you don't forget about this. Or just don't forget about this.
Reminder filed: #27717 bors r=petermattis |
27639: server: persist correct cluster version on join r=petermattis a=tschottdorf Prior to this commit, when a node joined an existing cluster, it would likely persist its MinSupportedVersion as the cluster version (even though it would operate at the correct version). This is problematic since this node's data directories wouldn't be regarded compatible with the successor version of the real version, even though they are. The problem was that the callback which receives the real cluster version would fire before the stores were bootstrapped, and bootstrap would erroneously pick the minimum supported version. We now "backfill" the correct cluster version after having bootstrapped all stores. Even though the fixtures will need to be regenerated: Fixes #27525. Optimistically: Fixes #27162. Release note: None 27712: rpc: avoid clock offset fatal on unvalidated connection r=a-robinson a=tschottdorf I think we were seeing rare flakes in CI in which a heartbeat service got connections from another test that used a different clock offset due to port reuse. Since the assertion didn't wait for the connection to be validated (i.e. cluster versions matching), this could crash. This is only a theory, but we're never able to reproduce these failures. There are lots of places that use a 1ns maxoffset, including bootstrapCluster. If we continue to see these failures, we should make all of those unique numbers to lend credibility to the port reuse theory. Speculatively: Fixes #27555. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Build succeeded |
27741: backport-2.0: server: persist correct cluster version on join r=petermattis a=tschottdorf Backport 1/2 commits from #27639. I had to adapt the test, and I have verified that it fails without the main change. /cc @cockroachdb/release --- Prior to this commit, when a node joined an existing cluster, it would likely persist its MinSupportedVersion as the cluster version (even though it would operate at the correct version). This is problematic since this node's data directories wouldn't be regarded compatible with the successor version of the real version, even though they are. The problem was that the callback which receives the real cluster version would fire before the stores were bootstrapped, and bootstrap would erroneously pick the minimum supported version. We now "backfill" the correct cluster version after having bootstrapped all stores. Even though the fixtures will need to be regenerated: Fixes #27525. Optimistically: Fixes #27162. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Prior to this commit, when a node joined an existing cluster, it would
likely persist its MinSupportedVersion as the cluster version (even
though it would operate at the correct version). This is problematic
since this node's data directories wouldn't be regarded compatible with
the successor version of the real version, even though they are.
The problem was that the callback which receives the real cluster
version would fire before the stores were bootstrapped, and bootstrap
would erroneously pick the minimum supported version. We now "backfill"
the correct cluster version after having bootstrapped all stores.
Even though the fixtures will need to be regenerated:
Fixes #27525.
Optimistically:
Fixes #27162.
Release note: None