sql,migration: ensure cluster version never regresses #78705

Merged
craig[bot] merged 1 commit into cockroachdb:master from ajwerner:ajwerner/fix-version-upgrade
Mar 31, 2022

Conversation

@ajwerner
Contributor

In #68074 (which is in 21.2), we added logic to bump the version stored in the
system.settings table to intermediate versions as we run migrations. This was
critical to provide any sort of invariant when upgrading secondary tenants. The
logic to do this bumping works through a callback plumbed into the
migrationmanager from the sql package. Unfortunately, this callback did not
ensure that the version being written was greater than the existing version;
it just checked that it was different. This was previously made safe by some
transactional properties of the version upgrade.

Fixing the check to ensure that the version does indeed go up solves the flake
decisively. The question that remains is: why did the flake start January 8th?
It seems that it flaked earlier, on December 4th, with #73468, which we never
solved. I hypothesize that the flake becomes more likely the more versions we put into
play. Right after we cut the release branch for 22.1, the flake was less common.
I think that explains why it got worse over time.

The release note is also not great because I don't quite know the
repercussions.

Fixes #74599.

Release note (bug fix): Fixed a bug whereby the cluster version could regress
due to a race condition.
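The corrected check described above can be sketched as follows. This is a minimal illustration in Go of "write only if strictly greater, not merely different"; the `Version` type, its fields, and `maybeBumpVersion` are hypothetical stand-ins, not CockroachDB's actual clusterversion API.

```go
package main

import "fmt"

// Version is a simplified stand-in for a cluster version. The real type and
// its comparison logic live in CockroachDB's clusterversion package; the field
// layout here is illustrative only.
type Version struct {
	Major, Minor, Internal int32
}

// Less reports whether v orders strictly before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Internal < o.Internal
}

// maybeBumpVersion models the fixed callback: accept the proposed version only
// if it is strictly greater than the existing one. The buggy behavior was to
// accept any version that was merely different, which let a racing stale
// writer regress the stored value.
func maybeBumpVersion(existing, proposed Version) (Version, bool) {
	if existing.Less(proposed) {
		return proposed, true // monotonic bump: accept
	}
	return existing, false // equal or regressing: reject, keep existing
}

func main() {
	cur := Version{Major: 21, Minor: 2, Internal: 40}

	// A racing writer proposes a stale (lower) intermediate version: rejected.
	_, ok := maybeBumpVersion(cur, Version{Major: 21, Minor: 2, Internal: 38})
	fmt.Println("stale write accepted:", ok)

	// A genuinely newer intermediate version: accepted.
	next, ok := maybeBumpVersion(cur, Version{Major: 21, Minor: 2, Internal: 42})
	fmt.Println("newer write accepted:", ok, next)
}
```

With a difference-only check, both writes above would have been applied, so ordering of the two racing writers would determine the final stored version; the strict greater-than check makes the stored version monotone regardless of write order.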

@ajwerner ajwerner requested a review from a team as a code owner March 29, 2022 05:14
@cockroach-teamcity
Member

This change is Reviewable

@ajwerner
Contributor Author

In somewhat better news, it turns out that this is less severe than I first thought. On dedicated clusters, the value in the system.settings table isn't really ever used. We instead use the value written to disk on each store, at least, so far as I can tell. For multi-tenant clusters, we don't allow more than one sql pod to be active during upgrades.

@ajwerner ajwerner force-pushed the ajwerner/fix-version-upgrade branch from 7a76ef4 to 6432398 on March 29, 2022 17:42
@ajwerner ajwerner force-pushed the ajwerner/fix-version-upgrade branch from 6432398 to 7d3415b on March 31, 2022 00:33
@ajwerner
Contributor Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Mar 31, 2022

Build succeeded:

@craig craig bot merged commit 4426742 into cockroachdb:master Mar 31, 2022
@blathers-crl

blathers-crl bot commented Mar 31, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 7d3415b to blathers/backport-release-21.2-78705: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


error creating merge commit from 7d3415b to blathers/backport-release-22.1-78705: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@adityamaru
Contributor

adityamaru commented Apr 28, 2022

@ajwerner doesn't look like the backports made it to the branches? I seem to have hit the related test failure in a recent backport to 22.1.0 - #80224

@ajwerner
Contributor Author

ack, will backport

Development

Successfully merging this pull request may close these issues.

server: TestClusterVersionUpgrade failed
