Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,upgrades: bootstrap at binaryMinVersion and run upgrades to BinaryVersionOverride #99082

Merged

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Mar 20, 2023

Previously, tests that set BinaryVersionOverride would bootstrap
their test clusters at this overridden value. As part of this bootstrap
we would initialize the cluster with data as it appears on binaryVersion
i.e. system tables would be created using the schemas defined on the current binaryVersion.
Most tests that set BinaryVersionOverride would then manually call into upgrade.Upgrade
to move the cluster version forward. Since we have already bootstrapped the initial
data at binaryVersion these upgrades would not really run. We would just be testing
their idempotency unless the test did some gymnastics to "undo" the bootstrap and
then test the actual migration. So effectively, the BinaryVersionOverride made
the server think it was running that version but none of the associated upgrade
logic to reach that version was exercised.

In 23.1 we baked in the initial data that is bootstrapped on binaryMinSupportedVersion.
This development allows us to bootstrap the test cluster to binaryMinSupportedVersion
and actually step through the upgrades until BinaryVersionOverride before running the
test.

The new behaviour of setting this override is described below:

This value, when set, influences test cluster/server creation in two
different ways:

 Case 1:
 ------
 If the test has not overridden the
 `cluster.Settings.Version.BinaryMinSupportedVersion`, then the cluster will
 be bootstrapped at `binaryMinSupportedVersion`  (if this server is the one
 bootstrapping the cluster). After all the servers in the test cluster have
 been started, `SET CLUSTER SETTING version = BinaryVersionOverride` will be
 run to step through the upgrades until the specified override.

 Case 2:
 ------
 If the test has overridden the
 `cluster.Settings.Version.BinaryMinSupportedVersion` then it is not safe
 for us to bootstrap at `binaryMinSupportedVersion` as it might be less than
 the overridden minimum supported version. Furthermore, we do not have the
 initial cluster data (system tables etc.) to bootstrap at the overridden
 minimum supported version. In this case we bootstrap at
 `BinaryVersionOverride` and populate the cluster with initial data
 corresponding to the `binaryVersion`. In other words no upgrades are
 *really* run and the server only thinks that it is running at
 `BinaryVersionOverride`. Tests that fall in this category should be audited
 for correctness.
The version that we bootstrap at is also used when advertising this
server's binary version when sending out join requests.

Informs: #97762

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru force-pushed the binary-version-bootstrap-version branch 3 times, most recently from 97483fb to e7bd943 Compare March 22, 2023 22:01
@adityamaru adityamaru changed the title [DNM, WIP]: bootstrap test cluster at MinBinaryVersion and step through migrations server,upgrades: bootstrap at binaryMinVersion and run upgrades to BinaryVersionOverride Mar 22, 2023
@adityamaru adityamaru marked this pull request as ready for review March 22, 2023 22:02
@adityamaru adityamaru requested a review from a team March 22, 2023 22:02
@adityamaru adityamaru requested review from a team as code owners March 22, 2023 22:02
@adityamaru adityamaru requested review from srosenberg, smg260, irfansharif, dt and a team and removed request for a team March 22, 2023 22:02
@adityamaru adityamaru force-pushed the binary-version-bootstrap-version branch from e7bd943 to c4db729 Compare March 23, 2023 13:33
@adityamaru
Copy link
Contributor Author

For context, I need some version of this patch to be able to drop columns from the system.jobs table - #98989. Prior to this patch, since all tests that override their binary version bootstrap with the system tables from the binary version we don't run the necessary upgrades that change the schema of the jobs table.

@@ -125,7 +125,7 @@ CREATE SEQUENCE system.role_id_seq START 100 MINVALUE 100 MAXVALUE 2147483647;`
serviceLatencyComputeExpr = `(((statistics->'statistics':::STRING)->'svcLat':::STRING)->'mean':::STRING)::FLOAT8`
cpuSqlNanosComputeExpr = `(((statistics->'execution_statistics':::STRING)->'cpuSQLNanos':::STRING)->'mean':::STRING)::FLOAT8`
contentionTimeComputeExpr = `(((statistics->'execution_statistics':::STRING)->'contentionTime':::STRING)->'mean':::STRING)::FLOAT8`
totalEstimatedExecutionTimeExpr = `((statistics->'statistics':::STRING)->'cnt':::STRING)::FLOAT8 * (((statistics->'statistics':::STRING)->'svcLat':::STRING)->>'mean':::STRING)::FLOAT8`
totalEstimatedExecutionTimeExpr = `((statistics->'statistics':::STRING)->>'cnt':::STRING)::FLOAT8 * (((statistics->'statistics':::STRING)->'svcLat':::STRING)->>'mean':::STRING)::FLOAT8`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @ericharmeling this is a small diff that was caught now that we actually run upgrades during tests that override the binary version. This expression deferred slightly from the expression used in the migration that adds this column.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this. I should've caught this when adding the migration ( 🤦 ), but it makes sense that the upgrade tests didn't at the time.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent an hour trying to page this back in, including talking to Aditya offline. We wrote up https://cockroachlabs.slack.com/files/UHXRNP9D1/F0503KK3VFE/unit_testing_gap_with_upgrades for @cockroachdb/test-eng consideration. I'm going to recuse myself from the code reviews (it looks ok to me, but the audit of case2 tests seems worth doing).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @smg260, and @srosenberg)

// ------
// If the test has not overridden the
// `cluster.Settings.Version.BinaryMinSupportedVersion`, then the cluster will
// be bootstrapped at `binaryMinSupportedVersion` (if this server is the one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd insist the test specify the bootstrap version if it specifies a binary version override, rather than just picking binaryMin. I get that this would mean changing every test right now but maybe leave a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, added a TODO

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall this sounds good to me -- tests that think they're testing against an older version should look more like that older version, including its bootstrapped state, not new state. I have a couple nits in how we decide what state to bootstrap (I'd demand a test pick instead of assuming min -- which can change -- is what they want), but those don't need to block this since as-is it is already much more correct than the previous state of mixed-version tests being mostly ineffective due to flawed setup

@adityamaru adityamaru force-pushed the binary-version-bootstrap-version branch from c4db729 to 5d99c57 Compare March 23, 2023 21:47
@blathers-crl blathers-crl bot requested a review from irfansharif March 23, 2023 21:47
@adityamaru adityamaru added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 24, 2023
Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks for fixing this!

If I understand it, this behavior is not new and has existed for multiple releases. Have we seen differences in the schema of a cluster bootstrapped at a newer version vs one that was migrated from an older version (like what was fixed in this PR)?

If not, were we just "lucky" before, or are there other tests that would've caught such differences?

pkg/server/testing_knobs.go Show resolved Hide resolved
…naryVersionOverride

Previously, tests that set BinaryVersionOverride would bootstrap
their test clusters at this overridden value. As part of this bootstrap
we would initialize the cluster with data as it appears on `binaryVersion`
i.e. system tables would be created using the schemas defined on the current `binaryVersion`.
Most tests that set BinaryVersionOverride would then manually call into `upgrade.Upgrade`
to move the cluster version forward. Since we have already bootstrapped the initial
data at `binaryVersion` these upgrades would not *really* run. We would just be testing
their idempotency unless the test did some gymnastics to "undo" the bootstrap and
then test the actual migration. So effectively, the `BinaryVersionOverride` made
the server think it was running that version but none of the associated upgrade
logic to reach that version was exercised.

In 23.1 we baked in the initial data that is bootstrapped on `binaryMinSupportedVersion`.
This development allows us to bootstrap the test cluster to `binaryMinSupportedVersion`
and *actually* step through the upgrades until `BinaryVersionOverride` before running the
test.

The new behaviour of setting this override is described below:

This value, when set, influences test cluster/server creation in two
different ways:

	 Case 1:
	 ------
	 If the test has not overridden the
	 `cluster.Settings.Version.BinaryMinSupportedVersion`, then the cluster will
	 be bootstrapped at `binaryMinSupportedVersion`  (if this server is the one
	 bootstrapping the cluster). After all the servers in the test cluster have
	 been started, `SET CLUSTER SETTING version = BinaryVersionOverride` will be
	 run to step through the upgrades until the specified override.

	 Case 2:
	 ------
	 If the test has overridden the
	 `cluster.Settings.Version.BinaryMinSupportedVersion` then it is not safe
	 for us to bootstrap at `binaryMinSupportedVersion` as it might be less than
	 the overridden minimum supported version. Furthermore, we do not have the
	 initial cluster data (system tables etc.) to bootstrap at the overridden
	 minimum supported version. In this case we bootstrap at
	 `BinaryVersionOverride` and populate the cluster with initial data
	 corresponding to the `binaryVersion`. In other words no upgrades are
	 *really* run and the server only thinks that it is running at
	 `BinaryVersionOverride`. Tests that fall in this category should be audited
	 for correctness.
	The version that we bootstrap at is also used when advertising this
	server's binary version when sending out join requests.

Informs: cockroachdb#97762

Release note: None
@adityamaru adityamaru force-pushed the binary-version-bootstrap-version branch from 5d99c57 to 1d1ac7a Compare March 25, 2023 20:48
@adityamaru
Copy link
Contributor Author

If I understand it, this behavior is not new and has existed for multiple releases

yup that is correct

If not, were we just "lucky" before, or are there other tests that would've caught such differences?

as you pointed out online I think we were lucky before. Going forward however TestInitialValuesToString and this PR should mean we see any mismatches in upgrades vs. baked-in system.go

@adityamaru
Copy link
Contributor Author

TFTR!

bors r=dt,renatolabs

@craig
Copy link
Contributor

craig bot commented Mar 26, 2023

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Mar 26, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants