release: audit unit/integration upgrade tests #100552
Comments
@cockroachdb/multi-tenant, @cockroachdb/sql-schema, @cockroachdb/storage, @cockroachdb/kv-prs, @cockroachdb/server-prs, @cockroachdb/disaster-recovery
But so does So I was curious how the list was generated?
Remove an unnecessary use of cluster.MakeTestingClusterSettingsWithVersions in favor of cluster.MakeTestingClusterSettings. Epic: None Informs: cockroachdb#100552 Release note: None
100539: sql: add telemetry for UDFs with RETURNS TABLE r=mgartner a=mgartner Informs #100226 Release note: None
100554: kv: initialize consistencyLimiter during Store construction, before Start r=aliher1911 a=nvanbenschoten Fixes #96231. This commit attempts to fix #96231. It moves the initialization of `Store.consistencyLimiter` up from the bottom of `Store.Start` into `NewStore`. It is unsafe to initialize this variable after the call to `Store.processRaft`, which starts Raft processing. Beyond that point, incoming Raft traffic is permitted and calls to `computeChecksumPostApply` are possible. The two stacktraces we have in that issue conclusively point to the `Store.consistencyLimiter` being nil during a call to `(*Replica).computeChecksumPostApply`. This startup-time race is the only possible explanation I could come up with. Release note (bug fix): Fixed a rare startup race that could cause an `invalid memory address or nil pointer dereference` error.
100592: storage: remove unnecessary version override in unit test r=nicktrav a=jbowens Remove an unnecessary use of cluster.MakeTestingClusterSettingsWithVersions in favor of cluster.MakeTestingClusterSettings. Epic: None Informs: #100552 Release note: None
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
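The startup race described for 100554 above can be illustrated with a toy model. This is only a sketch: `store`, `limiter`, `newStore` and `start` are hypothetical stand-ins, not the real `Store` code. It shows the shape of the fix, in which every field a request handler may touch is set in the constructor, before any background processing can begin.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a hypothetical stand-in for Store.consistencyLimiter.
type limiter struct{ limit int }

// store is a toy model of a component whose handlers run concurrently
// once start has been called.
type store struct {
	limiter *limiter
	wg      sync.WaitGroup
}

// newStore initializes the limiter up front, so no handler launched by
// start can ever observe a nil pointer.
func newStore() *store {
	return &store{limiter: &limiter{limit: 8}}
}

// start launches one goroutine per simulated request. Each handler
// dereferences s.limiter, which would panic if the field were still nil.
func (s *store) start(requests int, handled chan<- int) {
	for i := 0; i < requests; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			handled <- s.limiter.limit
		}()
	}
}

func main() {
	s := newStore()
	handled := make(chan int, 3) // buffered so handlers never block
	s.start(3, handled)
	s.wg.Wait()
	close(handled)

	n := 0
	for range handled {
		n++
	}
	fmt.Println("handled", n)
}
```

Had the limiter been assigned at the end of `start` instead, a handler could run before the assignment, which is the kind of ordering problem the commit message describes.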
It used to before this change [1]. When I ran this last night, my local [1] 16ff79d#diff-581dd19761966c928337b36%5B%E2%80%A6%5D174592c84a4c44a699816111b8c5
What I noticed when checking tests is that docs on
This test has been rewritten to explicitly indicate what version the test cluster is being bootstrapped and upgraded to. Informs: cockroachdb#100552 Release note: None
The userfile descriptor corruption version gate can be safely deleted as all clusters upgrading to 23.1+ are guaranteed to have been upgraded to 22.2 prior to that. Informs: cockroachdb#100552 Release note: None
This version gate was for clusters that were not fully upgraded to 22.2. Clusters that upgrade to 23.1+ are guaranteed to have run this migration and so it is safe to delete. Informs: cockroachdb#100552 Release note: None
This change bumps the mixed version restore test to use the current minimum binary version instead of an older V22_2Start gate that will soon be deleted. Informs: cockroachdb#100552 Release note: None
As described in cockroachdb#100552, it's important for this API to use TestingBinaryMinSupportedVersion in order to correctly bootstrap on the older version. Removed TestFixUserfileRelatedDescriptorCorruptionUpgrade and TestPreconditionBeforeStartingAnUpgrade since they are for 22.2 migrations. Release note: None
@srosenberg @adityamaru @rafiss I spent a bit of time trawling through this doc explaining the problem and the lengthy comment on
In other words, a unit test writer needs to specify both the binaryVersionOverride and the BootstrapVersionKeyOverride. I'm inferring this from @adityamaru's TODO: cockroach/pkg/server/testing_knobs.go Lines 79 to 81 in 0a0985e
Could somebody clarify and then update the issue header?
99958: jobs,server: graceful shutdown for secondary tenant servers r=stevendanna a=knz Epic: CRDB-23559 Fixes #92523. All commits but the last are from #100436. This change ensures that tenant servers managed by the server controller receive a graceful drain request as part of the graceful drain process of the surrounding KV node. This change, in turn, ensures that SQL clients connected to these secondary tenant servers benefit from the same guarantees (and graceful periods) as clients to the system tenant.
100726: upgrades: use TestingBinaryMinSupportedVersion in tests r=rafiss a=rafiss As described in #100552, it's important for this API to use TestingBinaryMinSupportedVersion in order to correctly bootstrap on the older version. informs #100552 Release note: None
100741: contextutil: teach TimeoutError to redact only the operation name r=andreimatei a=andreimatei Before this patch, the whole message of TimeoutError was redacted in logs. Now, only the operation name is. Release note: None Epic: None
100778: norm: update prune cols to match PruneJoinLeftCols/PruneJoinRightCols r=msirek a=msirek In #90599 adjustments were made to the PruneJoinLeftCols and PruneJoinRightCols normalization rules to avoid pruning columns which might be needed when deriving new predicates based on foreign key constraints for lookup join. However, this caused a problem where rules might sometimes fire in an infinite loop because the same columns to prune keep getting added as PruneCols in calls to DerivePruneCols. The logic in prune_cols.opt and DerivePruneCols must be kept in sync to avoid such problems, and this PR brings it back in sync. Epic: none Fixes: #100478 Release note: None
100821: cmd/roachtest: adjust disk-stalled roachtests TPS calculation r=itsbilal a=jbowens Previously, the post-stall TPS calculation included the time that the node was stalled but before the stall triggered the node's exit. During this period, overall TPS drops until the gray failure is converted into a hard failure. This commit adjusts the post-stall TPS calculation to exclude the stalled time when TPS is expected to tank. Epic: None Informs: #97705. Release note: None
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Mark Sirek <sirek@cockroachlabs.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
99288: cdc: add apache arrow parquet library and writer r=miretskiy a=jayshrivastava
#### cdc: add apache arrow parquet library
This commit installs the apache arrow parquet library for Go at version 11. The release can be found here: https://github.com/apache/arrow/releases/tag/go%2Fv11.0.0 This library is licensed under the Apache License 2.0. Informs: #99028 Epic: None Release note: None
---
#### util/parquet: create parquet writer library
This change implements a `Writer` struct in the new `util/parquet` package. This `Writer` writes datums to the `io.Writer` sink using a configurable parquet version (defaults to v2.6). The package implements several features internally required to write in the parquet format:
- schema creation
- row group / column page management
- encoding/decoding of CRDB datums to parquet datums
Currently, the writer only supports types found in the TPCC workload, namely INT, DECIMAL, STRING, UUID, TIMESTAMP and BOOL. This change also adds a benchmark and tests which verify the correctness of the writer and test utils for reading datums from parquet files. Informs: #99028 Epic: None Release note: None
---
#### changefeedccl: add parquet writer
This change adds the file `parquet.go` which contains helper functions to help create parquet writers and export data via `cdcevent.Row` structs. This change also adds tests to ensure rows are written to parquet files correctly. Epic: None Release note: None
100830: upgrademanager: fix upgrade manager tests that relied on wrong invariants r=knz a=adityamaru The two tests in question were low-level tests that upgrade clusters from mock version 41 to 42 and override the upgrade flow to test the internals of the upgrade manager. These mock cluster versions were not tied to our real cluster versions and so were not offset with the `DevOffset` when a branch is marked as a development branch. For this reason, when a branch was a developmentBranch they would sort under all the "real" cluster versions, while when a branch was marked as a release branch they would sort above all the "real" cluster versions. This nondeterministic behaviour did not play well with certain job migrations that were added this release. This change rewrites the test to use real cluster versions so that their values are in sync with whether the branch is a developmentBranch or not. This change also removes some overrides to make the test more intuitive. Concretely, the test servers now bootstrap at the minimum supported binary version and run all the upgrades until the `startCV` before executing the body of the test. Fixes: #100685 Informs: #100552 Release note: None
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com> Co-authored-by: adityamaru <adityamaru@gmail.com>
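The sorting problem described for 100830 can be reproduced with a small model. This is a sketch under assumptions: `Version`, `effective` and the `devOffset` value below are illustrative and not taken from the CockroachDB source. The only behaviour modelled is the one the commit message states, namely that real versions are offset on a development branch while the hard-coded mock versions are not.

```go
package main

import "fmt"

// Version is a simplified, hypothetical stand-in for a cluster version.
type Version struct{ Major, Minor int }

// Less orders versions by Major, then Minor.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// devOffset is an illustrative value (an assumption of this sketch) for the
// offset applied to real versions on a development branch.
const devOffset = 1000000

// effective returns a real cluster version as it would appear on the given
// kind of branch: offset on development branches, unchanged on release ones.
func effective(v Version, developmentBranch bool) Version {
	if developmentBranch {
		v.Major += devOffset
	}
	return v
}

func main() {
	mock := Version{Major: 41}            // hard-coded by the test, never offset
	realV := Version{Major: 23, Minor: 1} // a "real" cluster version

	for _, dev := range []bool{true, false} {
		fmt.Printf("developmentBranch=%t: mock sorts below real? %t\n",
			dev, mock.Less(effective(realV, dev)))
	}
}
```

In this model the mock version sorts below the real one on a development branch and above it on a release branch, which is why tests built on fixed mock versions behave differently depending on the branch type.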
Trying to find out what is left to be done here. @cockroachdb/multi-tenant Have you had the chance to take a look at the tests under @cockroachdb/storage have you had a chance to look at
The Storage test was updated last week. I've updated the summary with a link to the patch.
I checked TestTenantUpgradeInterlock. For the others we're waiting for @healthy-pod and @ajstorm to be back next week.
Everything has been addressed, closing the issue. Thanks everyone for your help!
All unit tests which invoke `MakeTestingClusterSettingsWithVersions` and pass a non-default `binaryMinSupportedVersion` (i.e., anything other than `clusterversion.TestingBinaryMinSupportedVersion`) should be audited. The reason is explained in detail in [1].
TL;DR Before [1], the test cluster would be bootstrapped at the current version unless the test passed `BootstrapVersionKeyOverride`. Thus, migrations were not being tested other than their idempotency. After [1], the test cluster is correctly bootstrapped at the default `binaryMinSupportedVersion` unless the test overrides it. This roughly translates to,
- `binaryMinSupportedVersion`
- `s.BinaryVersionOverride()` to `SET CLUSTER SETTING version`, which triggers migrations from the bootstrapped version to the (overridden) `BinaryVersion`
NOTE: the above only concerns the unit tests. Roachtests use a different bootstrapping mechanism which ensures an upgrade test is always bootstrapped at some previous version.
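Why the bootstrap version matters can be seen in a toy model. This is a sketch only: `Version`, `migration` and `migrationsToRun` are hypothetical names, not CockroachDB APIs, and the model assumes an upgrade runs exactly the migrations gated on versions after the bootstrap version and up to the target version.

```go
package main

import "fmt"

// Version is a simplified, hypothetical stand-in for a cluster version.
type Version struct{ Major, Minor int }

// Less orders versions by Major, then Minor.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	return v.Minor < o.Minor
}

// migration is a hypothetical upgrade step gated on a cluster version.
type migration struct {
	at   Version
	name string
}

// migrationsToRun returns the names of the migrations an upgrade from
// bootstrap to target would execute: those with bootstrap < at <= target.
func migrationsToRun(all []migration, bootstrap, target Version) []string {
	var out []string
	for _, m := range all {
		if bootstrap.Less(m.at) && !target.Less(m.at) {
			out = append(out, m.name)
		}
	}
	return out
}

func main() {
	minSupported := Version{22, 2} // plays the role of binaryMinSupportedVersion
	current := Version{23, 1}      // plays the role of the current binary version
	all := []migration{
		{Version{22, 2}, "already-applied"},
		{Version{23, 1}, "new-in-23.1"},
	}

	// Bootstrapped at the current version: nothing is left to migrate.
	fmt.Println(len(migrationsToRun(all, current, current)))
	// Bootstrapped at the minimum supported version: the new migration runs.
	fmt.Println(migrationsToRun(all, minSupported, current))
}
```

In this model, a test cluster bootstrapped at the current version has no migrations left to run, so an upgrade test on it exercises none of them; bootstrapping at the older version is what makes the upgrade path run.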
Below are all unit tests, grouped by package, which should be audited. This set is conservative. The ones which say `FAIL` pass a non-default `binaryMinSupportedVersion`, but may still be correct. The ones which don't say `FAIL` are likely to be correct, but it wouldn't hurt to double-check them.
pkg/ccl/kvccl/kvtenantccl
@cockroachdb/multi-tenant
TestTenantUpgradeFailure
TestTenantUpgradeInterlock
pkg/ccl/serverccl
@cockroachdb/multi-tenant
TestServerStartupGuardrails (`FAIL`)
TestBumpTenantClusterVersion (`FAIL`)
TestValidateTargetTenantClusterVersion (`FAIL`)
pkg/kv/kvclient/kvcoord
TestBiDirectionalRangefeedNotUsedUntilUpgradeFinalilzed (`FAIL`)
pkg/kv/kvserver
TestMigrateWaitsForApplication (`FAIL`)
TestLeaseUpgradeVersionGate (`FAIL`)
TestRangeMigration (`FAIL`)
TestLoadBasedRebalancingObjective
TestRebalanceObjectiveManager
TestStoreConfig
pkg/kv/kvserver/batcheval
TestDeclareKeysResolveIntent
pkg/server/settingswatcher
TestVersionGuard
pkg/storage
TestPebbleIterator_ExternalCorruption #100592
pkg/upgrade/upgrademanager
@cockroachdb/sql-schema and @cockroachdb/jobs
TestAlreadyRunningJobsAreHandledProperly - #100830
TestConcurrentMigrationAttempts - #100873
TestMigrateUpdatesReplicaVersion - #100873
TestPauseMigration - #100830
pkg/upgrade/upgrades
@cockroachdb/sql-schema
TestDatabaseRoleSettingsUserIDMigration1500Users (`FAIL`): seems like a mistake that this is in this list
TestDeleteDescriptorsOfDroppedFunctions (`FAIL`) #100726
TestExternalConnectionsUserIDMigration10Users (`FAIL`): seems like a mistake that this is in this list
TestFixUserfileRelatedDescriptorCorruptionUpgrade (`FAIL`) #100726
TestPreconditionBeforeStartingAnUpgrade (`FAIL`) #100726
TestMigrationWithFailuresMultipleAltersOnSameColumn: fixed by #100644
TestSystemJobInfoMigration #100726 #100696
TestSystemPrivilegesIndexMigration: seems like a mistake that this is in this list
TestSystemActivityMigration #100726
TestWaitForDelRangeInGCJob (`FAIL`) #100726
TestWebSessionsUserIDMigrationNoUsers: seems like a mistake that this is in this list
pkg/ccl/backupccl/datadriven_test.go
@cockroachdb/disaster-recovery
testdata/backup-restore/restore-schema-only-mixed-version:new-cluster name=s1 beforeVersion=Start22_2 disable-tenant #100696
testdata/backup-restore/in-progress-import-rollback:new-cluster name=s1 beforeVersion=23_1_MVCCTombstones disable-tenant #100696
testdata/backup-restore/restore-mixed-version:new-cluster name=s1 beforeVersion=Start22_2 disable-tenant #100696
testdata/backup-restore/restore-mixed-version:new-cluster name=s2 beforeVersion=Start22_2 share-io-dir=s1 disable-tenant #100696
[1] #99082
Jira issue: CRDB-26487