-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs: TestJobInfoUpgradeRegressionTests failed #106347
Labels
A-disaster-recovery
A-jobs
branch-master
Failures and bugs on the master branch.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
T-jobs
Milestone
Comments
craig bot
pushed a commit
that referenced
this issue
Jul 7, 2023
106236: admission: avoid recursive grant chain r=irfansharif a=irfansharif Fixes #105185. Fixes #105613. In #97599 we introduced a non-blocking admission interface for below-raft, replication admission control. When doing so, we unintentionally violated the 'requester' interface -- when 'requester.granted()' is invoked, the granter expects to admit a single queued request. The code layering made it so that after granting one, when doing post-hoc token adjustments, if we observed the granter was exhausted but now no longer was so, we'd try to grant again. This resulted in admitting work recursively, with a callstack as deep as the admit chain. Not only is that undesirable design-wise, it also triggered panics in the granter that wasn't expecting more than one request being admitted. Recursively we were incrementing the grant chain index, which overflowed (it was in int8, so happened readily with long enough admit chains), after which we panic-ed when using a negative index to access an array. We add a test that fails without the changes. The failure can also be triggered by applying the diff below (which reverts to the older, buggy behavior): ``` dev test pkg/kv/kvserver -f TestFlowControlGranterAdmitOneByOne -v --show-logs ``` ```diff diff --git i/pkg/util/admission/granter.go w/pkg/util/admission/granter.go index ba42213c375..7c526fbb3d8 100644 --- i/pkg/util/admission/granter.go +++ w/pkg/util/admission/granter.go `@@` -374,7 +374,7 `@@` func (cg *kvStoreTokenChildGranter) storeWriteDone( func (cg *kvStoreTokenChildGranter) storeReplicatedWorkAdmittedLocked( originalTokens int64, admittedInfo storeReplicatedWorkAdmittedInfo, ) (additionalTokens int64) { - return cg.parent.storeReplicatedWorkAdmittedLocked(cg.workClass, originalTokens, admittedInfo, false /* canGrantAnother */) + return cg.parent.storeReplicatedWorkAdmittedLocked(cg.workClass, originalTokens, admittedInfo, true /* canGrantAnother */) } ``` Release note: None 106378: upgrades: fix txn retry bug in upgrade batching r=adityamaru a=stevendanna In #105750 we split the backfill of the job_type column across multiple transactions. This introduced a bug in which we would modify the resumeAfter variable that controlled the batching before the transaction succeeded. In the face of a transaction retry, this would result in some rows not having their job_type column populated. This was caught in nightly tests that attempted to use the crdb_internal.system_jobs virtual index on the job_type column. Here, we apply the same fix that we applied in #104752 for the same type of bug. Fixes #106347 Fixes #106246 Release note (bug fix): Fixes a bug where a transaction retry during the backfill of the job_type column in the jobs table could result some job records with no job_type value. 106408: ci: remove `lint` job from GitHub CI r=rail a=rickystewart With `staticcheck` and `unused` working identically under `lint` in Bazel and `make` now, it's time! Delete this file so that GitHub CI lint stops running. This is the *last* GitHub CI job. :) Now only Bazel builds and tests will run on PR's. Epic: CRDB-15060 Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
blathers-crl bot
pushed a commit
that referenced
this issue
Jul 7, 2023
In #105750 we split the backfill of the job_type column across multiple transactions. This introduced a bug in which we would modify the resumeAfter variable that controlled the batching before the transaction succeeded. In the face of a transaction retry, this would result in some rows not having their job_type column populated. This was caught in nightly tests that attempted to use the crdb_internal.system_jobs virtual index on the job_type column. Here, we apply the same fix that we applied in #104752 for the same type of bug. Fixes #106347 Fixes #106246 Release note (bug fix): Fixes a bug where a transaction retry during the backfill of the job_type column in the jobs table could result some job records with no job_type value.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-disaster-recovery
A-jobs
branch-master
Failures and bugs on the master branch.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
T-jobs
jobs.TestJobInfoUpgradeRegressionTests failed with artifacts on master @ 818aec861357579eb3a3e987cf5887f3cf112be4:
Help
See also: How To Investigate a Go Test Failure (internal)
Same failure on other branches
This test on roachdash | Improve this report!
Jira issue: CRDB-29520
The text was updated successfully, but these errors were encountered: