-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs: remove WithTxn #60554
jobs: remove WithTxn #60554
Conversation
98b2ab9
to
ea4f9a0
Compare
I attempted to preserve the existing behavior in this PR, even in places where we seem to have intended to update the job inside a transaction closure using the transaction but neglected to use @pbardea you were the Bulk IO person automatically assigned, and I don't think you have to review this, but maybe you happen to know something about the job updates in the backfiller and GC job. |
ea4f9a0
to
ccec4c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A truly wonderful change!
Reviewed 27 of 27 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang and @pbardea)
pkg/jobs/jobsprotectedts/jobs_protected_ts.go, line 48 at r1 (raw file):
return false, err } // TODO: This job update should possibly use the txn.
This should definitely use the txn. LoadJobWithTxn
used to set the job.txn field to that txn (it did a WithTxn
using the txn).
pkg/sql/backfill.go, line 1273 at r1 (raw file):
fractionCompleted := origFractionCompleted + fractionLeft*fractionRangesFinished // TODO: This job update should possibly use the txn. if err := sc.job.FractionProgressed(ctx, nil /* txn */, jobs.FractionUpdater(fractionCompleted)); err != nil {
yes please, this one is just bad.
pkg/sql/create_stats.go, line 535 at r1 (raw file):
// it can be cleaned up at a higher level. // TODO: This job update should possibly use the txn. if jobErr := r.job.FractionProgressed(
yes please.
pkg/sql/gcjob/gc_job_utils.go, line 120 at r1 (raw file):
return err } // TODO: This job update should possibly use the txn.
the load meant it used to be
pkg/sql/gcjob/gc_job_utils.go, line 264 at r1 (raw file):
return err } // TODO: This job update should possibly use the txn.
yes please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 As Andrew pointed out, it looks like the GC job was already using the txn with the loading phase, and the backfiller just missed it - much clearer where we're missing these after this change. Thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the two of you feel about merging this PR which keeps the existing behavior, even where it's incorrect, and making changes in subsequent PRs (possibly during the stability period)? I can open an issue to track this.
I couldn't figure out the test timeouts I was seeing when I did make the changes; nothing seemed obviously hung from the goroutine dump and I wasn't getting the timeouts consistently, so it's possible that combining some things into a single transaction caused more contention and restarts, or something, though that seems kind of implausible. So I think this all deserves more investigation. I can isolate the changes one by one and run the tests on my gceworker (or just in CI), but that will take some time.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @lucy-zhang)
pkg/jobs/jobsprotectedts/jobs_protected_ts.go, line 48 at r1 (raw file):
LoadJobWithTxn
used to set the job.txn field to that txn (it did aWithTxn
using the txn).
By "used to," do you mean before this PR? I thought the txn specified in LoadJobWithTxn
only applies to the method itself, because it internally uses runInTxn
, which has a defer func() { j.txn = nil }()
. If that's not the case then I will happily make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for uncovering the badness that we didn't even realize we were dealing with. Merge away
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @lucy-zhang)
pkg/jobs/jobsprotectedts/jobs_protected_ts.go, line 48 at r1 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
LoadJobWithTxn
used to set the job.txn field to that txn (it did aWithTxn
using the txn).By "used to," do you mean before this PR? I thought the txn specified in
LoadJobWithTxn
only applies to the method itself, because it internally usesrunInTxn
, which has adefer func() { j.txn = nil }()
. If that's not the case then I will happily make this change.
😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @lucy-zhang)
This commit removes the `Job`'s `WithTxn` method and its `txn` field, which were used to set the next transaction used by a job, and have been a source of many bugs due to setting an incorrect or nil transaction. Now there is an explicit `*kv.Txn` argument on all the `Job` methods that read from the store, where a nil argument still indicates that a new txn should be created. Release note: None
ccec4c8
to
b8ca061
Compare
TFTR. I opened an issue (#60690) and the TODOs refer to it now. bors r+ |
This PR was included in a batch that was canceled, it will be automatically retried |
Build succeeded: |
This commit removes the
Job
'sWithTxn
method and itstxn
field,which were used to set the next transaction used by a job, and have been
a source of many bugs due to setting an incorrect or nil transaction.
Now there is an explicit
*kv.Txn
argument on all theJob
methodsthat read from the store, where a nil argument still indicates that a
new txn should be created.
Closes #57534.
Release note: None