-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Postpone reenqueuing the iteration job until after callbacks are done #345
Conversation
44020eb
to
a2500ac
Compare
unless private_method_defined?(:reenqueue_iteration_job) | ||
raise 'JobIteration::Iteration#reenqueue_iteration_job must be defined' | ||
end | ||
def reenqueue_iteration_job(should_ignore: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we're not ignoring it, more like postponing it but ok 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about using "postpone" instead, but then figured it's not really postponed because we call it again explicitly, it is more just outright ignored the first time 😛
@@ -9,6 +9,10 @@ | |||
require 'pagy' | |||
require 'pagy/extras/bulma' | |||
|
|||
# Force the TaskJob class to load so we can verify upstream compatibility with | |||
# the JobIteration gem | |||
require_relative '../app/jobs/maintenance_tasks/task_job' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of requiring the file we could to the check here instead? But that separates it from the patch itself so it's probably worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it might be less confusing to keep them altogether for now
This PR fixes a bug that occurs when a job gets interrupted and is reenqueued faster than the original job takes to shut down. This results in race conditions with the @run and results in errors due to invalid status transitions (and notably results in a run showing up as interrupted when it is actually running). To solve this, we should delay reenqueuing the job until after all callbacks have completed, which can be done by calling #reenqueue_iteration_job in a prepended after_perform callback in TaskJob, and skipping the call that JobIteration performs in #iterate_with_enumerator.
…ueue_iteration_job is defined upstream
62397b3
to
b91efac
Compare
For: #335
This PR fixes a bug that occurs when a job gets interrupted and is reenqueued faster than the original job takes to shut down. This results in race conditions with the
@run
and results in errors due to invalid status transitions (and notably results in a run showing up as interrupted when it is actually running).To solve this, we should delay reenqueuing the job until after all callbacks have completed, which can be done by calling
#reenqueue_iteration_job
in a prepended after_perform callback in TaskJob, and skipping the call that JobIteration performs in#iterate_with_enumerator
.Considerations
Arguably callback ordering and ensuring that a job shuts down and runs its callbacks successfully before the new one starts up is something that should be upstreamed to
JobIteration
. I experimented with a PR for this, but the issue is that a number of jobs relying on JobIteration in Core are also dependent on the point at which the job is reenqueued in order to resume batch processing jobs where an error has occurred. (These jobs are currently able to resume from the last cursor when something goes wrong, but changing the order results in the new job not being pushed back to the queue, and things simply failing).I think this is worth reinvestigating upstream at a later point, but I think we should get this patch out in the meantime, given it's affecting a number of users at the moment.
Tophatting
Try without the changes and then with to see the difference.
Steps to reproduce:
bundle add sidekiq
And then run
Maintenance::UpdatePostsTask
from the UI