"Couldn't find GoodJob::BatchRecord" error #1387
Comments
That's strange! That looks like a Batch callback job tried to run, and it was unable to load the associated Batch record out of the database. Are you still experiencing the error / did that job ever successfully complete? It's possible this was related to your DB outage. Normally though the
I'm no longer experiencing the error, so the issue may indeed have been specific to the outage. I'm happy to close this and report back if it happens again.
@bensheldon I've just had this error happen again, this time without any outage or DB issues. I'm wondering if the cause could be a job in the batch retrying for a while before finally succeeding? Perhaps the batch is cleaned up in the meantime, so that when the retrying job eventually succeeds, the batch is gone?
Hmm. That's definitely possible if the batch callback job doesn't successfully complete before the batch is deleted. The default time to preserve jobs (the same setting is used for batches) is 2 weeks.
Ah, I think I see the problem now. I've got the following set:
    config.good_job.cleanup_preserved_jobs_before_seconds_ago = 10.minutes
So a batch is considered complete once its constituent jobs finish, without waiting for any callback jobs to finish. That means if a callback takes longer than 10 minutes to run (either due to being queued for a while or retrying), the batch will have been deleted before the callback job can run, and when the callback job does run, it errors because it can't find the batch. I propose that batches should not be deleted until their callback jobs have completed, or if that's too complicated, that batches should have their own preservation config. What do you think about that @bensheldon?
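The timeline described above can be sketched in plain Ruby. This is only an illustration of the race, not GoodJob's code: the `Batch` struct and the `cleanup` method are hypothetical stand-ins, and the only value taken from the thread is the 10 minute preservation window.

```ruby
# Hypothetical stand-in for a batch row; not a GoodJob class.
Batch = Struct.new(:id, :finished_at)

# Returns the batches that survive a cleanup pass run at `now`.
# Assumption: a batch is eligible as soon as its own jobs finished
# more than `preserve_seconds` ago, regardless of pending callbacks.
def cleanup(batches, now:, preserve_seconds:)
  batches.reject { |b| b.finished_at && b.finished_at < now - preserve_seconds }
end

preserve_seconds = 10 * 60          # mirrors the 10.minutes setting above
t0 = Time.at(0)
batches = [Batch.new("e4b5", t0)]   # constituent jobs finished at t0

# The callback job is still queued or retrying 15 minutes later,
# so the cleanup pass runs before it does.
now = t0 + 15 * 60
batches = cleanup(batches, now: now, preserve_seconds: preserve_seconds)

# The callback now tries to load its batch and finds nothing.
found = batches.find { |b| b.id == "e4b5" }
puts found ? "callback can load batch" : "callback fails: batch record not found"
```

With a window longer than the callback's delay (for example the 2 week default mentioned earlier), the same lookup would succeed.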
I lean towards adding additional checks to ensure that callback jobs have completed. I'm trying to remember why that wasn't part of the design for batches, because I'm pretty sure I ignored them intentionally. Though maybe that was simply because I didn't want to run the callbacks in sequence. Or maybe I just was waiting for someone to notice and implement it just-in-time 😄
That sounds reasonable to me.
@jesseduffield do you want to attempt it, or should I add it to my todo list? 😉
I'm afraid I won't have time to work on this any time soon, so feel free to chuck it in the todo list :)
Connects to #1387 Add logic to delete batches only after their callback jobs have completed.
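As a rough sketch of the rule stated above (delete a batch only after its callback jobs have completed), the check might look like the following. The `BatchRow` and `CallbackJob` structs and the `deletable?` method are hypothetical names made up for illustration; this is not the code from the linked change, and it assumes a callback job counts as completed once it has a `finished_at` timestamp.

```ruby
# Hypothetical stand-ins for illustration; not GoodJob's schema.
CallbackJob = Struct.new(:finished_at)
BatchRow = Struct.new(:id, :finished_at, :callback_jobs)

# A batch may be deleted only if it finished before the cutoff
# and every one of its callback jobs has also finished.
def deletable?(batch, cutoff)
  return false if batch.finished_at.nil? || batch.finished_at >= cutoff

  batch.callback_jobs.all? { |job| job.finished_at }
end

cutoff = Time.at(1_000)
done = BatchRow.new("a", Time.at(100), [CallbackJob.new(Time.at(200))])
pending = BatchRow.new("b", Time.at(100), [CallbackJob.new(nil)])

# Only the batch whose callback is still pending is kept.
kept = [done, pending].reject { |b| deletable?(b, cutoff) }
puts kept.map(&:id).inspect
```

Keeping this as a separate check at deletion time leaves the batch's own finished state untouched.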
Nope, I don't want to redefine that because it could affect callback jobs that are asking "am I running in a batch of jobs, or as a callback to a batch of jobs?" I want to maintain the semantics that a batch's discarded/succeeded/finished states refer to the batch of jobs, of which the callback jobs are not inclusive. Callbacks happen after the batch, not as part of the batch. Phew, took some thinking about that.
Nope nope, the problem of not redefining |
I'm getting this error on some of my batch callback jobs:
{"_aj_globalid"=>"gid://subble-api/GoodJob::Batch/e4b54d15-a2a4-4778-bf3c-5ce1de3f366f"},
{"event"=>{"value"=>"finish",
"_aj_serialized"=>"ActiveJob::Serializers::SymbolSerializer"},
"_aj_symbol_keys"=>["event"]}
Error:
ActiveJob::DeserializationError: Error while trying to deserialize arguments: Couldn't find GoodJob::BatchRecord with 'id'=e4b54d15-a2a4-4778-bf3c-5ce1de3f366f
This happened when I was having some DB outage issues, not sure if it's related.