Document reliability guarantees #59
Thanks for flagging this. That's important to document better. I'll write out the way it currently works here; maybe we can have some back and forth, and then I'll capture that in the README. And GoodJob currently
An ActiveJob job enqueued with GoodJob can have 3 states:
The recommendation in the README is to add this to the base job class: `retry_on StandardError, wait: :exponentially_longer, attempts: Float::INFINITY`. This has the effect of rescuing any
(See lines 81 to 101 in 57cc7bf.)
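To get a feel for what that retry schedule means in practice, here is a small sketch that computes the waits. It assumes ActiveJob's documented `:exponentially_longer` formula with jitter ignored, roughly `executions**4 + 2` seconds; newer Rails versions add random jitter, so real delays will vary. `exponential_retry_delays` is a hypothetical helper, not part of ActiveJob or GoodJob.

```ruby
# Hypothetical helper: approximate the :exponentially_longer wait (in seconds)
# before each retry, ignoring the random jitter that newer Rails versions add.
def exponential_retry_delays(retries)
  (1..retries).map { |executions| (executions**4) + 2 }
end

# With attempts: Float::INFINITY the job never gives up; the waits just keep growing.
exponential_retry_delays(5).each_with_index do |delay, i|
  puts "retry #{i + 1}: wait ~#{delay}s"
end
```

The waits start at about 3 seconds and grow quickly (18s, 83s, 258s, 627s), so a job failing on a persistent error backs off rather than hammering the database.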
When errors do bubble up to the GoodJob adapter:
ActiveJob is complicated
The core challenge here is that the interaction of ActiveJob and the backend can be ambiguous. When the
Where to go from here
Having written all of this, I think I've changed my mind about how GoodJob should work in the default state.
Your issue also triggered some more research, and it seems that currently
What needs more details?
@dv I've updated the README and made the behavior safer by default. Please let me know if it answers your questions. Thanks again!
I have a question. The README states "GoodJob guarantees at-least-once performance of jobs". What if I want "once and once only"? I have jobs that send messages, and I never want them to be performed more or less than once.
@DannyBen that's a very good question that unfortunately falls into the gap between the backend (like GoodJob, Sidekiq, Que, etc.) and how you write your own job's
GoodJob guarantees that a completely performed job (i.e. no exception is raised in the middle of execution) will run once and only once. But it can only guarantee that any particular step of the business logic within your job will be performed "at-least-once". "Idempotency" is something all backends try to answer:
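One common way to approximate "once and only once" for a side effect like sending a message, on top of an at-least-once backend, is to make that step idempotent: record a unique key when the message is sent and skip the send if the key is already recorded. The sketch below assumes a hypothetical in-memory `SentMessageLog` and a hypothetical `deliver_message` job body; in a real app the record would have to live in durable storage (for example a table with a unique index). It narrows the window for duplicates but cannot close it entirely: a crash between sending and recording can still cause a resend.

```ruby
require "set"

# Hypothetical stand-in for a durable "messages already sent" table.
class SentMessageLog
  def initialize
    @keys = Set.new
  end

  def sent?(key)
    @keys.include?(key)
  end

  def record(key)
    @keys.add(key)
  end
end

# Hypothetical job body: the send is skipped when this key was already recorded,
# so re-running the job after a later failure does not send the message twice.
def deliver_message(log, outbox, key, body)
  return :skipped if log.sent?(key)

  outbox << body   # the side effect (e.g. an email or SMS)
  log.record(key)  # a crash right before this line could still cause a resend
  :sent
end

log = SentMessageLog.new
outbox = []
deliver_message(log, outbox, "welcome-user-42", "Welcome!") # first run
deliver_message(log, outbox, "welcome-user-42", "Welcome!") # retried run
puts outbox.size
```

The key should be derived from the job's arguments (not generated per execution), so that every retry of the same job computes the same key.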
Thanks for replying.
I am not sure it is. At least to me, a job means all the code invoked by the Job's
That is great to know. This is perhaps the part that can be added to the README. I already assumed this was the case, but the text about "at least once" confused me. As for other job "steps" that might run "at least once", that is completely understandable, assuming I understand it correctly: If a job fails at "line 3" of its
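The retry behavior being discussed can be simulated in plain Ruby. This is a sketch, not GoodJob's executor: `run_with_retries` is a hypothetical stand-in for a backend that re-runs the whole `perform` after an exception, and the hypothetical `ThreeStepJob` raises at step 3 on its first execution only. Steps 1 and 2 end up running twice, while the job as a whole completes once.

```ruby
# Hypothetical job whose third step fails on the first execution only.
class ThreeStepJob
  attr_reader :log

  def initialize
    @log = []
    @executions = 0
  end

  def perform
    @executions += 1
    @log << :step1
    @log << :step2
    raise "transient failure at step 3" if @executions == 1
    @log << :step3
  end
end

# Hypothetical stand-in for a backend that retries the entire perform on error.
def run_with_retries(job, max_attempts: 5)
  attempts = 0
  begin
    attempts += 1
    job.perform
  rescue StandardError
    retry if attempts < max_attempts # re-enters the begin block from the top
    raise
  end
  attempts
end

job = ThreeStepJob.new
attempts = run_with_retries(job)
p attempts # => 2
p job.log  # => [:step1, :step2, :step1, :step2, :step3]
```

Any step that runs before the failure point (here steps 1 and 2) is therefore the one that needs an idempotency guard.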
@DannyBen thank you for the feedback! I'll integrate that into the README.
Thank you @bensheldon, you've answered my questions and more. I'll let you close this issue in case you want to continue this discussion with the other participants.
Thank you everyone for the feedback and discussion! 🙌 This has been very helpful for improving GoodJob's documentation and my own understanding.
What reliability specs is good_job built for? Is there a chance of jobs getting "lost" in certain (edge) cases, such as workers being killed by the host or network connections dying, or is there a guarantee of at-least-once or exactly-once?
Your README right now mentions "run-once safety", but I'm not sure if you mean "exactly once" or "at least once", or if that's not a guarantee at all, since it's not explicit.
Also, it'd be great to know what behaviour to expect when a worker dies. For example, does the in-progress job become available to another worker immediately after the worker goes away? Is there a timeout involved? Does it require a cleanup process?
In general, a paragraph in your documentation on the reliability of the jobs would be great. As it stands, I'm a bit hesitant to use it in production without having a clear idea of what behaviour to expect in failure cases.