Makes sure jobs are eventually discarded when concurrency happens #15603
Conversation
Hi Marcello, good that you went through the jobs to apply concurrency limitations. I fear, though, that I found a number of things I would handle differently.
Apart from the comments within the code, I also think it would make sense to apply a total_limit of 1 to the Mails::ReminderJob.
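For illustration, a total_limit of 1 on Mails::ReminderJob could look roughly like the sketch below. This is only an assumption about how the job is structured; the base class, class body, and concurrency key are placeholders, and the sketch relies on GoodJob's concurrency extension (good_job_control_concurrency_with), not on the actual OpenProject code.

```ruby
# Minimal sketch only -- class layout, base class and key are assumptions.
module Mails
  class ReminderJob < ApplicationJob
    include GoodJob::ActiveJobExtensions::Concurrency

    # total_limit: 1 => at most one job with this key may exist at a time,
    # counting both enqueued and currently performing jobs.
    good_job_control_concurrency_with(
      total_limit: 1,
      key: "Mails::ReminderJob" # assumed fixed key
    )

    def perform
      # existing reminder mail logic stays unchanged
    end
  end
end
```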
app/workers/notifications/create_date_alerts_notifications_job.rb
That was exactly why I asked for your review, as you have more context on the details of the utility of those jobs. =)
Hey @mereghost, please understand the comment as a starting point of a discussion since I really don't know which solution would be best.
retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
         wait: 5.minutes,
         attempts: :unlimited
I would change this to apply to almost every error. So it would have to be something like
retry_on StandardError, wait: 5.minutes, attempts: :unlimited
This has less to do with concurrency concerns and more with the necessity of this job being executed. On the other hand, at some point it just doesn't make sense to try again. But we don't have a pattern yet for how to handle this (e.g. by sending out a notification to an admin). So in the meantime, I would probably do
retry_on StandardError, wait: 5.minutes, attempts: 3
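If both the broad StandardError retry and the concurrency retry ended up in the same job, declaration order would matter: retry_on registers handlers via rescue_from, and the most recently declared matching handler wins. A hedged sketch of how the two suggestions could be combined (the class name is a placeholder, not a job in this PR):

```ruby
class SomeConcurrencyControlledJob < ApplicationJob # hypothetical name
  # Broad fallback: stop after a few attempts until there is a
  # pattern for notifying an admin about permanently failing jobs.
  retry_on StandardError, wait: 5.minutes, attempts: 3

  # Declared last so it takes precedence over the StandardError handler
  # (rescue_from checks handlers in reverse declaration order):
  # concurrency collisions are expected and retried without limit.
  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: :unlimited
end
```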
Long train of thought here (correct any misunderstandings, please):
- We have a perform_limit of 1, meaning 1 running and infinitely many queued.
- We have a fixed concurrency key.
- This is the base job of 3 others that will share the same key, meaning the others will wait until the currently processing job is done.
- The process of waiting will raise a concurrency error.
- We need these jobs to run, no matter what, no matter how long it takes for them to finally run.
Due to the shared key, we might run into the concurrency error frequently (I have no clue how often these jobs are run), so an explicit retry on that error should take care of the exponential backoff that GoodJob adds by default.
Retrying on StandardError seems... off. It is too generic and might keep enqueuing a job that has bad arguments or, let's say, code that is broken due to the mystical influence of production DBs.
If we could narrow it down a bit, say to ActiveRecord errors, PG::Error or so, I think it would be more viable.
(Also, a place for admins to see which jobs have failed and maybe retry some of them would be awesome 😛)
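A hedged sketch of the setup described above, combined with the narrower retry: it assumes a base job that the three subclasses inherit from, and the class name and concurrency key are placeholders rather than the actual OpenProject identifiers.

```ruby
class DateAlertsNotificationsBaseJob < ApplicationJob # hypothetical name
  include GoodJob::ActiveJobExtensions::Concurrency

  # perform_limit: 1 => only one job with this key runs at a time;
  # further jobs stay enqueued. The fixed key is shared by the
  # subclasses, which is what serialises them.
  good_job_control_concurrency_with(
    perform_limit: 1,
    key: "create_date_alerts_notifications" # assumed fixed key
  )

  # Narrower than StandardError: retry only errors that are likely
  # transient database problems, and only a few times.
  retry_on ActiveRecord::ActiveRecordError, wait: 5.minutes, attempts: 3

  # Concurrency collisions are expected and must never lose the job,
  # so they are retried without limit.
  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: :unlimited
end
```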
Ok, I can follow your reasoning with the ConcurrencyExceededError, @mereghost, so let's add the statement just like you proposed initially.
As for the other errors, I currently just don't know which ones to expect. But without any other specification, AFAIK the jobs will just be retried 5 times, so that might work for the time being anyway.
I read up on your explanation, @mereghost, and I can get behind it.
The PR requires rebasing but can then be merged.
This replicates the workaround of forever retrying jobs that @ulferts applied in #15602 to the other jobs using concurrency controls.