Makes sure jobs are eventually discarded when concurrency happens #15603

Merged: 4 commits from bug/ensure_discard_on_concurrency_error merged into dev on May 27, 2024

Conversation

@mereghost (Contributor) commented on May 17, 2024:

⚠️ Related Work Package OP#54984

This replicates the workaround for forever-retrying jobs that @ulferts applied in #15602 to the other jobs that use concurrency controls.
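
(For context, a minimal sketch of what GoodJob's concurrency controls look like on an ActiveJob class; the class name, key and limit below are illustrative, not taken from the actual workers touched by this PR.)

    # Illustrative job class, not one of the real OpenProject workers.
    class SomeLimitedJob < ApplicationJob
      include GoodJob::ActiveJobExtensions::Concurrency

      # Allow at most one instance of this job to execute at a time;
      # additional instances stay enqueued.
      good_job_control_concurrency_with(
        perform_limit: 1,
        key: "some_limited_job"
      )

      def perform
        # When the perform_limit is exceeded, GoodJob raises
        # GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError
        # before this method runs, and the job gets retried.
      end
    end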

@mereghost requested a review from ulferts on May 17, 2024 12:49
@ulferts (Contributor) left a comment:

Hi Marcello, it's good that you went through the jobs to apply concurrency limitations. I fear, though, that I found a number of things I would handle differently.

Apart from the comments within the code, I also think that it would make sense to apply a total_limit of 1 to the Mails::ReminderJob.
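
(A hedged sketch of that suggestion, assuming Mails::ReminderJob already uses GoodJob's concurrency extension; the key is illustrative and the rest of the class is omitted.)

    class Mails::ReminderJob < ApplicationJob
      include GoodJob::ActiveJobExtensions::Concurrency

      # total_limit: 1 caps enqueued plus executing jobs for this key at one,
      # so a second reminder run is not even queued while another is pending.
      good_job_control_concurrency_with(total_limit: 1, key: "Mails::ReminderJob")
    end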

app/workers/work_packages/apply_working_days_change_job.rb (review thread outdated, resolved)
app/workers/work_packages/progress/job.rb (review thread outdated, resolved)
@mereghost (Contributor, Author) replied:

> Hi Marcello, it's good that you went through the jobs to apply concurrency limitations. I fear, though, that I found a number of things I would handle differently.

That was exactly why I asked for your review, as you have more context on what those jobs are for. =)

@mereghost requested a review from ulferts on May 21, 2024 08:28
@mereghost force-pushed the bug/ensure_discard_on_concurrency_error branch from 2034838 to e24081a on May 21, 2024 08:29
@ulferts (Contributor) left a comment:

Hey @mereghost, please understand the comment as a starting point for a discussion, since I really don't know which solution would be best.


    retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
             wait: 5.minutes,
             attempts: :unlimited

@ulferts (Contributor) commented:

I would change this to apply to almost every error. So it would have to be something like

    retry_on StandardError, wait: 5.minutes, attempts: :unlimited

This has less to do with concurrency concerns and more with the necessity of this job being executed. On the other hand, at some point it just doesn't make sense to try again. But we don't have a pattern yet for how to handle that (e.g. by sending out a notification to an admin). So in the meantime, I would probably do

    retry_on StandardError, wait: 5.minutes, attempts: 3
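
(If a notification pattern is wanted later, ActiveJob's retry_on also accepts a block that runs once the attempts are exhausted; a sketch, where the logging call is only a placeholder for whatever admin notification might eventually exist.)

    retry_on StandardError, wait: 5.minutes, attempts: 3 do |job, error|
      # Runs only after the third attempt has failed; a real implementation
      # might notify an admin here instead of merely logging.
      Rails.logger.error("#{job.class.name} gave up after retries: #{error.message}")
    end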

@mereghost (Contributor, Author) commented on May 21, 2024:

Long train of thought here (correct any misunderstandings, please):

  1. We have a perform_limit of 1, meaning one job running and any number queued.
  2. We have a fixed concurrency key.
  3. This is the base job of 3 others, which share the same key, meaning the others will wait until the currently processing job is done.
  4. The process of waiting raises a concurrency error.
  5. We need these jobs to run, no matter why, no matter how long it takes for them to finally run.

Due to the shared key, we might run into the concurrency error frequently (I have no clue how often these jobs are run), so an explicit retry_on for it should take care of the exponential backoff that GoodJob adds by default.
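
(A sketch of the setup described above, with hypothetical class names standing in for the base job and its subclasses.)

    # Hypothetical names; the real classes live under app/workers/work_packages/.
    class BaseProgressJob < ApplicationJob
      include GoodJob::ActiveJobExtensions::Concurrency

      # Fixed key shared by every subclass: only one of them performs at a time,
      # the rest stay queued.
      good_job_control_concurrency_with(
        perform_limit: 1,
        key: "work_packages_progress"
      )

      # Jobs that have to wait raise ConcurrencyExceededError when picked up;
      # retry them indefinitely with a flat 5-minute wait instead of GoodJob's
      # default exponential backoff.
      retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
               wait: 5.minutes,
               attempts: :unlimited
    end

    class SpecificProgressJob < BaseProgressJob
      def perform
        # actual recalculation work
      end
    end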

Retrying on StandardError seems... off. It is too generic and might keep re-enqueuing a job that has bad arguments or, say, code that is broken due to the mystical influence of production DBs.

If we could narrow it down a bit, say to ActiveRecord errors, PG::Error and the like, I think it would be more viable.
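
(A sketch of what the narrower variant could look like; which error classes to retry on was precisely the open question here.)

    # Retry only database-related failures; anything else fails immediately.
    retry_on ActiveRecord::ActiveRecordError, PG::Error,
             wait: 5.minutes,
             attempts: 3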

(Also, a place for admins to see which jobs have failed and maybe retry some of them would be awesome 😛)

@ulferts (Contributor) replied:

Ok, I can follow your reasoning with the ConcurrencyExceededError, @mereghost, so let's add the statement just like you proposed initially.

As for the other errors, I currently just don't know which ones to expect. But without any other specification, AFAIK the jobs will just be retried 5 times, so that might work for the time being anyway.

@ulferts (Contributor) left a comment:

I read through your explanation, @mereghost, and I can get behind it.

The PR requires rebasing but can then be merged.


@mereghost force-pushed the bug/ensure_discard_on_concurrency_error branch from 9142cdc to 8c33cb5 on May 27, 2024 09:24
@mereghost merged commit 9b09a13 into dev on May 27, 2024
9 checks passed
@mereghost deleted the bug/ensure_discard_on_concurrency_error branch on May 27, 2024 11:59