
Makes sure jobs are eventually discarded when concurrency happens #15603

Merged: 4 commits, May 27, 2024
6 changes: 6 additions & 0 deletions app/workers/mails/reminder_job.rb
@@ -28,6 +28,12 @@

class Mails::ReminderJob < Mails::DeliverJob
  include ::Notifications::WithMarkedNotifications
  include GoodJob::ActiveJobExtensions::Concurrency

  good_job_control_concurrency_with(
    total_limit: 1,
    key: -> { "#{self.class.name}-#{arguments.last}" }
  )

  private

@@ -28,6 +28,13 @@

module Notifications
  class CreateDateAlertsNotificationsJob < ApplicationJob
    include GoodJob::ActiveJobExtensions::Concurrency

    good_job_control_concurrency_with(
      total_limit: 1,
      key: -> { "#{self.class.name}-#{arguments.last}" }
    )

    def perform(user)
      return unless EnterpriseToken.allows_to?(:date_alerts)

4 changes: 4 additions & 0 deletions app/workers/work_packages/progress/job.rb
@@ -39,4 +39,8 @@ class WorkPackages::Progress::Job < ApplicationJob
    perform_limit: 1,
    key: -> { "WorkPackagesProgressJob" }
  )

  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: :unlimited
Contributor

I would change this to apply to almost every error. So it would have to be something like

retry_on StandardError, wait: 5.minutes, attempts: :unlimited

This has less to do with Concurrency concerns and more with the necessity of this job being executed. On the other hand, at some point it just doesn't make sense to try again. But we don't have a pattern yet for how to handle this (e.g. by sending out a notification to an admin). So in the meantime, I would probably do

retry_on StandardError, wait: 5.minutes, attempts: 3
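For illustration only, one possible shape of such a catch-all retry with a fallback once the attempts are exhausted (AdminNotificationMailer is a hypothetical name, not something this PR introduces):

class WorkPackages::Progress::Job < ApplicationJob
  # Retry any error up to 3 times, 5 minutes apart. The block runs after the
  # final failed attempt, sketching the "notify an admin" idea mentioned above.
  retry_on StandardError, wait: 5.minutes, attempts: 3 do |job, error|
    AdminNotificationMailer.job_failed(job.class.name, error.message).deliver_later
  end
end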

Contributor Author
@mereghost, May 21, 2024

Long train of thought here (please correct any misunderstandings):

  1. We have a perform_limit of 1, meaning 1 running and infinitely many queued.
  2. We have a fixed concurrency key.
  3. This is the base job of 3 others that will share the same key, meaning the others will wait until the currently running job is done.
  4. That waiting works by raising a concurrency error.
  5. We need these jobs to run, no matter what, no matter how long it takes for them to finally run.

Due to the shared key, we might run into the concurrency error frequently (I have no clue how often these jobs are run), so an explicit retry on that error should take care of it, rather than relying on the exponential backoff that GoodJob adds by default.
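To make points 1 to 3 concrete, a rough sketch of the hierarchy (the subclass names are made up; only the base class appears in this PR):

class WorkPackages::Progress::Job < ApplicationJob
  include GoodJob::ActiveJobExtensions::Concurrency

  # Only one job performs at a time across the whole hierarchy, because every
  # subclass inherits this configuration, including the fixed key.
  good_job_control_concurrency_with(
    perform_limit: 1,
    key: -> { "WorkPackagesProgressJob" }
  )

  # When a sibling job is already running, GoodJob raises
  # ConcurrencyExceededError; keep retrying until the job gets its turn.
  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: :unlimited
end

# Hypothetical subclasses sharing the key and therefore queueing behind each other:
class WorkPackages::Progress::SomeRecalculationJob < WorkPackages::Progress::Job; end
class WorkPackages::Progress::AnotherRecalculationJob < WorkPackages::Progress::Job; end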

Retrying on StandardError seems... off. It is too generic and might keep re-enqueuing a job that has bad arguments or whose code is broken due to, say, the mystical influence of production DBs.

If we could narrow it down a bit, say to ActiveRecord errors, PG::Errors or so, I think it would be more viable.
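For example, a narrowed version could look roughly like this (the exact error classes are only a suggestion, not what the PR adopts):

  # Hypothetical narrowing of the catch-all: only database-related errors are
  # retried; everything else is left to the default error handling.
  retry_on ActiveRecord::ActiveRecordError, PG::Error,
           wait: 5.minutes,
           attempts: 3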

(Also, a place for admins to see which jobs have failed, and maybe retry some of them, would be awesome 😛)

Contributor

OK, I can follow your reasoning with the ConcurrencyExceededError, @mereghost, so let's add the statement just like you proposed initially.

As for the other errors, I currently just don't know which ones to expect. But without any other specification, AFAIK the jobs will just be retried 5 times, so that might work for the time being anyway.

end
@@ -39,6 +39,10 @@ class HealthStatusMailerJob < ApplicationJob
    key: -> { "#{self.class.name}-#{arguments.last[:storage].id}" }
  )

  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: 3

  discard_on ActiveJob::DeserializationError

  def perform(storage:)
@@ -46,6 +46,11 @@ class ManageStorageIntegrationsJob < ApplicationJob
    enqueue_limit: 1,
    perform_limit: 1
  )

  retry_on GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError,
           wait: 5.minutes,
           attempts: 3

  SINGLE_THREAD_DEBOUNCE_TIME = 4.seconds.freeze
  KEY = :manage_nextcloud_integration_job_debounce_happened_at
  CRON_JOB_KEY = :"Storages::ManageStorageIntegrationsJob"