Cron scheduler and multiple processes #1128

Closed
olivier-thatch opened this issue Oct 30, 2023 · 2 comments

@olivier-thatch

Hello,

We're currently evaluating switching from Sidekiq to GoodJob, and I'd like to get some clarity on GoodJob's exact behavior when using the cron scheduler and multiple processes.

Right now we're using sidekiq-cron for scheduled jobs. sidekiq-cron has a built-in mechanism to ensure scheduled jobs only get enqueued once even when running multiple processes, but it is reliant on a Redis feature:

> Sidekiq-Cron is safe to use with multiple Sidekiq processes or nodes. It uses a Redis sorted set to determine that only the first process who asks can enqueue scheduled jobs into the queue.

GoodJob's README seems to imply that it should be safe to have multiple processes running with the scheduler enabled:

> GoodJob's cron uses unique indexes to ensure that only a single job is enqueued at the given time interval.

but then the configuration example below specifically enables scheduling on a single process:

# Enable cron in this process, e.g., only run on the first Heroku worker process
config.good_job.enable_cron = ENV['DYNO'] == 'worker.1' # or `true` or via $GOOD_JOB_ENABLE_CRON

Unfortunately, our hosting provider doesn't have an equivalent to Heroku's $DYNO. Their recommendation was to decouple scheduling from the background worker service, i.e. have the worker service (Sidekiq/GoodJob) run without scheduling, and a separate service that is only responsible for enqueuing scheduled jobs. This way the worker service can be scaled as needed and the scheduled job service can run on a single instance. This would be a bit annoying to implement and kind of a waste, since the scheduled job service would do nothing 99% of the time.

Am I just worrying over nothing? Can we simply scale GoodJob over multiple instances without having to worry about scheduled jobs being enqueued multiple times and running concurrently?

@bensheldon
Owner

You don't need to worry, and I should rewrite that documentation to be a little clearer. Something like:

> GoodJob's cron is safe to use with multiple processes or containers. GoodJob ensures only a single job is enqueued by placing a unique compound index on the jobs table that prevents the insertion of duplicate job records with the same cron configuration key and scheduled time ([cron_key, cron_at]).

The additional stuff is more like:

> While entirely optional, if you did want to reduce duplicate key collisions from noisying up your logs, you could...

...though I should probably just remove that entirely.
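
For readers who want to see why the unique compound index makes multiple schedulers safe, here is a minimal sketch in plain Ruby. It is not GoodJob's code: `FakeJobsTable`, `DuplicateKey`, and `enqueue_cron_tick` are hypothetical names, and an in-memory `Set` guarded by a `Mutex` stands in for the database's unique index on `[cron_key, cron_at]`. Threads play the role of separate processes that all try to enqueue the same cron tick.

```ruby
require "set"

# Hypothetical stand-in for a jobs table with a unique compound index
# on [cron_key, cron_at]. An assumption for illustration only.
class FakeJobsTable
  class DuplicateKey < StandardError; end

  def initialize
    @index = Set.new   # plays the role of the unique index
    @rows  = []
    @lock  = Mutex.new # stands in for the database's atomic uniqueness check
  end

  # Inserts a row, or raises if the same (cron_key, cron_at) pair already exists.
  def insert!(cron_key:, cron_at:, job_class:)
    @lock.synchronize do
      # Set#add? returns nil when the element is already present.
      unless @index.add?([cron_key, cron_at])
        raise DuplicateKey, "#{cron_key} @ #{cron_at}"
      end
      @rows << { cron_key: cron_key, cron_at: cron_at, job_class: job_class }
    end
  end

  def count
    @lock.synchronize { @rows.size }
  end
end

# Every scheduler attempts the insert; losers treat the collision as
# "someone else already enqueued this tick" and move on.
def enqueue_cron_tick(table, cron_key, cron_at, job_class)
  table.insert!(cron_key: cron_key, cron_at: cron_at, job_class: job_class)
  true
rescue FakeJobsTable::DuplicateKey
  false
end

table = FakeJobsTable.new
tick  = Time.utc(2023, 10, 30, 12, 0, 0)

# Five "processes" race to enqueue the same scheduled run.
results = 5.times.map do
  Thread.new { enqueue_cron_tick(table, "nightly_report", tick, "ReportJob") }
end.map(&:value)

puts "winners: #{results.count(true)}, rows: #{table.count}"
```

Exactly one attempt inserts a row and the other four hit the duplicate-key path, which is the source of the log noise mentioned above. In a real deployment the database enforces the constraint, so no coordination between the processes is needed.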

@olivier-thatch
Author

Perfect, thanks for the quick reply! I'm looking forward to switching to GoodJob 🥳
