Cron jobs can be missed if deploys are timed just right #1484

Closed
tjefferson08 opened this issue Sep 11, 2024 · 7 comments · Fixed by #1488

Comments

@tjefferson08

TL;DR

  • we deployed our app almost exactly at noon ET, and
  • our cron scheduled for noon did not execute

just glancing at the code in CronManager, this seems possible (albeit unlikely), depending on very specific timing

  • old worker is shut down with cron JUST about to be run
  • new worker spins up and starts cron JUST after cron schedule was due

I would think in a blue-green deployment world, this would not be possible 🤔 (but apparently our heroku worker deployments are not blue-green)
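The timing described above can be sketched with a simplified model. This is not GoodJob's actual CronManager code: `next_daily_run` is a hypothetical helper, the timestamps are made up, and UTC is used instead of ET to keep the arithmetic simple. The only assumption is that a scheduler looks forward from the moment it boots.

```ruby
DAY = 24 * 60 * 60 # seconds

# Hypothetical helper: next occurrence of a daily job strictly after `now`,
# where `offset_seconds` is the time of day (seconds after midnight UTC).
def next_daily_run(now, offset_seconds)
  midnight = Time.utc(now.year, now.month, now.day)
  candidate = midnight + offset_seconds
  candidate > now ? candidate : candidate + DAY
end

noon = 12 * 60 * 60

# Illustrative timestamps: the old worker stops 2s before noon,
# the new worker boots 2s after noon.
old_worker_stopped = Time.utc(2024, 9, 11, 11, 59, 58)
new_worker_booted  = Time.utc(2024, 9, 11, 12, 0, 2)

# What the old worker was waiting for when it was shut down.
missed = next_daily_run(old_worker_stopped, noon)

# What the new worker computes when it only looks forward from boot.
upcoming = next_daily_run(new_worker_booted, noon)

puts "old worker was waiting for: #{missed}"
puts "new worker will wait for:   #{upcoming}"
```

In this model nobody is running at noon itself, so the occurrence the old worker was waiting for is never enqueued and the new worker's first target is the following day.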

@bensheldon
Owner

Briefly, I'm imagining that there could be a configuration like config.cron_start_lookback = 5.minutes where it would see if there was a previous enqueue that would have fallen within that time period, and if so, attempt to enqueue the job (with all of the existing uniqueness protections).

I also wonder if it should have a similar lookahead where it's like "if the cron will start in the next [lookback-period] don't bother either" because I imagine if the job is being run every 1 minute or something rapidly like that, it shouldn't be necessary. It's more when the cron entry is like once per day that it's a problem, I imagine.
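A rough sketch of how the lookback and lookahead checks could combine. `catch_up_time` and its keyword arguments are hypothetical names for illustration, not GoodJob's API, and the sketch assumes the previous and next occurrences have already been computed from the cron expression. The existing uniqueness protections would still apply when actually enqueuing.

```ruby
# Hypothetical helper: returns the missed occurrence to enqueue on startup,
# or nil if no catch-up is warranted. `lookback` is in seconds.
def catch_up_time(previous_at:, next_at:, now:, lookback:)
  # Lookback: the previous occurrence is too old to be worth recovering.
  return nil if now - previous_at > lookback

  # Lookahead: the next occurrence is imminent, so don't bother.
  return nil if next_at - now <= lookback

  previous_at
end

lookback = 5 * 60 # mirrors the 5.minutes idea above
now = Time.utc(2024, 9, 11, 12, 0, 2)

# Daily job that was due 2s before the worker booted: catch up.
daily = catch_up_time(
  previous_at: Time.utc(2024, 9, 11, 12, 0, 0),
  next_at:     Time.utc(2024, 9, 12, 12, 0, 0),
  now: now, lookback: lookback
)

# Every-minute job: the next run is under a minute away, so skip.
minutely = catch_up_time(
  previous_at: Time.utc(2024, 9, 11, 12, 0, 0),
  next_at:     Time.utc(2024, 9, 11, 12, 1, 0),
  now: now, lookback: lookback
)

puts "daily catch-up:    #{daily.inspect}"
puts "minutely catch-up: #{minutely.inspect}"
```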

@tjefferson08
Author

Sounds like it'd work! Seems the time window value would be determined by the longest possible downtime you'd take during deployment (I suppose ideally this is 0)

Ok obviously huge change... but would it be totally nuts to try and run cron scheduling through ActiveJob's API?

Job.set(wait_until: next_cron_at).perform_later

Altho hm I guess you'd have to wrap the job somehow so you could tell it to reschedule for the next occurrence after executing.

It'd be cool if the cron manager could just boot up, schedule upcoming crons, and be done... possibly even removing the need for the dedicated thread?
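To make the "wrap the job" idea concrete, here is a minimal plain-Ruby sketch. `FakeQueue` is an in-memory stand-in rather than ActiveJob or GoodJob, and `self_rescheduling` is a hypothetical wrapper; a fixed daily interval is assumed in place of a real cron expression.

```ruby
# In-memory stand-in for a job queue (NOT ActiveJob): holds [time, job] pairs.
class FakeQueue
  attr_reader :scheduled

  def initialize
    @scheduled = []
  end

  def enqueue_at(time, job)
    @scheduled << [time, job]
  end

  # Runs the earliest scheduled job, passing it its scheduled time and the queue.
  def run_next
    return if @scheduled.empty?

    time, job = @scheduled.sort_by!(&:first).shift
    job.call(time, self)
  end
end

# Hypothetical wrapper: each run first schedules the following occurrence,
# then performs the actual task.
def self_rescheduling(interval, &task)
  wrapper = lambda do |ran_at, queue|
    queue.enqueue_at(ran_at + interval, wrapper)
    task.call(ran_at)
  end
end

day = 24 * 60 * 60
start = Time.utc(2024, 9, 11, 12, 0, 0)
runs = []

queue = FakeQueue.new
queue.enqueue_at(start, self_rescheduling(day) { |at| runs << at })

2.times { queue.run_next }
puts runs.inspect
puts queue.scheduled.map(&:first).inspect
```

In this model the schedule lives in the queue itself: changing or cancelling it means finding and modifying the waiting entry.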

@bensheldon
Owner

The way that GoodJob's cron works is intentional, but it is a design decision.

GoodJob Cron's design is to be analogous to Unix System Cron, which differentiates between the scheduler and the task. That makes it possible to enqueue jobs at every scheduled time, and not just at the time that a job (eventually) executes. That separation also makes it easy to cancel or change the schedule without having to search for a waiting job and then find/modify/destroy it.

Also:

  • I generally consider job execution to be in Application-land. So GoodJob can be relied upon to enqueue a job at the scheduled time, but it's up to the application/operation-configuration to ensure there are execution resources to perform it.
  • I discourage using scheduled jobs for anything other than error handling/incremental backoff. I discourage putting jobs in the queue that are expected to execute far into the future (or "smearing" jobs to spread out their execution). GoodJob's job table is optimized to function as a queue rather than a general data-store.

Sorry, I know that's TMI, but wanted to explain why it is the way it is 😄

Maybe I'd do it differently if I did it again, but only maybe. I think Solid Queue's implementation is interesting/different because it does put it all in the database (whereas GoodJob generally tries to keep stuff in configuration): https://github.com/rails/solid_queue/blob/main/test/dummy/db/schema.rb#L97

@eleith

eleith commented Sep 12, 2024

thanks for the informative response!

this helps us evolve our approach both to how we deploy our workers as well as how we queue up our jobs.

@tjefferson08
Author

yes, thanks for the explanation 🙏 much appreciated

and thanks for good_job, we love it!

@bensheldon
Owner

@jjb helpfully shared an example in sidekiq-cron, which has nice naming ("reschedule_grace_period"): sidekiq-cron/sidekiq-cron#465

Also TIL about Fugit "within": floraison/fugit@89d102d

@bensheldon
Owner

I have a PR up for this in #1488
