Production Readiness

akshat edited this page Oct 19, 2023 · 8 revisions

We are firm believers in Murphy's law. Here's a checklist to ensure your app runs flawlessly in production:

  • All jobs should be idempotent, i.e. executing a job multiple times should have the same effect as executing it once
    • Due to errors in code, or even network/infra failures, Jobs might get executed more than once
    • Jobs get executed at-least-once, not exactly-once
  • Jobs should be safe to run concurrently & in parallel with other Jobs
  • Don't rely on ordered execution
    • Because of multiple worker processes & variable job processing times, there is no guarantee that jobs will be executed in the order they were enqueued
    • As a workaround, ordered execution can be approximated by running a single worker process with 1 thread. However, a job failure might still lead to out-of-order execution
  • Handlers should not have side-effects
    • Functions supplied for options like :retry-delay-sec-fn-sym, :error-handler-fn-sym, :return-listener shouldn't have side-effects
  • Beware the law of diminishing returns
    • For a worker, a higher thread-count doesn't necessarily imply better throughput & Job execution rates
    • Set the thread-count based on CPU cores & RAM, and tune it by trial & error during load-testing
  • Configure sane timeouts, retry-limits and back-offs
  • Set :graceful-shutdown-sec to 2x the p90 Job execution time
    • Graceful shutdown timeout isn't the minimum, but the maximum amount of time a worker will wait before shutting down
    • If all Jobs finish execution in 5 seconds, the worker will exit after that even if the shutdown timeout is 30 seconds
    • Modern cloud systems allow shutdown times of up to 10 minutes. Leverage that to give in-progress Jobs time to finish
  • When a worker crashes abruptly, Goose will automatically recover & retry abandoned Jobs. You need not worry about that
  • Log at warn or error levels to avoid clutter
    • Goose uses tools.logging, which leaves the choice of logging library to users
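
Two items from the checklist above, idempotent Jobs and side-effect-free handler functions, can be sketched in plain Clojure. This is only an illustration under stated assumptions: `credit-account`, `apply-credit`, `backoff-sec` and the in-memory `state` atom are hypothetical names, not part of Goose, and the sketch assumes a retry-delay function receives a retry count and returns a delay in seconds. A real Job would keep its deduplication record in a durable store rather than an atom.

```clojure
;; In-memory stand-in for a durable datastore (assumption for this sketch).
(defonce state (atom {:processed #{} :balances {}}))

(defn apply-credit
  "Pure state transition. Returns the state unchanged when payment-id
  has already been processed, so applying it twice equals applying it once."
  [current payment-id account-id amount]
  (if (contains? (:processed current) payment-id)
    current
    (-> current
        (update :processed conj payment-id)
        (update-in [:balances account-id] (fnil + 0) amount))))

(defn credit-account
  "Hypothetical idempotent Job function. swap! applies the dedup check
  and the balance update as one atomic step, so a retried or duplicated
  execution cannot credit the account twice."
  [payment-id account-id amount]
  (swap! state apply-credit payment-id account-id amount)
  nil)

(defn backoff-sec
  "Hypothetical retry-delay function with no side-effects: the result
  depends only on retry-count. Exponential back-off starting at 10
  seconds and capped at 1 hour."
  [retry-count]
  (min 3600 (* 10 (long (Math/pow 2 (min retry-count 20))))))
```

Calling `(credit-account "pay-1" "acct-1" 100)` twice leaves the balance of `"acct-1"` at 100, which is the property at-least-once execution relies on. Keeping `backoff-sec` pure means it can be called any number of times, from any thread, without changing application state.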

Message Brokers

Prepare your message broker for production usage.


Previous: Serializing Custom data-types        Next: RabbitMQ Production Readiness
