
Troubleshooting

akshat edited this page Nov 7, 2022 · 4 revisions

When running Goose in Production, you might run into some issues. Here's a playbook for them:

Latency goes up

  • Latency is the delay between the moment a Job is due to run and the moment a worker starts executing it. Two metrics track it:
    • execution.latency: time between enqueue -> start of execution
    • scheduled.latency: time between theoretical schedule time -> start of execution
  • When latency goes up, first check the system-level metrics of the Message Brokers & verify that they are performing normally
  • If Message Brokers are fine, check Job Enqueue rate, Failure rate & Execution time
    • Consider scaling up the number of workers if Jobs are getting enqueued at a higher rate than normal
    • If Failure rate or Execution times are unusual, investigate issues in the Job code or third-party APIs
  • If scheduled.latency is high, consider lowering scheduler-polling-interval-sec in the Redis broker configuration
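
The checks above can be sketched as a small helper. This is only an illustration of the order of the playbook, not part of Goose: the function names, the ratio inputs (current value divided by normal value) and the 1.5 threshold are all assumptions that you would replace with readings from your own dashboards.

```python
def execution_latency_ms(enqueued_at_ms, started_at_ms):
    """Time between enqueue and start of execution."""
    return started_at_ms - enqueued_at_ms


def scheduled_latency_ms(scheduled_for_ms, started_at_ms):
    """Time between the theoretical schedule time and start of execution."""
    return started_at_ms - scheduled_for_ms


def triage(broker_healthy, enqueue_ratio, failure_ratio,
           execution_time_ratio, scheduled_latency_high, threshold=1.5):
    """Return suggested next steps, in the order the playbook checks them.

    Each *_ratio is current value / normal value. These inputs and the
    threshold are hypothetical; Goose does not provide them in this form.
    """
    # Step 1: rule out the message broker before looking at anything else.
    if not broker_healthy:
        return ["investigate message broker performance"]

    steps = []
    # Step 2: more jobs than usual means the workers cannot keep up.
    if enqueue_ratio > threshold:
        steps.append("scale up the number of workers")
    # Step 3: unusual failures or slow executions point at the job itself.
    if failure_ratio > threshold or execution_time_ratio > threshold:
        steps.append("investigate job code / third-party APIs")
    # Step 4: scheduled jobs starting late suggests polling is too infrequent.
    if scheduled_latency_high:
        steps.append("lower scheduler-polling-interval-sec")
    return steps
```

For example, `triage(True, 2.0, 1.0, 1.0, False)` suggests only scaling up workers, because the broker is healthy and only the enqueue rate is above normal.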

Workers keep crashing

  • A Job might be causing process crashes
  • To find the Poison Job, track the jobs.recovered metric & look at its :function tag
    • If a Job causes workers to crash repeatedly, it'll be recovered & tagged by Goose
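
As a rough sketch of that search: if you can export jobs.recovered data points from your metrics backend, grouping them by function tag highlights the likely Poison Job. The event shape (a dict with a "function" key), the function names in the sample data and the `min_recoveries` cutoff are assumptions made for illustration, not a format that Goose emits.

```python
from collections import Counter


def suspect_functions(recovered_events, min_recoveries=2):
    """Rank function tags by how often their jobs were recovered.

    recovered_events: iterable of dicts, one per jobs.recovered data point,
    each carrying the value of the :function tag under "function"
    (hypothetical export format).
    """
    counts = Counter(event["function"] for event in recovered_events)
    # most_common() sorts by count, highest first.
    return [(fn, n) for fn, n in counts.most_common() if n >= min_recoveries]


# Made-up sample data: one function is recovered repeatedly.
events = [
    {"function": "my.app/resize-image"},
    {"function": "my.app/send-email"},
    {"function": "my.app/resize-image"},
    {"function": "my.app/resize-image"},
]
suspects = suspect_functions(events)
```

A function that is recovered once may simply have been running during a deploy or restart; one that is recovered repeatedly is the stronger candidate.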

