How should I tune GoodJob settings given my job workload of ~1 million jobs/hr? #1595
Comments
GoodJob might not be the right tool at your scale. I always recommend Sidekiq Enterprise when talking about tens of millions of jobs. That said, most of the performance-tuning advice I can give based on your numbers would be the same: that's too many threads per process; 5-15 threads is more realistic. Leave the DB pool size at 200 (the exact number doesn't matter so much, as long as it's big). You'll need to scale horizontally across however many processes it takes.

For GoodJob specifically: pulling from the * queue is most efficient, so that's good. You should set up GoodJob to delete job records after they perform, to keep the table size down. If you can share an EXPLAIN ANALYZE of the lock query I can maybe help more. You're pushing the scale pretty hard.
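For reference, here's a minimal sketch of what that tuning could look like in a Rails initializer. The option names (`execution_mode`, `queues`, `max_threads`, `preserve_job_records`) follow the GoodJob README, but the specific values are illustrative assumptions to adjust for your own workload:

```ruby
# config/initializers/good_job.rb
# Sketch only: values are assumptions, not recommendations for every app.
Rails.application.configure do
  config.good_job.execution_mode = :external    # run jobs in dedicated worker processes
  config.good_job.queues = "*"                  # pulling from all queues is most efficient
  config.good_job.max_threads = 10              # 5-15 threads per process, not ~100
  config.good_job.preserve_job_records = false  # delete records after perform to keep the table small
end
```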
Thanks for the advice @bensheldon! I'm considering Sidekiq Enterprise too, but it seems like there will be quite a bit of refactoring within the codebase to effectively use the full suite of Sidekiq Enterprise features without ActiveJob (due to some limitations of using ActiveJob with Sidekiq).

I'm actually using Solid Queue right now, and while it works under normal workloads, it doesn't seem well optimized for Postgres, and we're constantly seeing our CPU spike to 100%. I'm afraid the best option may be to bite the bullet and move to Sidekiq, since I'm sure the number of jobs I'm processing will only increase from here.

Regarding the lock query, did you mean this query?

```sql
SELECT
"good_job_processes".*
FROM
"good_job_processes"
LEFT JOIN
pg_locks
ON
pg_locks.locktype = $1
AND pg_locks.objsubid = $2
AND pg_locks.classid = ($3 || SUBSTR(MD5($4 || $5 || "good_job_processes"."id"::text), $6, $7))::bit(32)::int
AND pg_locks.objid = (($8 || SUBSTR(MD5($9 || $10 || "good_job_processes"."id"::text), $11, $12))::bit(64) << $13)::bit(32)::int
WHERE
("good_job_processes"."lock_type" = $14
AND "pg_locks"."locktype" IS NULL
OR "good_job_processes"."lock_type" IS NULL
AND "good_job_processes"."updated_at" < $15)
ORDER BY
"good_job_processes"."id" ASC
LIMIT
  $16
```

If so, here's the query plan for it: [EXPLAIN output attached as a screenshot]
Ah, after reading the README again, I think I might have overprovisioned my workers. Could I clarify: if I were to define …
Like Ben said, you must scale out to multiple processes, each one handling just a portion of the jobs. How many depends on how long each job lasts.

Let's imagine that each job takes 1 second to complete. That means you would need 1 million seconds of CPU time. Spread out over an hour, that works out to about 278 jobs executing simultaneously (1,000,000 / 3,600 = 277.78). If your jobs last about 5 seconds, you need 5 times that many jobs processing simultaneously, or about 1,400 (5,000,000 / 3,600 = 1,388.89). Each one of those threads will require a database connection, so your PostgreSQL server will need to be beefy indeed to support that many connections.

In a nutshell, each process has many threads, and each thread can process one job at a time. Within a single process, only one thread at a time will actually be executing Ruby code; the others will be twiddling their thumbs. If your jobs are I/O-bound (not so easy to determine), the number of threads per process/worker could be increased, but not to "ridiculous" numbers like 98.

Going back to the 1 sec/job case, where we need 278 jobs executing simultaneously, there are several ways to split that up. It could be 56 machines each running 1 process, with each process running 5 threads (56 * 1 * 5 = 280 simultaneous jobs). Or it could be 12 machines, each running 10 processes, with each process running 3 threads (12 * 10 * 3 = 360 simultaneous jobs). In these last two sentences, you can replace "processes" with "workers": they mean the same thing. And the "simultaneous jobs" bit? That's the number of PG connections you'll need to support just for the workers.

I hope this helps! Might I inquire as to what you're running that requires 1M jobs/hour?
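To make that arithmetic concrete, here's a small Ruby sketch of the same back-of-the-envelope calculation; the job duration and threads-per-process figures are illustrative assumptions:

```ruby
# Back-of-the-envelope concurrency math (assumed numbers, adjust to your workload).
jobs_per_hour       = 1_000_000
avg_job_seconds     = 1.0   # try 5.0 for the 5-second case
threads_per_process = 5     # each thread holds one PG connection

concurrent_jobs = (jobs_per_hour * avg_job_seconds / 3600.0).ceil
processes       = (concurrent_jobs.to_f / threads_per_process).ceil

puts concurrent_jobs  # => 278 simultaneous jobs (and worker PG connections) at 1 s/job
puts processes        # => 56 worker processes at 5 threads each
```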
Hello, I'm looking to test how GoodJob performs under a relatively heavy workload of around a million jobs being executed every hour. My current setup puts every job class in its own queue, and I've defaulted the --queue setting to *. I've set GOOD_JOB_MAX_THREADS to 98, and as advised, I've set the DB pool size to a relatively high 200.

Since GoodJob doesn't support PgBouncer in transaction mode, I'm connecting to the DB directly with these configurations. The number of GoodJob executors I'm expecting to run in production (at maximum) is around 1800. I'm testing this against quite a beefy Postgres instance, but CPU usage peaks at over 60% (of 16 cores) and memory usage maxes out at 143 GB (which caused the Postgres instance to crash). This doesn't include the web instances that I have yet to account for, and no traffic is being served from this Postgres DB yet either.

I was wondering what's the best way to tune GoodJob settings so that it can handle this number of executors, as well as the web instances that will be hitting the Postgres DB. Thank you!
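One low-effort guardrail, since every executor thread holds its own PG connection: a hypothetical boot-time check (not a GoodJob feature) that warns when the Active Record pool is smaller than GOOD_JOB_MAX_THREADS:

```ruby
# config/initializers/good_job_pool_check.rb
# Hypothetical sanity check, not part of GoodJob: each executor thread needs
# its own connection, plus a couple of spares for LISTEN/NOTIFY, cron, etc.
Rails.application.config.after_initialize do
  pool_size   = ActiveRecord::Base.connection_pool.size
  max_threads = Integer(ENV.fetch("GOOD_JOB_MAX_THREADS", 5))

  if pool_size < max_threads + 2
    Rails.logger.warn(
      "DB pool (#{pool_size}) is smaller than GOOD_JOB_MAX_THREADS (#{max_threads}); " \
      "executor threads may starve waiting for connections"
    )
  end
end
```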