
Question: what's the correct way to handle database connection pool size when using cron #443

Closed
jgrau opened this issue Nov 1, 2021 · 3 comments


@jgrau
Contributor

jgrau commented Nov 1, 2021

Hi @bensheldon

I'm using good_job in my Kubernetes setup. I run 2 worker pods that are started with `bundle exec good_job start`. The relevant configuration is:

# application.rb
    config.good_job.enable_cron = true
    config.good_job.cron = {
      complete_completed_bookings: {
        cron: 'at midnight',
        class: 'CompleteCompletedBookingsJob',
        description:
          'Every midnight: update finished bookings: Set state to completed',
      },
      import_content_translations: {
        cron: 'every 15 minutes',
        class: 'ImportContentTranslationsJob',
        description:
          'Every 15 minutes: import content translations from crowdin',
      },
      purge_unattached_active_storage_blobs: {
        cron: 'at midnight',
        class: 'Maintenance::PurgeUnattachedActiveStorageBlobsJob',
        description: 'Every midnight: purge unattached active storage blobs',
      },
    }

# production.rb
  config.x.good_job = {
    execution_mode: :external,
    username: 'landfolk',
    password: Rails.application.credentials.good_job.fetch(:password),
  }

# database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  # For details on connection pooling, see Rails configuration guide
  # https://guides.rubyonrails.org/configuring.html#database-pooling
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

production:
  primary:
    <<: *default
    username: <%= ENV['PGUSER'] %>
    password: <%= ENV['PGPASSWORD'] %>
    database: <%= ENV['PGDATABASE'] %>
    host: <%= ENV['PGHOST'] %>
    port: <%= ENV['PGPORT'] %>
  primary_replica:
    <<: *default
    username: <%= ENV['REPLICA_PGUSER'] %>
    password: <%= ENV['REPLICA_PGPASSWORD'] %>
    database: <%= ENV['REPLICA_PGDATABASE'] %>
    host: <%= ENV['REPLICA_PGHOST'] %>
    port: <%= ENV['REPLICA_PGPORT'] %>
    replica: true

I've been seeing ActiveRecord::ConnectionTimeoutError recently. Here's a full log entry:

E, [2021-11-01T11:15:57.005688 #1] ERROR -- : [ActiveJob] Failed enqueuing ImportContentTranslationsJob to GoodJob(production_default): ActiveRecord::ConnectionTimeoutError (could not obtain a connection from the pool within 5.000 seconds (waited 5.002 seconds); all pooled connections were in use)

which got me wondering if the database connection pool setup is correct.

Does the "cron" process use a connection from the pool?

I should add that I was seeing the errors at a time when a lot of jobs were being retried (using `wait: :exponentially_longer`), but nevertheless I wouldn't expect this error.

Also: I considered whether it would be better to run a separate scheduler pod that I could limit to always have exactly 1 replica and start with `--enable-cron`, but then I realised that this runs the cron process AND the worker process, so I figured that was not the intended setup :)

@jgrau
Contributor Author

jgrau commented Nov 1, 2021

I should also add that I have studied the README, but as I understood it, the complexity around connection pool size only applies with `execution_mode: :async`.

@bensheldon
Owner

> Does the "cron" process use a connection from the pool?

Yes. Cron runs in background thread(s), and each thread will consume a database connection from the pool.

The `CronManager` creates `Concurrent::ScheduledTask`s, which run on Concurrent Ruby's global thread pool, which is implicitly sized to the number of CPU processors.
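
For illustration, here's a minimal sketch of that mechanism (this is not GoodJob's actual code; the job class is taken from your config above):

require 'concurrent'

# Schedule a one-shot task 15 minutes out; when the delay elapses, it runs
# on a thread from concurrent-ruby's global thread pool.
Concurrent::ScheduledTask.execute(15 * 60) do
  # Wrapping in the Rails executor means Active Record lazily checks out a
  # connection on first use and returns it to the pool when the block ends.
  Rails.application.executor.wrap do
    ImportContentTranslationsJob.perform_later
  end
end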

Note: It's possible to change the `CronManager` implementation to run on a dedicated and explicitly sized thread pool, which would give a bit more certainty about the number of database connections consumed.
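
Something along these lines, as a sketch (the pool sizes are illustrative, not what GoodJob would necessarily use):

require 'concurrent'

# A dedicated, bounded pool: max_threads: 2 caps cron at 2 threads, and
# therefore at most 2 database connections attributable to cron at any moment.
cron_executor = Concurrent::ThreadPoolExecutor.new(
  min_threads: 0,
  max_threads: 2,
  max_queue: 0 # 0 means an unbounded queue; excess tasks wait for a free thread
)

Concurrent::ScheduledTask.execute(60, executor: cron_executor) do
  CompleteCompletedBookingsJob.perform_later
end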

At the moment, I would recommend that your database pool size = [Web/Puma threads if async] + [GoodJob execution threads] + [1 GoodJob LISTEN/NOTIFY thread] + [2 GoodJob cron threads] + 20% margin
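
Plugging example numbers into that formula, assuming a worker pod in `:external` mode with 5 execution threads (illustrative values; substitute your own deployment's numbers):

# Back-of-the-envelope pool sizing per the formula above
web_threads       = 0 # execution_mode is :external, so no jobs run in the web process
execution_threads = 5 # GoodJob worker threads (--max-threads / RAILS_MAX_THREADS)
listen_notify     = 1 # GoodJob's LISTEN/NOTIFY thread
cron_threads      = 2 # GoodJob cron threads
base = web_threads + execution_threads + listen_notify + cron_threads
pool_size = (base * 1.2).ceil # 20% margin
# => 10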

GoodJob will check database connections back in at the end of each operation (e.g. executing a job, enqueuing a cron entry), so it's possible to run with fewer database connections than threads, but that leaves open the possibility that you will hit the limit under load, which will result in those database connection timeouts.

@jgrau
Contributor Author

jgrau commented Nov 2, 2021

Thank you @bensheldon! It makes a lot of sense, and by that calculation I needed to double my database pool size. I'm deploying that change now and feel pretty confident it will resolve my issue! :) ❤️
