Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle database connection drops #296

Closed
nelvalx opened this issue Jul 18, 2021 · 6 comments · Fixed by #301
Closed

Handle database connection drops #296

nelvalx opened this issue Jul 18, 2021 · 6 comments · Fixed by #301

Comments

@nelvalx
Copy link

nelvalx commented Jul 18, 2021

If the database connection is lost, good_job will be crazy looping with the error:

E, [2021-07-18 07:08:22 #589] ERROR -- : [GoodJob] Notifier errored: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

E, [2021-07-18 07:08:22 #589] ERROR -- : [GoodJob] Notifier errored: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

E, [2021-07-18 07:08:22 #589] ERROR -- : [GoodJob] Notifier errored: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?

...

If the connection to postgresql is inexistent before start good_job, when that connection become available good_job has the following error when polling:

I, [2021-07-18 07:21:37 #589]  INFO -- : [GoodJob] Notifier subscribed with LISTEN
E, [2021-07-18 07:21:42 #589] ERROR -- : [GoodJob] GoodJob error: PG::SyntaxError: ERROR:  zero-length delimited identifier at or near """"
LINE 1: ...ory_lock(('x' || substr(md5('good_jobs' || "rows".""::text),...
...                                                             ^
@bensheldon
Copy link
Owner

@nelvalx thanks for reporting this! GoodJob can handle this state better.

I think the solution is to detect the condition and add delay before GoodJob tries to reconnect again. And provide a better log message.

Do you think 15 seconds is an appropriate delay before retrying?

I'll have to do some research about how ActiveRecord manages connection lifecycle. The GoodJob::Notifier was modeled on ActionCable's PG adapter, and it may have a solution or suffer from the same problem: https://github.com/rails/rails/blob/83217025a171593547d1268651b446d3533e2019/actioncable/lib/action_cable/subscription_adapter/postgresql.rb

@nelvalx
Copy link
Author

nelvalx commented Jul 19, 2021

For now I have a 30 seconds delay, but I agree that 15 seconds is a good average.

This is how I patch the configuration until it's solved. The other 5 seconds delay is preventing 100% cpu usage over any unexpected error.

GoodJob.on_thread_error = lambda do |exception|
  # https://github.com/bensheldon/good_job/issues/296
  case exception
  when ActiveRecord::ConnectionNotEstablished, PG::UnableToSend
    sleep(30)
  when ActiveRecord::StatementInvalid
    GoodJob::Job.advisory_lockable_column = 'id' if GoodJob::Job._advisory_lockable_column.blank?
  end

  sleep(5)
end

@bensheldon
Copy link
Owner

@nelvalx I'm curious, are you seeing this in production? Could you give me a bit more context about how this happens?

(I initially imagined that this was what happens to me: I'm doing development and forget to start the database process.)

@nelvalx
Copy link
Author

nelvalx commented Jul 19, 2021

I'm in development.
I'll need to have few background jobs (max 20/day) in a rails app and I was searching for a adapter and found Good Job 😄
Simulating its behavior on datababe connection lost I found this error.

If you shutdown the database after rails start you only get the error loops while the database is down.
If you shutdown the database before rails start you get the error loops while the database is down, but when the connection is back GoodJob::Job does not know its advisory_lockable_column and throws a SQL error when polling.

My current configuration:

# app/jobs/application_job.rb
class ApplicationJob < ActiveJob::Base
  # Automatically retry jobs that encountered a deadlock
  # retry_on ActiveRecord::Deadlocked

  retry_on StandardError, wait: :exponentially_longer, attempts: 20

  # Most jobs are safe to ignore if the underlying records are no longer available
  discard_on ActiveJob::DeserializationError
end
# config/initializers/good_job.rb
GoodJob.retry_on_unhandled_error = false
GoodJob.preserve_job_records = true

GoodJob.on_thread_error = lambda do |exception|
  # https://github.com/bensheldon/good_job/issues/296
  case exception
  when ActiveRecord::ConnectionNotEstablished, PG::UnableToSend
    sleep(30)
  when ActiveRecord::StatementInvalid
    GoodJob::Job.advisory_lockable_column = 'id' if GoodJob::Job._advisory_lockable_column.blank?
  end

  sleep(5)
end
# config/application.rb
...
    # Active Job adapter
    config.active_job.queue_adapter = :good_job
    config.good_job.execution_mode = :async_server
    config.good_job.max_threads = 2
    config.good_job.poll_interval = 5.minutes.to_i
    config.good_job.shutdown_timeout = 25.seconds.to_i
...

@bensheldon
Copy link
Owner

@nelvalx I just released a fix for this in v1.11.2. Thank you for reporting it 🎉

@nsilva-ta
Copy link

If I break the database connection (with a service postgresql restart):

[GoodJob] [13272] GoodJob started scheduler with queues=* max_threads=2.
[GoodJob] Notifier subscribed with LISTEN
FATAL:  terminating connection due to administrator command
[GoodJob] Notifier unsubscribed with UNLISTEN
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-2] Executed GoodJob 759a041c-9fcb-4656-b2bb-04ea4c8b0243
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-3] Executed GoodJob 4b2388f5-a31f-41f2-a55a-31fc28615076
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-2] Executed GoodJob 9aa9c6bc-3a53-4edc-9a5e-a29b0e6205a2
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-3] Executed GoodJob 833f2c60-da32-4c4a-b3f9-36abd1c15702
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-2] Executed GoodJob 276fdbf6-4374-4c44-bcb5-770cd0066b87
[GoodJob] [13272] [GoodJob::Scheduler(queues=* max_threads=2)-worker-3] Executed GoodJob 781d2302-d838-429a-8591-b3cd81bcb402

The notifier is still unsubscribed! Only polling is working but executing only one job at the time.
With max_threads=2 the log shows a worker-3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants