ActiveRecord::StatementInvalid: PG::ConnectionBad: PQsocket() can't get socket descriptor every 30 minutes aprox. #579
Comments
@bbonamin thanks for reporting this! That's really strange that you're seeing these database errors, especially so regularly. It's not unexpected that the Notifier would report a database connection error occasionally because it holds a long-running database connection. But it is unexpected that it would happen at regular intervals, which makes me think something else is going on. Is there anything happening in your application that would be on a 30-minute interval? Things that I could imagine might break the database connection:
Aside: this also makes me wonder whether connection-related errors shouldn't be sent to the error reporter always/immediately, so that they are only reported when there is a persistent/long-lasting inability to connect (or maybe that itself isn't even a valid error because it's dependency related?). |
hey @bensheldon thanks for your reply!
I think this is the problem. I'm running a Postgres database on fly.io and there's been another issue reporting similar behavior (but with SQLAlchemy). Apparently their default Postgres configuration will kill idle connections after 30 minutes, which matches the behavior I'm seeing (and the error message we're getting from ActiveRecord). I'll follow up on that thread. Sorry for taking your time! Re your aside: it's a tricky balance because I want to know about all unhandled exceptions happening in my application, but there should probably be some kind of easy setting to mute them. There's a setting in Rollbar to ignore certain kinds of exceptions, but in this case I'd prefer to fix it! |
hey @bensheldon sorry for reopening. Apparently ActiveRecord will recycle connections every 5 minutes according to the docs, by default? So perhaps as this is happening on idle connections we could add this error to good_job/lib/good_job/notifier.rb Line 36 in 0998bac
|
I did some more research on this. The What I think I want to do here is instead add additional behavior so that, if the error is a connection error, it will ignore 3 (?) consecutive errors before triggering That would reduce the monitoring noise you're seeing, while still surfacing an error if the connection problem persists. |
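A minimal sketch of the "ignore a few consecutive connection errors" idea described above, not GoodJob's actual implementation. `ConnectionErrorThrottle` and `FakeConnectionError` are hypothetical names invented for illustration; in a real application the error list would presumably hold classes such as `PG::ConnectionBad`, and the reporter block would forward to whatever error tracker is in use.

```ruby
# Hypothetical stand-in for a real connection error class (assumption).
class FakeConnectionError < StandardError; end

# Hypothetical helper: swallows up to `limit` consecutive connection errors
# and only calls the reporter once the streak exceeds that limit.
# Errors that are not connection errors are reported immediately.
class ConnectionErrorThrottle
  def initialize(limit: 3, connection_errors: [FakeConnectionError], &reporter)
    @limit = limit
    @connection_errors = connection_errors
    @reporter = reporter
    @streak = 0
  end

  # Call after every successful iteration to reset the streak.
  def success!
    @streak = 0
  end

  # Call with each rescued error; returns true if the error was reported.
  def failure!(error)
    if @connection_errors.any? { |klass| error.is_a?(klass) }
      @streak += 1
      return false if @streak <= @limit
    end
    @reporter.call(error)
    true
  end
end

# Usage: collect reported errors in an array instead of a real error tracker.
reported = []
throttle = ConnectionErrorThrottle.new(limit: 3) { |e| reported << e }
```

With a limit of 3, a single dropped idle connection followed by a successful reconnect would never reach the reporter, while a database that stays unreachable would be reported on the fourth consecutive failure.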
I am seeing App, Postgres and GoodJob worker are on different instances. It seems that I get one of these errors from Sentry almost exactly every 2 hours. Doing a deploy seems to reset the clock, and an exception will normally appear within 2 minutes of a deploy.
The db-connection-graph looks like an ECG machine, spiking between 2 and 9 connections. I read: https://github.com/bensheldon/good_job#database-connections
I am not sure how to read this. - does this assume that RAILS_MAX_THREADS is for an app running on the same machine as good_job? If I am running
If you need more details, or you can give me some tips on how to investigate further, I would appreciate it. Logs below:
GoodJob Worker log
Aug 4 11:56:14 PM ==> Starting service with 'bundle exec good_job start --max-threads=1'
app-stack-trace
activerecord (7.0.3.1) lib/active_record/connection_adapters/postgresql/database_statements.rb in exec at line 48
|
@brentgreeff thanks for the logs. You've convinced me to bump this up in priority. I think the approach in my previous comment still sounds like the right one. To your other question:
Yes, that's configuration if you're running GoodJob |
Mine are happening every 2 hours, so I will have to ignore this in Sentry, but I wonder how Rails/Puma deals with this; it must have the same problem to handle. |
Rails does have this problem, I think it's just much, much more rare. Example: rails/rails#29189 It happens with GoodJob because GoodJob performs the equivalent of long-running queries for I went to see how ActionCable's Postgres Adapter handles it, because that's what GoodJob's Notifier is based upon... and there is no error handling whatsoever 🤷🏻 It looks like an exception will kill the adapter and it won't be restarted at all. Maybe an area for improvement in Rails, if anyone (other than me) actually uses the ActionCable Postgres Adapter. |
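For contrast with a listener that dies on the first exception, here is a rough sketch of a loop that restarts itself a bounded number of times. `run_with_restart` is a hypothetical helper written for this illustration; it is not taken from ActionCable or GoodJob, and the block stands in for whatever long-running listen work would be done on the connection.

```ruby
# Hypothetical supervisor: re-runs the block when it raises a StandardError,
# up to `max_restarts` times, then lets the last error propagate.
# The block receives the number of restarts performed so far.
def run_with_restart(max_restarts: 5, wait: 0)
  restarts = 0
  begin
    yield restarts
  rescue StandardError
    restarts += 1
    raise if restarts > max_restarts
    sleep(wait) # pause before reconnecting; 0 keeps the example fast
    retry
  end
end

# Usage: the block fails twice (simulating a dropped connection), then succeeds.
status = run_with_restart(max_restarts: 3) do |restarts|
  raise IOError, "connection lost" if restarts < 2
  :listening
end
```

Bounding the number of restarts keeps a permanently broken connection from looping forever, so the final error still surfaces to the caller.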
I just released GoodJob v3.4.1 which contains a fix for this. Please let me know if you continue to see this error. |
Hi again, I am having issues on Render. LOG:
Aug 18 10:10:24 AM I, [2022-08-16T10:06:53.493704 #69] INFO -- : [GoodJob] Notifier unsubscribed with UNLISTEN |
I can confirm the latest release fixed this for me 👍 thank you very much! |
I just upgraded:
- good_job (3.4.1)
+ good_job (3.4.3)
Will keep you updated. |
Hi there! Thanks for this amazing gem.
Since going to production with an application that's very low traffic and mostly idle, with a few cron tasks scheduled (every 6 hours, 12 hours, and weekly respectively), I've started seeing the exception in the title every 30 minutes being reported to Rollbar.
We're running the "async" method in production since running more than one process is non-trivial (and the load is more than fine for now).
The application works fine, scheduled tasks work fine, but we get this annoying error reported.
I have no clue as to how to even start debugging this, any ideas?
FULL BACKTRACE HERE
Strangely enough this happens exactly every 30 minutes