Postgrest does not handle temporary database failure well #742
Short-term fix:

```haskell
void $ installHandler sigHUP (
  Catch $ do
    P.release pool
    void . P.use pool $ do
      s <- getDbStructure (toS $ configSchema conf)
      liftIO $ atomicWriteIORef refDbStructure s
  ) Nothing
```

After this, you can send a HUP signal to postgrest and it will recover without a restart. (It compiles; I will test in a bit to make sure it actually works. There is no easy way to add tests for this currently because of how the tests are executed.)
Update: it works. (When copy-pasting, you may have to pay attention to the indentation of the code.)
+1 to this fix. I personally ran into this issue when trying to bring up postgrest using
@wallaceicy06 you could build your own postgrest image based on the official one and control the startup with an entry script that does essentially what this article suggests in its example.
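The entry-script approach can be sketched as a small wait-for-the-database loop. This is an illustration, not the article's actual script: the host name, port, and the idea of handing off to `postgrest` afterwards are assumptions.

```python
import socket
import time

def wait_for_port(host, port, retries=30, delay=1.0):
    """Return True once a TCP connection to (host, port) succeeds,
    retrying up to `retries` times with `delay` seconds in between."""
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            if attempt + 1 < retries:
                time.sleep(delay)
    return False

if __name__ == "__main__":
    # Hypothetical usage: block until Postgres accepts connections,
    # then start postgrest (e.g. via os.execvp).
    if wait_for_port("db", 5432):
        print("database is up, starting postgrest")
```

The same shape works as a shell `until`/`sleep` loop in a Docker entrypoint; the point is simply to delay starting postgrest until the database is reachable.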
…tgREST#869)
* Add connection retrying on startup and SIGHUP, Fix PostgREST#742
* Ensure that only one connection worker can run at a time
* Change ConnectionError status code to 503 and add automatic connection retrying
If the database goes down temporarily, Postgrest (version 3.2.0 in my tests) begins responding to requests with `503 Service Unavailable`. However, once the database issue is resolved, Postgrest does not self-heal and continues serving 503 errors. This requires a manual restart of Postgrest, since in addition to not reconnecting it doesn't die either (which could trigger a restart by an external monitor).

Postgrest should handle database connectivity issues more gracefully. Simply retrying the connection every X seconds can cause high bandwidth usage during long outages if X is small, but if X is large then Postgrest doesn't quickly come back online when the database does. Exponential backoff (wait X seconds, then 2X, then 4X, then 8X, etc.) fares better, but long outages still won't get resolved quickly: if the database is down for an hour, then in the worst case Postgrest might be down for N hours, where N is the exponential growth factor. A more robust option, "cyclic exponential backoff", looks somewhat like this:
...
... repeat indefinitely
And so on. Essentially, the strategy is to never give up and never go down, although clients can specify a timeout period they're willing to wait. Postgrest could attempt retries in this manner in the background, but automatically respond with 503 to any request that takes longer than five seconds to complete.
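As a concrete sketch of the cyclic strategy (the 1-second base and 32-second cap here are illustrative assumptions, not values from the issue):

```python
import itertools

def cyclic_backoff(base=1, factor=2, cap=32):
    """Yield retry delays: base, base*factor, base*factor**2, ... up to cap,
    then start over from base, forever -- i.e. never give up."""
    while True:
        delay = base
        while delay <= cap:
            yield delay
            delay *= factor

# Example: the first eight delays of the default cycle.
delays = list(itertools.islice(cyclic_backoff(), 8))
# delays == [1, 2, 4, 8, 16, 32, 1, 2]
```

Compared with plain exponential backoff, resetting the delay after each cycle bounds the worst-case reconnection lag at the cap rather than letting it grow with the length of the outage.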
This affects how Postgrest starts up, too. Postgrest would listen for requests immediately while trying to connect to the database in the background. Until the connection succeeds, all requests are met with 503. This has a handful of nice benefits:
Health checks become simple: poll a `GET /` request until you get a success code, and you know Postgrest is successfully listening for requests and talking to the database. Docker 1.12+ has native support for health checks, giving you neat features like the ability to do rolling deploys of Postgrest where traffic isn't switched over to new Postgrest instances until they're healthy.