Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgrest does not handle temporary database failure well #742

Closed
jackfirth opened this issue Nov 14, 2016 · 4 comments
Closed

Postgrest does not handle temporary database failure well #742

jackfirth opened this issue Nov 14, 2016 · 4 comments

Comments

@jackfirth
Copy link

jackfirth commented Nov 14, 2016

If the database goes down temporarily, Postgrest (version 3.2.0 in my tests) begins responding to requests with 503 Service Unavailable. However, once the database issue is resolved Postgres does not self-heal and continues serving 503 errors. This requires a manual restart of Postgrest, since in addition to not reconnecting it doesn't die either (which could trigger a restart by an external monitor.)

Postgrest should handle database connectivity issues more gracefully. Simply retrying the connection every X seconds can cause high bandwidth usage for long outages if X is small, but if X is large then Postgrest doesn't quickly come back online when the database comes back online. Exponential backoff (wait X seconds, then 2X, then 4X, then 8X, etc.) fares better but long outages won't get resolved quickly. If the database is down for an hour then in the worst case Postgrest might be down for N hours, where N is the exponential growth factor. A more robust option, "cyclic exponential backoff", looks like somewhat this:

  1. Wait X seconds
  2. Wait 2X seconds
  3. Wait 4X seconds
  4. Wait 8X seconds
    ...
  5. Wait (2^N)X seconds, where N is the maximum amount of "lag time" you want between when the database comes back up and when Postgrest should come back up (ideally no more than a few minutes)
  6. Wait X seconds
  7. Wait 2X seconds
    ... repeat indefinitely

And so on. Essentially, the strategy is to never give up and never go down, although clients can specify a timeout period they're willing to wait. Postgrest could attempt retries in this manner in the background, but automatically respond with 503 to any requests that take longer than five seconds to complete.

This affects how Postgrest starts up too. Postgrest would listen to requests immediately while trying to connect to the database in the background. Until the connection is successful, all requests are met with 503. This has a handful of nice benefits:

  • No need for worrying about service startup order. It no longer matters whether you start Postgrest first or the database first.
  • It's easy to add health checking. Just fire off a GET / request until you get a success code and you know Postgrest is successfully listening to requests and talking to the database. Docker 1.12+ has native support for health checks, giving you neat features like the ability to do rolling deploys of Postgrest where traffic isn't switched over to new Postgrest instances until they're healthy.
  • Avenues for adding more complex request logic around failures and timing. The HTTP "wait" preference gives a standard way for clients to ask a server to set a deadline on how long processing the request will take. If a client isn't interested in waiting more than a second or two for a response, Postgrest can eagerly close the connection and respond with 503 when the database isn't available. Conversely, if a client is willing to wait then Postgrest can give them a longer request deadline, giving the client a greater chance of success in the face of intermittent network failures between Postgrest and the database.
@ruslantalpa
Copy link
Contributor

ruslantalpa commented Dec 6, 2016

Sort term fix
replace these lines
https://github.com/begriffs/postgrest/blob/master/main/Main.hs#L88-L92
with

  void $ installHandler sigHUP (
      Catch $ do
        P.release pool
        void . P.use pool $ do
          s <- getDbStructure (toS $ configSchema conf)
          liftIO $ atomicWriteIORef refDbStructure s
   ) Nothing

After this, you can send a HUP signal to postgrest and it will recover without a restart
docker kill -s "HUP" ${POSTGREST_CONTAINER}

(it compiles, i will test in a bit to make sure it actually works, no easy way to add tests for this currently because of how the tests are executed)

@ruslantalpa
Copy link
Contributor

ruslantalpa commented Dec 6, 2016

Update: it works (when copy pasting you may have to pay attention to indentation of the code )

@wallaceicy06
Copy link

wallaceicy06 commented Dec 29, 2016

+1 to this fix. I personally ran into this issue when trying to up postgrest using docker-compose. Because the database had not yet become healthy, postgrest never was able to serve requests even after the database did become healthy. The solution for me was to up the database before running docker-compose, but I can see this being an issue should my database randomly decide to go down at any other time besides startup.

@ruslantalpa
Copy link
Contributor

@wallaceicy06 you could build your own postgrest image based on the official one and control the startup with an entry script that basically does what this article suggests in the example
https://docs.docker.com/compose/startup-order/
the downside psql brings in a lot of dependencies and the container size is bigger.

steve-chavez added a commit to steve-chavez/postgrest that referenced this issue Apr 30, 2017
steve-chavez added a commit to steve-chavez/postgrest that referenced this issue Apr 30, 2017
monacoremo pushed a commit to monacoremo/postgrest that referenced this issue Jul 17, 2021
…tgREST#869)

* Add connection retrying on startup and SIGHUP, Fix PostgREST#742
* Ensure that only one connection worker can run at a time
* Change ConnectionError status code to 503 and add automatic connection retrying
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants