Postgrest does not handle temporary database failure well #742

jackfirth · 2016-11-14T08:50:40Z

If the database goes down temporarily, Postgrest (version 3.2.0 in my tests) begins responding to requests with 503 Service Unavailable. However, once the database issue is resolved, Postgrest does not self-heal and continues serving 503 errors. This requires a manual restart of Postgrest, because in addition to not reconnecting it doesn't die either (dying could at least trigger a restart by an external monitor).

Postgrest should handle database connectivity issues more gracefully. Simply retrying the connection every X seconds can cause high bandwidth usage during long outages if X is small, but if X is large then Postgrest doesn't come back online quickly once the database does. Exponential backoff (wait X seconds, then 2X, then 4X, then 8X, etc.) fares better, but long outages won't get resolved quickly. If the database is down for an hour then in the worst case Postgrest might be down for N hours, where N is the exponential growth factor. A more robust option, "cyclic exponential backoff", looks something like this:

Wait X seconds
Wait 2X seconds
Wait 4X seconds
Wait 8X seconds
...
Wait (2^N)X seconds, where N is the number of doublings and (2^N)X is the maximum amount of "lag time" you want between when the database comes back up and when Postgrest comes back up (ideally no more than a few minutes)
Wait X seconds
Wait 2X seconds
... repeat indefinitely

And so on. Essentially, the strategy is to never give up and never go down, although clients can specify a timeout period they're willing to wait. Postgrest could attempt retries in this manner in the background, but automatically respond with 503 to any requests that take longer than five seconds to complete.
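
The schedule above can be sketched in a few lines of Haskell. This is only an illustration of the idea, not Postgrest code: `cyclicBackoff` and `retryWith` are hypothetical names, and the delays are assumed to be whole seconds.

```haskell
import Control.Concurrent (threadDelay)

-- | Infinite delay schedule: x, 2x, 4x, ..., (2^n)x, then back to x.
-- Hypothetical helper; x is the base delay and n the number of doublings.
cyclicBackoff :: Int -> Int -> [Int]
cyclicBackoff x n = cycle [x * 2 ^ k | k <- [0 .. max 0 n]]

-- | Run an attempt until it succeeds, sleeping for the next delay
-- (in seconds) after each failure. With an infinite schedule this
-- never gives up.
retryWith :: [Int] -> IO Bool -> IO ()
retryWith [] _ = return ()
retryWith (d : ds) attempt = do
  ok <- attempt
  if ok
    then return ()
    else threadDelay (d * 1000000) >> retryWith ds attempt

main :: IO ()
main = print (take 8 (cyclicBackoff 1 3))
-- prints [1,2,4,8,1,2,4,8]
```

With X = 1 and N = 3 the longest gap between attempts is 8 seconds, so that is also the longest Postgrest would lag behind the database once it is reachable again.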

This affects how Postgrest starts up too. Postgrest would listen to requests immediately while trying to connect to the database in the background. Until the connection is successful, all requests are met with 503. This has a handful of nice benefits:

No need for worrying about service startup order. It no longer matters whether you start Postgrest first or the database first.
It's easy to add health checking. Just fire off a GET / request until you get a success code and you know Postgrest is successfully listening to requests and talking to the database. Docker 1.12+ has native support for health checks, giving you neat features like the ability to do rolling deploys of Postgrest where traffic isn't switched over to new Postgrest instances until they're healthy.
Avenues for adding more complex request logic around failures and timing. The HTTP "wait" preference gives a standard way for clients to ask a server to set a deadline on how long processing the request will take. If a client isn't interested in waiting more than a second or two for a response, Postgrest can eagerly close the connection and respond with 503 when the database isn't available. Conversely, if a client is willing to wait then Postgrest can give them a longer request deadline, giving the client a greater chance of success in the face of intermittent network failures between Postgrest and the database.
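
To make the last point concrete, here is a rough sketch of how a per-request deadline could be derived from a `Prefer: wait=N` header (the "wait" preference from RFC 7240, with N in seconds). Everything here is an assumption for illustration: `waitSeconds` and `requestDeadline` are hypothetical names, the five second default comes from the suggestion earlier in this issue, and the 30 second cap is made up. The parsing is simplified and only handles a lowercase `wait=N` with no spaces around the `=`.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- | Split a Prefer header value on commas and trim each preference.
preferences :: String -> [String]
preferences s = case break (== ',') s of
  (p, [])       -> [trim p]
  (p, _ : rest) -> trim p : preferences rest
  where
    trim = f . f
    f = reverse . dropWhile isSpace

-- | Extract N from the first well-formed "wait=N" preference, if any.
waitSeconds :: String -> Maybe Int
waitSeconds header = listToMaybe (mapMaybe parseWait (preferences header))
  where
    parseWait p = case stripPrefix "wait=" p of
      Just ds | not (null ds), all isDigit ds -> Just (read ds)
      _ -> Nothing

-- | Deadline in seconds: the client's wait capped by the server maximum,
-- or the default when no usable preference was sent.
requestDeadline :: Int -> Int -> Maybe String -> Int
requestDeadline def maxWait mHeader =
  maybe def (min maxWait) (mHeader >>= waitSeconds)

main :: IO ()
main = do
  print (requestDeadline 5 30 (Just "respond-async, wait=10")) -- 10
  print (requestDeadline 5 30 (Just "wait=120"))               -- 30
  print (requestDeadline 5 30 Nothing)                         -- 5
```

A request that cannot reach the database before its deadline would then be answered with 503, while the background reconnection keeps running.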


ruslantalpa · 2016-12-06T07:49:41Z

Short term fix
Replace these lines
https://github.com/begriffs/postgrest/blob/master/main/Main.hs#L88-L92
with

  void $ installHandler sigHUP (
      Catch $ do
        P.release pool
        void . P.use pool $ do
          s <- getDbStructure (toS $ configSchema conf)
          liftIO $ atomicWriteIORef refDbStructure s
   ) Nothing

After this, you can send a HUP signal to postgrest and it will recover without a restart:
docker kill -s "HUP" ${POSTGREST_CONTAINER}

(it compiles, i will test in a bit to make sure it actually works, no easy way to add tests for this currently because of how the tests are executed)

ruslantalpa · 2016-12-06T09:07:56Z

Update: it works (when copy pasting you may have to pay attention to the indentation of the code)

wallaceicy06 · 2016-12-29T06:15:19Z

+1 to this fix. I personally ran into this issue when trying to up postgrest using docker-compose. Because the database had not yet become healthy, postgrest never was able to serve requests even after the database did become healthy. The solution for me was to up the database before running docker-compose, but I can see this being an issue should my database randomly decide to go down at any other time besides startup.

ruslantalpa · 2016-12-29T11:41:03Z

@wallaceicy06 you could build your own postgrest image based on the official one and control the startup with an entry script that basically does what this article suggests in the example:
https://docs.docker.com/compose/startup-order/
The downside is that psql brings in a lot of dependencies, so the container size is bigger.

…tgREST#869) * Add connection retrying on startup and SIGHUP, Fix PostgREST#742 * Ensure that only one connection worker can run at a time * Change ConnectionError status code to 503 and add automatic connection retrying

begriffs added bug QOS labels Nov 14, 2016

steve-chavez added a commit to steve-chavez/postgrest that referenced this issue Apr 30, 2017

Add connection retrying on startup and SIGHUP, Fix PostgREST#742

3accb8d

steve-chavez added a commit to steve-chavez/postgrest that referenced this issue Apr 30, 2017

Add connection retrying on startup and SIGHUP, Fix PostgREST#742

fa37faf

jackfirth mentioned this issue May 1, 2017

Add connection retrying on startup and SIGHUP, Fix #742 #869

Merged

begriffs closed this as completed in 3e26c1a May 7, 2017
