Support graceful shutdown of haproxy #156

drewrobb · 2015-08-25T02:05:47Z

The purpose of this feature is to allow bamboo to shut down haproxy gracefully in response to a SIGTERM. In my particular use case, bamboo runs in docker behind an AWS ELB. The goal is to expose a health check that removes bamboo from the ELB before bamboo actually exits, so that we can redeploy bamboo without losing any requests. I shut down bamboo by running docker stop on the container. By default this sends a SIGTERM followed by a SIGKILL 10 seconds later, so a GraceSeconds value below 10s is reasonable, but it must be large enough for an upstream load balancer to detect that bamboo is unhealthy. Some changes to the dockerization were necessary so that bamboo actually receives the signal: commands launched from a bash or sh wrapper need to be run with 'exec', so that they replace the shell and receive signals directly.
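A minimal sketch of the drain sequence described above, written in Go (the language Bamboo is written in). It assumes the health endpoint is served by the Go process itself, which differs from this PR, where the check is an HAProxy frontend (graceful_stop_check). The names draining, healthStatus and waitForTermAndDrain are hypothetical, and atomic.Bool needs Go 1.19 or later.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// draining flips to true once SIGTERM arrives; from then on the health check
// fails, so an upstream balancer (such as an ELB) stops sending new requests.
var draining atomic.Bool

// healthStatus is the status code /health should currently report.
func healthStatus() int {
	if draining.Load() {
		return http.StatusServiceUnavailable // 503: take this instance out of rotation
	}
	return http.StatusOK // 200: healthy
}

func healthHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(healthStatus())
}

// waitForTermAndDrain blocks until SIGTERM, starts failing the health check,
// and keeps the process alive for graceSeconds so requests that still arrive
// are served normally. With graceSeconds == 0 it returns as soon as the
// signal arrives.
func waitForTermAndDrain(graceSeconds int) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	<-sigs
	draining.Store(true)
	time.Sleep(time.Duration(graceSeconds) * time.Second)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	go func() {
		// Port 2000 mirrors the health-check port used in the PR.
		log.Println(http.ListenAndServe(":2000", mux))
	}()

	waitForTermAndDrain(5) // hypothetical GraceSeconds; keep it below docker stop's 10s
	// Stopping haproxy and any other cleanup would happen here (not shown).
	log.Println("grace period over, exiting")
}
```

Keeping the status decision in healthStatus, separate from the HTTP handler, makes the 200 to 503 transition easy to check without a running server.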

I've been testing this by running the container, then running something like:

while true; do curl --connect-timeout 2 --max-time 2 localhost:2000/health  -sL -w "%{http_code} %{time_total}  " -o /dev/null; echo $(($(date +%s%N)/1000000)); sleep 0.2; done

Then run docker stop $(docker ps | grep bamboo | awk '{print $1}'). The HTTP status should change from 200 to 503 for 5 seconds.

I'm not sure whether people would want this on by default, but GraceSeconds is configurable, and setting it to 0 allows an immediate exit. Also, port 2000 is used for health checking. This could be problematic when not running in docker, so maybe my changes to haproxy_template should be commented out by default.

drewrobb · 2015-08-25T02:10:26Z

config/haproxy_template.cfg

@@ -34,6 +36,24 @@ defaults
        errorfile 504 /etc/haproxy/errors/504.http


+frontend graceful_stop_check


I couldn't think of a way to get rid of this extra frontend, which exists only to check whether a shutdown is in progress.

drewrobb · 2015-08-25T19:26:15Z

There is a tiny issue here: when using GraceSeconds, the old haproxy process keeps binding port 80 after a restart. In that case the kernel distributes requests between the processes rather than sending them all to the newest process, as we would want. If servers change quickly enough, you might get 503s. I'm looking at a workaround that sends SIGTTOU and SIGUSR1 to the old haproxy PID to force it to unbind after restarting with the -sf option. The haproxy docs say that this should not be necessary, but I'm seeing otherwise.

j1n6 · 2015-08-31T12:30:38Z

This is an interesting and valid use case. My only concern is that Bamboo should avoid shutting down HAProxy, since leaving HAProxy running would help with upgrades and maintenance.

timoreimann · 2015-08-31T17:29:56Z

@drewrobb, IIUC, your intention is to provide a way to disable Bamboo smoothly for maintenance without any downtime. I'm just wondering whether you could instead tell the ELB to take whichever Bamboo/HAProxy pair you want to maintain out of balancing, thus avoiding any control flow in which Bamboo stops HAProxy.

I am in no way familiar with ELB, so let me know if there's a blocker on the AWS end that I am missing.

drewrobb · 2015-08-31T18:30:32Z

@timoreimann, yes, that is my intention. Your idea would work as well; I implemented it this way so that I didn't have to worry about that process. In fact, I'm also running bamboo on marathon (on a subset of mesos slaves), so I don't have any special procedure for decommissioning a mesos slave.

@activars, it would be possible to have the signal handler shut down haproxy only on a SIGTERM, and shut down just bamboo on a SIGINT (although that convention would be a bit weird?). Another idea: make GraceSeconds = -1 the default, and in that case don't shut down haproxy, just bamboo?

timoreimann · 2015-08-31T18:50:47Z

@drewrobb: How do you make sure that you do not lose any requests when Bamboo shuts down HAProxy (presumably gracefully) on the load balancer end? Does ELB come with some kind of mechanism to retransmit packets to other hosts if one is deemed unavailable?

drewrobb · 2015-08-31T19:02:31Z

@timoreimann I use the /health endpoint defined in this PR as the ELB health check, with settings such that bamboo is marked unhealthy in less than GraceSeconds. I also made sure that the mesos setting docker_stop_timeout is large enough. Thus, the ELB stops sending requests to bamboo well before it has shut down. It is important to note that during the shutdown process the bamboo instance keeps handling requests as usual; it just stops receiving new requests from the ELB once it is marked unhealthy. This approach wouldn't work for long-running connections such as websockets, but it does work for any request that completes in less than GraceSeconds minus the time it takes for bamboo to be marked unhealthy.

mlerner · 2016-03-01T03:37:43Z

This would be great to have, @drewrobb!

KidkArolis · 2016-08-24T15:16:17Z

Cleaning up old PRs; feel free to reopen if still relevant.

Support graceful shutdown of haproxy (73eb565)

drewrobb reviewed Aug 25, 2015

drewrobb added 2 commits August 24, 2015 19:17

Parse settings from ENV (bfc8add)

Stopwaitsecs (1a678c1)

Simpler healthcheck (6c30dcf)

KidkArolis closed this Aug 24, 2016

j1n6 reopened this Sep 1, 2016
