
Race condition when HAproxy reloaded constantly #97

Closed

sielaq opened this issue Sep 4, 2015 · 5 comments

sielaq commented Sep 4, 2015

There is a race condition with haproxy_reload.sh: consul-template triggers it every time the config changes. It can happen that a new run is triggered before the previous one has finished, and then haproxy_reload.sh ends up in an unknown state.

To Do:

  1. Reduce the race condition with consul-template's -wait option (see the sketch after this list).
  2. Fix the remove() function to remove all occurrences of the previous config, even ones that should not exist:
     start using OWN chains per HAproxy instance, so that individual iptables rules are no longer removed, only the chain is switched;
     ensure that ONLY one chain is active.
  3. Make the script survive that situation and deal with race conditions. It is better to do a proper cleanup from an unknown state than to exit and do nothing.
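A minimal sketch of point 1, using consul-template's -wait=min:max quiescence window; the template and command paths here are illustrative, not our actual setup:

consul-template \
  -wait=5s:10s \
  -template="/etc/haproxy.cfg.tmpl:/etc/haproxy/haproxy.cfg:/opt/haproxy_reload.sh"

With -wait, consul-template batches rapid successive config changes and only fires haproxy_reload.sh after the data has been quiet for 5s (or at most 10s after the first change), which shrinks the window for overlapping reloads.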
Kosta-Github commented:

Good point, I ran into this issue just yesterday. The symptom was getting back rule counts from the status() call like this: haproxy_a: 6, haproxy_b: 0.

6 is twice the number of actual rules. Maybe a better check would be >= num_rules instead of strict equality?
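A hypothetical sketch of that check; count_rules, num_rules, and the grep pattern are assumptions about what status() does, not the actual script's code:

# count the nat-table PREROUTING rules tagged for one haproxy instance
count_rules() { iptables -w -t nat -S PREROUTING | grep -c -- "$1"; }

num_rules=3  # expected rule count per instance (illustrative)
if [ "$(count_rules haproxy_a)" -ge "$num_rules" ]; then
    echo "haproxy_a active (count may include stale duplicates)"
fi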

sielaq commented Sep 5, 2015

I have analyzed that.

The idea is to create 3 chains: HAPROXY, HAPROXY_A, HAPROXY_B.
The 1st chain ensures that we play in our own sandbox.
The 2nd and 3rd have the redirections for each HAproxy instance preconfigured, so we don't have to add/remove them.

Configuration (done once):

iptables -t nat -N HAPROXY
iptables -t nat -N HAPROXY_A
iptables -t nat -N HAPROXY_B

First assignment:

iptables -t nat -A HAPROXY -j HAPROXY_A

So now, we can replace the 1st rule (it is always the 1st rule, since it is our own sandbox chain):

iptables -w -t nat -R HAPROXY 1 -j HAPROXY_B

Now I have to check that the replace does NOT drop connections.
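A minimal sketch of the resulting swap logic, assuming the three chains above already exist; active_chain and swap_to are illustrative helpers, not the actual haproxy_reload.sh code:

# report which instance chain rule 1 of HAPROXY currently jumps to
active_chain() { iptables -w -t nat -S HAPROXY 1 | awk '{print $NF}'; }

# atomically repoint rule 1; -w waits on the xtables lock instead of failing
swap_to() { iptables -w -t nat -R HAPROXY 1 -j "$1"; }

if [ "$(active_chain)" = "HAPROXY_A" ]; then
    swap_to HAPROXY_B
else
    swap_to HAPROXY_A
fi

Because -R swaps the jump target in a single rule replacement, there is never a moment with zero or two active instance chains.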

sielaq added a commit to sielaq/PanteraS that referenced this issue Sep 6, 2015
sielaq added a commit that referenced this issue Sep 6, 2015
#97 fix iptables race condition
sielaq commented Sep 6, 2015

Local tests look OK:

# ab -n 100000 -c 100   http://mobile-service.service.consul/ping
This is ApacheBench, Version 2.3 <$Revision: 1528965 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking mobile-service.service.consul (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:        Apache-Coyote/1.1
Server Hostname:        mobile-service.service.consul
Server Port:            80

Document Path:          /ping
Document Length:        5 bytes

Concurrency Level:      100
Time taken for tests:   15.224 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      22100000 bytes
HTML transferred:       500000 bytes
Requests per second:    6568.75 [#/sec] (mean)
Time per request:       15.224 [ms] (mean)
Time per request:       0.152 [ms] (mean, across all concurrent requests)
Transfer rate:          1417.67 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.6      1      13
Processing:     2   14   2.8     14      41
Waiting:        2   14   2.8     13      40
Total:          3   15   2.9     15      41

Percentage of the requests served within a certain time (ms)
  50%     15
  66%     16
  75%     17
  80%     17
  90%     19
  95%     20
  98%     22
  99%     24
 100%     41 (longest request)

sielaq commented Sep 6, 2015

Iptables has its own mutex when it replaces rules with -R, and other invocations wait (-w) until it is unlocked.
I have also tested the unknown situation by manually removing the rule
(iptables -t nat -D HAPROXY 1), and the "self healing" seems to work.
Of course, a few milliseconds of outage can occur during the "self healing" window,
but that is an abnormal situation, and a small outage is better than a big outage that blocks forever.
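A minimal sketch of such self healing, assuming the chain layout above; the exact handling is illustrative, not the actual script:

# if rule 1 of HAPROXY is missing or does not jump to an instance chain,
# restore the jump; traffic may drop for a few ms while this runs
current=$(iptables -w -t nat -S HAPROXY 1 2>/dev/null | awk '{print $NF}')
case "$current" in
    HAPROXY_A|HAPROXY_B) ;;                            # healthy, nothing to do
    *) iptables -w -t nat -I HAPROXY 1 -j HAPROXY_A ;; # heal: re-insert the jump
esac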

I will test it on Monday under production load.

sielaq added the bug label Sep 8, 2015
sielaq commented Sep 9, 2015

Looks good in prod.
Running stable for 3 days.

sielaq closed this as completed Sep 9, 2015