maxvt changed the title from "iptables initialization code has a race condition that can lead to some code failing" to "iptables initialization code has a race condition that can lead to some iptables calls failing" on Mar 8, 2017
My guess is all calls have to pass through initCheck(); the first one sets iptablesPath, and some of the ones behind the first see iptablesPath as set, so they bypass most of initCheck()
The suggested fix for this issue is #1676.
We have a Docker network driver that uses the iptables module. When the driver is restarted, it receives a thundering herd of requests, so multiple goroutines start at once and, among other things, make iptables calls. In a few rare cases, this is what we see in the log:
time="2017-03-03T15:35:44Z" level=debug msg="/sbin/iptables, [-t nat -A PREROUTING -i pdnet -p udp --dport 8125 -j REDIRECT --to-port 8125]"
time="2017-03-03T15:35:44Z" level=debug msg="/sbin/iptables, [-t nat -A PREROUTING -i pdnet -p udp --dport 8125 -j REDIRECT --to-port 8125]"
time="2017-03-03T15:35:45Z" level=debug msg="/sbin/iptables, [--wait --version]"
time="2017-03-03T15:35:45Z" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -i pdnet -p udp --dport 53 -j REDIRECT --to-port 53]"
time="2017-03-03T15:35:45Z" level=debug msg="/sbin/iptables, [--wait -t nat -C PREROUTING -i pdnet -p tcp --dport 53 -j REDIRECT --to-port 53]"
Some of the iptables calls in this sequence eventually fail with errors similar to this:
iptables failed: iptables -t nat -A PREROUTING -i pdnet -p tcp --dport 53 -j REDIRECT --to-port 53: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?\n (exit status 4))
You can see that the earlier calls (which fail) run without the --wait flag, then there is a version check, and all subsequent calls add the --wait flag. What sorcery is this?
My guess is that all calls have to pass through initCheck(). The first one sets iptablesPath, and some of the calls behind it see iptablesPath as set, so they bypass most of initCheck() and go straight to invoking iptables. In fact, the rest of initCheck() in the first goroutine has not finished yet (in particular the slow execs that test for availability of the --wait flag and determine the iptables version). So those early calls use the startup value for --wait availability (false), and because they run concurrently without --wait, some of them fail.