Description
Is your feature request related to a problem? Please describe.
As pointed out in #1370 when we operate iptables-restore
in it's default mode we have the potential to cause race-conditions with other iptables tooling that may exist on the host. In its default operation iptables-restore
flushes all existing rules and reads in rules from scratch from the restore file given to it.
Using iptables in this way means that kube-router has the potentially to inadvertently change or remove other application's chains and rules. However, if we use iptables-restore --noflush
we can choose to, for the most part, only operate on our own rules and chains. Any chain / rule that is not in the iptables restore input is left alone.
This is especially important because upstream netfilter does not have strong promises of backward compatibility with previous versions of user-space tooling. These two issues together essentially caused #1370 to occur because kube-proxy used a newer version of the user-space tools than kube-router did, when kube-router used iptables-save
it wasn't actually fed all of the appropriate options to the kube-proxy rules. Thus when it did it's overly-broad iptables-restore
operation, it changed rules that were not related to kube-router
Describe the solution you'd like
kube-router should rework the logic inside the NPC controller so that it uses iptables-restore --noflush
and tries as hard as possible to only touch chains and rules that it is authoritative for. As suggested by the kube-proxy team, (thanks to @danwinship) the most efficient way to get from what we have now to what we need would likely be logic along the lines of:
Basically, when you use --noflush, there are three possibilities for a chain:
- if you have no :CHAINNAME line for the chain, and no -X CHAINNAME or -A CHAINNAME rules, then the restore doesn't affect that chain at all
- if you have :CHAINNAME and then later -X CHAINNAME (and no -A CHAINNAME), then the chain gets deleted
- if you have :CHAINNAME (and no -X CHAINNAME), and 0 or more -A CHAINNAME ... lines, then the chain gets completely replaced by the rules in the restore input (just like what would have happened in the non---noflush case)
Additional context