Skip to content

Rework NPC's iptables-restore Logic To Use --noflush #1372

Open
@aauren

Description

@aauren

Is your feature request related to a problem? Please describe.
As pointed out in #1370 when we operate iptables-restore in it's default mode we have the potential to cause race-conditions with other iptables tooling that may exist on the host. In its default operation iptables-restore flushes all existing rules and reads in rules from scratch from the restore file given to it.

Using iptables in this way means that kube-router has the potentially to inadvertently change or remove other application's chains and rules. However, if we use iptables-restore --noflush we can choose to, for the most part, only operate on our own rules and chains. Any chain / rule that is not in the iptables restore input is left alone.

This is especially important because upstream netfilter does not have strong promises of backward compatibility with previous versions of user-space tooling. These two issues together essentially caused #1370 to occur because kube-proxy used a newer version of the user-space tools than kube-router did, when kube-router used iptables-save it wasn't actually fed all of the appropriate options to the kube-proxy rules. Thus when it did it's overly-broad iptables-restore operation, it changed rules that were not related to kube-router

Describe the solution you'd like
kube-router should rework the logic inside the NPC controller so that it uses iptables-restore --noflush and tries as hard as possible to only touch chains and rules that it is authoritative for. As suggested by the kube-proxy team, (thanks to @danwinship) the most efficient way to get from what we have now to what we need would likely be logic along the lines of:

Basically, when you use --noflush, there are three possibilities for a chain:

  • if you have no :CHAINNAME line for the chain, and no -X CHAINNAME or -A CHAINNAME rules, then the restore doesn't affect that chain at all
  • if you have :CHAINNAME and then later -X CHAINNAME (and no -A CHAINNAME), then the chain gets deleted
  • if you have :CHAINNAME (and no -X CHAINNAME), and 0 or more -A CHAINNAME ... lines, then the chain gets completely replaced by the rules in the restore input (just like what would have happened in the non---noflush case)

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureoverride-staleDon't allow automatic management of stale issues / PRs

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions