Ensure full functionality of AntreaProxy with proxyAll enabled when kube-proxy presents #6308
Conversation
/test-all
pkg/agent/route/route_linux.go
Outdated
```go
MatchCIDRDst(c.kubeServiceHost).
MatchTransProtocol(iptables.ProtocolTCP).
MatchPortDst(&c.kubeServicePort, nil).
SetTarget(kubeProxyServiceChain).
```
This makes a strong assumption that kube-proxy runs in iptables mode, but it could be in ipvs or nftables mode. I'm not sure whether the current implementation can work with the other two modes, but there seems to be no reason to jump to the service chain directly here; could it just return from this chain?
Another possible approach: handle traffic to external IPs and NodePorts only in this patch, given there seems to be no obvious benefit in redirecting ClusterIP traffic to OVS at the moment, and it takes some code to specially handle the kubernetes Service's ClusterIP. In theory, there should be no "external to ClusterIP" traffic.
> Another possible approach is, handle traffic to external ips and nodeports only in this patch, given there seems no obvious benefit to redirect clusterIP traffic to OVS at the moment, but it takes some code to specially handle kubernetes service's cluster IP. In theory, there should be no "external to ClusterIP" traffic.
I'm fine with this, and the code will be simpler. As for ClusterIP, maybe we could handle that in the future if there are users that need external-to-ClusterIP traffic.
Thanks for making the changes, this is now easier to review
pkg/agent/route/route_linux.go
Outdated
```go
@@ -829,6 +929,12 @@ func (c *Client) restoreIptablesData(podCIDR *net.IPNet,
				"-j", iptables.DNATTarget,
				"--to-destination", nodePortDNATVirtualIP.String(),
			}...)
			writeLine(iptablesData, []string{
				"-A", antreaPreRoutingChain,
				"-m", "comment", "--comment", `"Antrea: accept external to external IP packets"`,
```
This is the nat table; why accept packets here?
I have removed this. According to the netfilter documentation:
> NAT
> This is the realm of the `nat' table, which is fed packets from two netfilter hooks: for non-local packets, the NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks are perfect for destination and source alterations respectively. If CONFIG_IP_NF_NAT_LOCAL is defined, the hooks NF_IP_LOCAL_OUT and NF_IP_LOCAL_IN are used for altering the destination of local packets.
> This table is slightly different from the `filter' table, in that only the first packet of a new connection will traverse the table: the result of this traversal is then applied to all future packets in the same connection.

The IPs matched by the ipset are handled with the NOTRACK target, so the corresponding packets will not be processed in the nat table.
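The NOTRACK point above can be sketched without a live system. The fragment below is a toy illustration (plain shell, no root or iptables required): it writes out a raw-table dump modeled on the rules discussed in this PR and counts the NOTRACK entries. The key behavior, stated in the comments, is that untracked packets skip conntrack, and since the nat table is only consulted for the first packet of a *tracked* connection, they also skip every nat-table chain, Antrea's and kube-proxy's alike. The file path is made up for the sketch.

```shell
#!/bin/sh
# Illustrative raw-table rules modeled on those discussed above; packets
# matching a NOTRACK target bypass conntrack, and therefore also bypass
# all nat-table chains (both Antrea-managed and kube-proxy-managed),
# because nat is only traversed by the first packet of a tracked connection.
cat > /tmp/antrea_raw_rules.txt <<'EOF'
-A ANTREA-PREROUTING -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
-A ANTREA-PREROUTING -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
-A ANTREA-OUTPUT -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
EOF

# Count the rules that opt their packets out of connection tracking.
grep -c -- '-j NOTRACK' /tmp/antrea_raw_rules.txt
```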
test/e2e/proxy_test.go
Outdated
```go
@@ -744,6 +744,7 @@ func testProxyHairpin(t *testing.T, isIPv6 bool) {
	agnhostHost := fmt.Sprintf("agnhost-host-%v", isIPv6)
	createAgnhostPod(t, data, agnhostHost, node, true)
	t.Run("HostNetwork Endpoints", func(t *testing.T) {
		skipIfKubeProxyEnabled(t, data)
```
Why is this needed?
Previously, this test ran when kube-proxy was removed and AntreaProxy handled ClusterIP traffic. Now that kube-proxy is present and handles ClusterIP traffic, the test result won't be the expected one.
docs/antrea-proxy.md
Outdated
handle all Service traffic.

Starting with Antrea v2.1, when only `proxyAll` is enabled, AntreaProxy is capable
of handling all kinds of Service traffic except for ClusterIP, even if kube-proxy is
except for -> ", not just ClusterIP Service traffic"?
It should be "except": LoadBalancer, NodePort, and externalIP traffic, except ClusterIP.
LGTM overall, some nits
test/e2e/proxy_test.go
Outdated
```go
@@ -745,6 +745,9 @@ func testProxyHairpin(t *testing.T, isIPv6 bool) {
	createAgnhostPod(t, data, agnhostHost, node, true)
	t.Run("HostNetwork Endpoints", func(t *testing.T) {
		skipIfProxyAllDisabled(t, data)
		// AntreProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy
```
Suggested change:
```diff
- // AntreProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy
+ // AntreaProxy with proxyAll enabled doesn't handle ClusterIP traffic sourced from local Node when kube-proxy
```
To confirm: when proxyAll is enabled and kube-proxy is running, such a hairpin case can still work; it's just that the client IP is not the virtual IP, right? If so, I think the test should just change the expected IP instead of not validating this case.
Yes, it still can work, but the expected IP is not the virtual IP. I will update the test.
docs/antrea-proxy.md
Outdated
Starting with Antrea v2.1, when `proxyAll` is enabled, AntreaProxy is capable
of handling all kinds of Service traffic except ClusterIP, even if kube-proxy in
iptables mode is present. This is accomplished by prioritizing the rules installed
by Antrea Proxy redirecting Service traffic to OVS over those installed by kube-proxy.
Consider this:
Starting with Antrea v2.1, when `proxyAll` is enabled, AntreaProxy will handle
Service traffic destined to NodePort, LoadBalancerIP and ExternalIP, even if
kube-proxy is present. This benefits users who want to take advantage of
AntreaProxy's advanced features, such as Direct Server Return (DSR) mode, but
lack control over kube-proxy's installation. This is accomplished by
prioritizing the rules installed by AntreaProxy over those installed by
kube-proxy, thus it works only with kube-proxy iptables mode. Support for other
kube-proxy modes may be added in the future.
cc @jianjuns
LGTM.
Updated the document.
…ube-proxy presents

To ensure full functionality of AntreaProxy, except for handling ClusterIP from Nodes, even when kube-proxy in iptables mode is present, certain key changes are implemented when proxyAll is enabled:

The jump rules for the chains managed by Antrea, `ANTREA-PREROUTING` and `ANTREA-OUTPUT` in the nat table, are installed by inserting instead of appending, to bypass the chain `KUBE-SERVICES` performing Service DNAT managed by kube-proxy. Antrea ensures that the jump rules take precedence over those managed by kube-proxy.

The iptables rules of the nat table chain `ANTREA-PREROUTING` are like below, and they are similar in chain `ANTREA-OUTPUT`:

```
-A ANTREA-PREROUTING -m comment --comment "Antrea: DNAT external to NodePort packets" -m set --match-set ANTREA-NODEPORT-IP dst,dst -j DNAT --to-destination 169.254.0.252
```

The rule is to DNAT NodePort traffic, bypassing chain `KUBE-SERVICES`.

The iptables rules of the raw table chains `ANTREA-PREROUTING` / `ANTREA-OUTPUT` are like below:

```
1. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track incoming encapsulation packets" -m udp -p udp --dport 6081 -m addrtype --dst-type LOCAL -j NOTRACK
2. -A ANTREA-PREROUTING -m comment --comment "Antrea: drop Pod multicast traffic forwarded via underlay network" -m set --match-set CLUSTER-NODE-IP src -d 224.0.0.0/4 -j DROP
3. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
4. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track reply packets sourced from external IPs" -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
5. -A ANTREA-OUTPUT -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
```

- Rules 1-2 are not new rules.
- Rule 3 bypasses conntrack for request packets sourced from external and destined to externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.
- Rule 4 bypasses conntrack for reply packets sourced from externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.
- Rule 5 bypasses conntrack for request packets sourced from local and destined to externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.

The following are the benchmark results of a LoadBalancer Service configured with DSR mode. The results of TCP_STREAM and TCP_RR (single TCP connection) are almost the same as before. The result of TCP_CRR (multiple TCP connections) is better than before. One reason should be that conntrack is skipped for LoadBalancer Services.

```
Test        v2.0 proxyAll   Dev proxyAll   Delta
TCP_STREAM  4933.97         4918.35        -0.32%
TCP_RR      8095.49         8032.4         -0.78%
TCP_CRR     1645.66         1888.93        +14.79%
```

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
LGTM
/test-all
Thanks a lot for the efforts!
One nit: since you updated all "AntreaProxy" to "Antrea Proxy" in another PR, better to update antrea-proxy.md here as well, to avoid any dependency and another round of updates.
I can update the document antrea-proxy.md in that PR.
/test-kind-ipv6-only-all
Depends on #6381
Resolve #6232
To ensure full functionality of AntreaProxy, except for handling ClusterIP from Nodes, even when kube-proxy in iptables mode is present, certain key changes are implemented when proxyAll is enabled:

The jump rules for the chains managed by Antrea, `ANTREA-PREROUTING` and `ANTREA-OUTPUT` in the nat table, are installed by inserting instead of appending, to bypass the chain `KUBE-SERVICES` performing Service DNAT managed by kube-proxy. Antrea ensures that the jump rules take precedence over those managed by kube-proxy.
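The insert-versus-append distinction can be shown with a toy model (plain shell, no iptables or root needed): a chain is just an ordered list of rules that iptables evaluates top-down, so only inserting at position 1 puts the Antrea jump ahead of the `KUBE-SERVICES` jump that kube-proxy installed earlier. The chain contents below are illustrative, not a real dump.

```shell
#!/bin/sh
# Toy model of the nat PREROUTING chain: one rule per line, evaluated
# top-down, first match wins. kube-proxy's jump is already present.
kube="-j KUBE-SERVICES"

# iptables -A PREROUTING ... : the Antrea jump lands *after* KUBE-SERVICES.
appended="${kube}
-j ANTREA-PREROUTING"

# iptables -I PREROUTING 1 ... : the Antrea jump lands *before* KUBE-SERVICES.
inserted="-j ANTREA-PREROUTING
${kube}"

echo "appended, evaluated first: $(printf '%s\n' "$appended" | head -n 1)"
echo "inserted, evaluated first: $(printf '%s\n' "$inserted" | head -n 1)"
```

Only the inserted form guarantees Service traffic reaches the Antrea chain before kube-proxy can DNAT it.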
The iptables rules of the nat table chain `ANTREA-PREROUTING` are like below, and they are similar in chain `ANTREA-OUTPUT`:

```
-A ANTREA-PREROUTING -m comment --comment "Antrea: DNAT external to NodePort packets" -m set --match-set ANTREA-NODEPORT-IP dst,dst -j DNAT --to-destination 169.254.0.252
```

The rule is to DNAT NodePort traffic, bypassing chain `KUBE-SERVICES`.
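The `--match-set ... dst,dst` part of the NodePort DNAT rule matches a (destination IP, destination port) pair against an ipset. The shell fragment below sketches that matching logic with a plain list standing in for the set; the set entries and the probe packet are made up for illustration, and the DNAT target IP is the 169.254.0.252 virtual IP mentioned in the rule.

```shell
#!/bin/sh
# Stand-in for a hash:ip,port ipset such as ANTREA-NODEPORT-IP: each entry
# is a "dstIP,proto:dstPort" pair (entries here are invented for the sketch).
nodeport_set="192.168.77.100,tcp:30080
192.168.77.100,tcp:30443"

# A packet matches only when BOTH its destination IP and destination port
# appear as one entry -- the "dst,dst" semantics of the iptables rule.
match_nodeport() {  # $1 = dst IP, $2 = dst proto:port
  printf '%s\n' "$nodeport_set" | grep -qx "$1,$2"
}

if match_nodeport 192.168.77.100 tcp:30080; then
  echo "DNAT --to-destination 169.254.0.252"
else
  echo "no match, continue down the chain"
fi
```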
The iptables rules of the raw table chains `ANTREA-PREROUTING` / `ANTREA-OUTPUT` are like below:

```
1. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track incoming encapsulation packets" -m udp -p udp --dport 6081 -m addrtype --dst-type LOCAL -j NOTRACK
2. -A ANTREA-PREROUTING -m comment --comment "Antrea: drop Pod multicast traffic forwarded via underlay network" -m set --match-set CLUSTER-NODE-IP src -d 224.0.0.0/4 -j DROP
3. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
4. -A ANTREA-PREROUTING -m comment --comment "Antrea: do not track reply packets sourced from external IPs" -m set --match-set ANTREA-EXTERNAL-IP src -j NOTRACK
5. -A ANTREA-OUTPUT -m comment --comment "Antrea: do not track request packets destined to external IPs" -m set --match-set ANTREA-EXTERNAL-IP dst -j NOTRACK
```

- Rules 1-2 are not new rules.
- Rule 3 bypasses conntrack for request packets sourced from external and destined to externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.
- Rule 4 bypasses conntrack for reply packets sourced from externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.
- Rule 5 bypasses conntrack for request packets sourced from local and destined to externalIPs, which also results in bypassing the chains managed by Antrea Proxy and kube-proxy in the nat table.
The following are the benchmark results of a LoadBalancer Service configured with DSR mode. The results of TCP_STREAM and TCP_RR (single TCP connection) are almost the same as before. The result of TCP_CRR (multiple TCP connections) is better than before. One reason should be that conntrack is skipped for LoadBalancer Services.

```
Test        v2.0 proxyAll   Dev proxyAll   Delta
TCP_STREAM  4933.97         4918.35        -0.32%
TCP_RR      8095.49         8032.4         -0.78%
TCP_CRR     1645.66         1888.93        +14.79%
```
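As a sanity check on the TCP_CRR delta, the one-liner below recomputes the percentage change from the raw throughput numbers quoted in this PR (1645.66 before, 1888.93 after) with the usual formula (new - old) / old * 100. Note it comes out as 14.78% here versus the +14.79% quoted, presumably because the quoted delta was computed from unrounded raw results.

```shell
#!/bin/sh
# Percentage improvement of TCP_CRR, from the benchmark numbers in this PR.
awk 'BEGIN {
  old = 1645.66; new = 1888.93
  printf "%.2f%%\n", (new - old) / old * 100
}'
```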
TODO: ensure the jump rules to `ANTREA-PREROUTING` and `ANTREA-OUTPUT` are the first rule of the corresponding default iptables chains, to skip the kube-proxy chains.