
Local traffic for multi-cluster service should be handled within OVS #4431

Closed
luolanzone opened this issue Dec 1, 2022 · 2 comments · Fixed by #4508
Labels: area/multi-cluster, kind/feature

Comments

@luolanzone (Contributor)

Describe the problem/challenge you have

Antrea Multi-cluster enables users to access exported Services from the member clusters in a ClusterSet. A local Service's ClusterIP will be included in the multi-cluster Service's Endpoints as long as the user exports a Service from the same member cluster.

When the local Service's ClusterIP becomes an Endpoint of the multi-cluster Service, a Pod accessing the multi-cluster Service in the ClusterSet may be directed to the Service in its own cluster. From a Service access perspective this works fine. However, such traffic first gets a DNAT in OVS that sets the local Service's ClusterIP as the destination, then leaves OVS through antrea-gw0 toward the uplink. A second DNAT is then needed to reach the real Pod Endpoint, either via an iptables rule (if kube-proxy is enabled) or back in OVS (if kube-proxy is disabled and replaced by AntreaProxy). In contrast, regular in-cluster traffic never leaves OVS and always goes through the tunnel interface in encap mode. We should guarantee that Antrea multi-cluster traffic keeps the same traffic path, no matter whether the selected Endpoint is a local or a remote ClusterIP.
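
For illustration, here is a rough sketch of the two translations, assuming kube-proxy is disabled and AntreaProxy performs the second DNAT. The addresses (10.96.30.253 for the multi-cluster Service, 10.96.30.99 for the local Service, 10.10.1.5 for the Pod Endpoint) are examples only, and the flows are heavily simplified, with register masks and ct fields omitted:

# First DNAT inside OVS: ServiceLB matches the multi-cluster Service ClusterIP 10.96.30.253
# and selects the local Service ClusterIP 10.96.30.99 as the Endpoint, so EndpointDNAT
# rewrites the destination to 10.96.30.99; the packet then leaves OVS via antrea-gw0.
table=EndpointDNAT, tcp,reg3=0xa601e63 actions=ct(commit,nat(dst=10.96.30.99:80),...)
# Second DNAT after the packet re-enters OVS from antrea-gw0: ServiceLB now matches
# 10.96.30.99, selects a real Pod Endpoint, and EndpointDNAT rewrites the destination
# to the Pod IP 10.10.1.5.
table=EndpointDNAT, tcp,reg3=0xa0a0105 actions=ct(commit,nat(dst=10.10.1.5:80),...)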

Another known impact of this issue is that the tunnel_id we will use for stretched NetworkPolicy will be lost if the traffic goes out of OVS.

Describe the solution you'd like

The second DNAT should be done inside OVS instead of going through antrea-gw0. The exact solution is not decided yet; this issue documents the problem for further discussion.

The impact of this issue on stretched NetworkPolicy may be fixed by a change on the controller side. @Dyanngg may have the details.

Anything else you would like to add?

luolanzone added the kind/feature and area/multi-cluster labels on Dec 1, 2022
@luolanzone (Contributor, Author) commented Dec 21, 2022

After some investigation and verification, I think we can add a new table named GlobalServiceLB to make the final Endpoint selection before the traffic hits the EndpointDNAT table. This collapses the two DNATs into a single one, without going out of antrea-gw0, when the selected Endpoint is a local Service ClusterIP.

Let's say we have a local Service nginx which is exported, with ClusterIP 10.96.30.99, and a multi-cluster Service named antrea-mc-nginx created with ClusterIP 10.96.30.253.

Today, when a packet reaches the ServiceLB table and matches the IP 10.96.30.253, it goes to the corresponding Service group to choose an Endpoint. For our multi-cluster Service with a local ClusterIP as Endpoint, it selects the local ClusterIP 10.96.30.99 and proceeds to the EndpointDNAT table. With this proposal, the next table after Endpoint selection in the ServiceLB flow becomes GlobalServiceLB instead of EndpointDNAT. For any exported Service, the new GlobalServiceLB table uses reg3 to match the target local ClusterIP (here 10.96.30.99) and performs another Endpoint selection, which yields the local Pod's Endpoint IP. Once the final Endpoint selection is done, the traffic behaves like local Pod access.

# No change for ServiceLB flow
table=ServiceLB, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.30.253,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[9],load:0x2->NXM_NX_REG7[],group:2
# Update the next table from EndpointDNAT to GlobalServiceLB for Service Group
group_id=2,type=select,bucket=bucket_id:0,weight:100,actions=load:0xa601e63->NXM_NX_REG3[],load:0x50->NXM_NX_REG4[0..15],resubmit(,GlobalServiceLB)
# Add a new flow on GlobalServiceLB for local exported Service ClusterIP and do another Endpoint selection.
table=GlobalServiceLB, priority=200,tcp,reg3=0xa601e63,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[9],group:1
# A default flow if Service Endpoint selection is already done in ServiceLB.
table=GlobalServiceLB, priority=0 actions=resubmit(,EndpointDNAT)
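
For anyone reproducing this locally, these flows and groups can be dumped from the antrea-ovs container of the antrea-agent Pod; the bridge name is assumed to be the Antrea default br-int, and the -O value may need to match the bridge's configured OpenFlow version:

ovs-ofctl -O OpenFlow15 dump-flows br-int
ovs-ofctl -O OpenFlow15 dump-groups br-int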

@tnqn @jianjuns @wenyingd @hongliangl could you help take a look at the new table and flows above? They work as expected, but I am not sure whether there is any risk of impacting existing flows. Any comments or suggestions are welcome. Thanks.

@luolanzone (Contributor, Author) commented:

After syncing with @wenyingd, there is another option which might work as well: add a bucket whose action resubmits the packet to the final group to do the Endpoint selection. I will verify this change locally first.

group_id=2,type=select,bucket=bucket_id:0,weight:100,actions=group:1
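
For context, group 1 here would be the existing Service group of the local nginx Service, whose buckets already load the selected Pod Endpoint IP and port before resubmitting to EndpointDNAT. A minimal sketch, assuming a single Pod Endpoint with a hypothetical IP 10.10.1.5 on port 80:

# Hypothetical bucket of the local nginx Service group: load the Pod Endpoint IP
# (10.10.1.5 -> 0xa0a0105) and port (80 -> 0x50) into the registers, then resubmit.
group_id=1,type=select,bucket=bucket_id:0,weight:100,actions=load:0xa0a0105->NXM_NX_REG3[],load:0x50->NXM_NX_REG4[0..15],resubmit(,EndpointDNAT)

If this works, the second Endpoint selection happens entirely through group chaining inside OVS, so the traffic never needs to leave through antrea-gw0.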
