
Local traffic for multi-cluster service should be handled within OVS #4431

Closed
luolanzone opened this issue Dec 1, 2022 · 2 comments · Fixed by #4508
Labels: area/multi-cluster, kind/feature

Comments

@luolanzone (Contributor)

Describe the problem/challenge you have

Antrea Multi-cluster enables users to access exported Services from the member clusters in a ClusterSet. A local Service's ClusterIP will be included in the multi-cluster Service's Endpoints as long as the user exports a Service from the same member cluster.

When the local Service's ClusterIP becomes an Endpoint of the multi-cluster Service, a Pod accessing the multi-cluster Service in the ClusterSet may be directed to the Service in its own cluster. From a Service access perspective this works fine. However, such traffic first gets a DNAT in OVS that sets the local Service's ClusterIP as the destination, then leaves OVS through antrea-gw0 toward the uplink. A second DNAT is then needed to reach the real Pod Endpoint, either via an iptables rule (if kube-proxy is enabled) or back in OVS (if kube-proxy is disabled and replaced by AntreaProxy). In contrast, regular in-cluster traffic never leaves OVS and always goes through the tunnel interface in encap mode. We should guarantee that Antrea multi-cluster traffic keeps the same traffic path, no matter whether the selected Endpoint is a local or a remote ClusterIP.
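
For illustration, here is a rough sketch of the two translations, assuming kube-proxy is disabled and AntreaProxy performs the second DNAT. The addresses (10.96.30.253 for the multi-cluster Service, 10.96.30.99 for the local Service, 10.10.1.5 for the Pod Endpoint) are examples only, and the flows are heavily simplified, with register masks and ct fields omitted:

# First DNAT inside OVS: ServiceLB matches the multi-cluster Service ClusterIP 10.96.30.253
# and selects the local Service ClusterIP 10.96.30.99 as the Endpoint, so EndpointDNAT
# rewrites the destination to 10.96.30.99; the packet then leaves OVS via antrea-gw0.
table=EndpointDNAT, tcp,reg3=0xa601e63 actions=ct(commit,nat(dst=10.96.30.99:80),...)
# Second DNAT after the packet re-enters OVS from antrea-gw0: ServiceLB now matches
# 10.96.30.99, selects a real Pod Endpoint, and EndpointDNAT rewrites the destination
# to the Pod IP 10.10.1.5.
table=EndpointDNAT, tcp,reg3=0xa0a0105 actions=ct(commit,nat(dst=10.10.1.5:80),...)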

Another known impact of this issue is that the tunnel_id we will use for stretched NetworkPolicy will be lost if the traffic goes out of OVS.

Describe the solution you'd like

The second DNAT should be done inside OVS instead of going through antrea-gw0. The exact solution is not decided yet; this issue documents the problem for further discussion.

The impact of this issue on stretched NetworkPolicy may be fixed by a change on the controller side. @Dyanngg may have the details.

Anything else you would like to add?

luolanzone added the kind/feature and area/multi-cluster labels on Dec 1, 2022
@luolanzone (Contributor, Author) commented Dec 21, 2022

After some investigation and verification, I think we can add a new table named GlobalServiceLB to make the final Endpoint selection before the traffic hits the EndpointDNAT table. This collapses the two DNATs into a single one, without going out of antrea-gw0, when the selected Endpoint is a local Service ClusterIP.

Let's say we have a local Service nginx which is exported, with ClusterIP 10.96.30.99, and a multi-cluster Service named antrea-mc-nginx created with ClusterIP 10.96.30.253.

Today, when a packet reaches the ServiceLB table and matches the IP 10.96.30.253, it goes to the corresponding Service group to choose an Endpoint. For our multi-cluster Service with a local ClusterIP as Endpoint, it selects the local ClusterIP 10.96.30.99 and proceeds to the EndpointDNAT table. With this proposal, the next table after Endpoint selection in the ServiceLB flow becomes GlobalServiceLB instead of EndpointDNAT. For any exported Service, the new GlobalServiceLB table uses reg3 to match the target local ClusterIP (here 10.96.30.99) and performs another Endpoint selection, which yields the local Pod's Endpoint IP. Once the final Endpoint selection is done, the traffic behaves like local Pod access.

# No change for ServiceLB flow
table=ServiceLB, priority=200,tcp,reg4=0x10000/0x70000,nw_dst=10.96.30.253,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[9],load:0x2->NXM_NX_REG7[],group:2
# Update the next table from EndpointDNAT to GlobalServiceLB for Service Group
group_id=2,type=select,bucket=bucket_id:0,weight:100,actions=load:0xa601e63->NXM_NX_REG3[],load:0x50->NXM_NX_REG4[0..15],resubmit(,GlobalServiceLB)
# Add a new flow on GlobalServiceLB for local exported Service ClusterIP and do another Endpoint selection.
table=GlobalServiceLB, priority=200,tcp,reg3=0xa601e63,tp_dst=80 actions=load:0x2->NXM_NX_REG4[16..18],load:0x1->NXM_NX_REG0[9],group:1
# A default flow if Service Endpoint selection is already done in ServiceLB.
table=GlobalServiceLB, priority=0 actions=resubmit(,EndpointDNAT)
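
For anyone reproducing this locally, these flows and groups can be dumped from the antrea-ovs container of the antrea-agent Pod; the bridge name is assumed to be the Antrea default br-int, and the -O value may need to match the bridge's configured OpenFlow version:

ovs-ofctl -O OpenFlow15 dump-flows br-int
ovs-ofctl -O OpenFlow15 dump-groups br-int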

@tnqn @jianjuns @wenyingd @hongliangl could you help take a look at the new table and flows above? They work as expected, but I am not sure whether there is any risk of impacting existing flows. Any comments or suggestions are welcome. Thanks.

@luolanzone (Contributor, Author) commented:

After syncing with @wenyingd, there is another option which might work as well: add a bucket whose action resubmits the packet to the final group to do the Endpoint selection. I will verify this change locally first.

group_id=2,type=select,bucket=bucket_id:0,weight:100,actions=group:1
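
For context, group 1 here would be the existing Service group of the local nginx Service, whose buckets already load the selected Pod Endpoint IP and port before resubmitting to EndpointDNAT. A minimal sketch, assuming a single Pod Endpoint with a hypothetical IP 10.10.1.5 on port 80:

# Hypothetical bucket of the local nginx Service group: load the Pod Endpoint IP
# (10.10.1.5 -> 0xa0a0105) and port (80 -> 0x50) into the registers, then resubmit.
group_id=1,type=select,bucket=bucket_id:0,weight:100,actions=load:0xa0a0105->NXM_NX_REG3[],load:0x50->NXM_NX_REG4[0..15],resubmit(,EndpointDNAT)

If this works, the second Endpoint selection happens entirely through group chaining inside OVS, so the traffic never needs to leave through antrea-gw0.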
