Refine Endpoint selection for multi-cluster Service #4508

Merged
merged 1 commit
Mar 7, 2023

Conversation

luolanzone
Contributor

@luolanzone luolanzone commented Dec 22, 2022

When the Endpoint of a Multi-cluster Service is a local Service ClusterIP, refine the action to let the traffic go to the corresponding exported Service's group for the final Endpoint selection. This avoids the case where traffic goes out via antrea-gw0 and comes back to OVS again when a local Pod accesses an MC Service and a local Service's Endpoint is selected.

Resolve #4431
Signed-off-by: Lan Luo <luola@vmware.com>

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Dec 22, 2022
@codecov

codecov bot commented Dec 22, 2022

Codecov Report

Merging #4508 (858cb38) into main (be49b4a) will decrease coverage by 0.08%.
The diff coverage is 82.94%.

❗ Current head 858cb38 differs from pull request most recent head 8c3e22f. Consider uploading reports for the commit 8c3e22f to get more accurate results


@@            Coverage Diff             @@
##             main    #4508      +/-   ##
==========================================
- Coverage   69.83%   69.76%   -0.08%     
==========================================
  Files         401      415      +14     
  Lines       59529    58647     -882     
==========================================
- Hits        41575    40917     -658     
+ Misses      15142    14942     -200     
+ Partials     2812     2788      -24     
Flag Coverage Δ *Carryforward flag
e2e-tests 59.00% <ø> (+20.62%) ⬆️
integration-tests 34.48% <66.66%> (+0.04%) ⬆️ Carriedforward from be83891
kind-e2e-tests 47.64% <69.34%> (+0.36%) ⬆️ Carriedforward from be83891
unit-tests 59.84% <80.00%> (+0.08%) ⬆️ Carriedforward from be83891

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
pkg/agent/cniserver/ipam/ipam_delegator.go 68.18% <ø> (ø)
pkg/agent/cniserver/ipam/ipam_service.go 83.87% <ø> (-8.61%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 100.00% <ø> (ø)
pkg/agent/cniserver/pod_configuration_windows.go 75.00% <ø> (ø)
pkg/agent/cniserver/server_linux.go 100.00% <ø> (ø)
pkg/agent/cniserver/server_windows.go 85.18% <ø> (ø)
pkg/agent/cniserver/sriov.go 31.37% <ø> (ø)
pkg/agent/openflow/client.go 88.63% <ø> (-0.03%) ⬇️
pkg/agent/openflow/pipeline.go 90.49% <ø> (+2.16%) ⬆️
... and 90 more

LoadToRegField(EndpointPortField, uint32(portVal)).
ResubmitToTable(resubmitTableID).
Done()
if exportedSvcGroupID != 0 && f.isLocalServiceClusterIP(endpointIP) {
Contributor

@wenyingd wenyingd Dec 23, 2022
There is no need to check whether the endpointIP is a local cluster Service IP in the openflow layer; we can check it in the proxier, and in the openflow layer just leverage the expected groupID in the buckets as long as it is valid? The proxier (client) should remove the Endpoint from the []proxy.Endpoint parameter if it is a local Service IP.

@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone
Contributor Author

Hi @jianjuns @wenyingd @hongliangl, I added a new config field 'serviceCIDR' under the multicluster config section in this PR so that antrea-agent can check whether an Endpoint is a local Service's ClusterIP. Another way to figure out the Service CIDR of a cluster is to discover it automatically, as antrea-mc-controller does, but that requires extra RBAC to give antrea-agent the capability to create Services.
Personally, I would prefer to let antrea-agent figure out the Service CIDR automatically, which is the more user-friendly way. Let me know if you have any suggestions. Thanks.
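For illustration, a minimal sketch of the check such a serviceCIDR field would enable (the standalone helper below is hypothetical and only mirrors the idea behind the isLocalServiceClusterIP call in the diff, not the actual implementation):

package main

import (
	"fmt"
	"net"
)

// isLocalServiceClusterIP reports whether a selected Endpoint IP is actually a
// local Service ClusterIP, i.e. it falls within the configured Service CIDR.
func isLocalServiceClusterIP(serviceCIDR *net.IPNet, endpointIP net.IP) bool {
	return serviceCIDR != nil && serviceCIDR.Contains(endpointIP)
}

func main() {
	// Example value only; the cluster's real Service CIDR comes from the config.
	_, serviceCIDR, _ := net.ParseCIDR("10.96.0.0/12")
	fmt.Println(isLocalServiceClusterIP(serviceCIDR, net.ParseIP("10.96.0.10")))   // true: redirect to the exported Service's group
	fmt.Println(isLocalServiceClusterIP(serviceCIDR, net.ParseIP("192.168.10.2"))) // false: a regular Pod/Node Endpoint
}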

@jianjuns
Contributor

jianjuns commented Jan 6, 2023

Do we have a better way to handle this rather than checking against the serviceCIDR? AntreaProxy already knows all local ClusterIPs, right? @hongliangl

@hongliangl
Contributor

hongliangl commented Jan 9, 2023

Do we have a better way to handle this rather than checking against the serviceCIDR? AntreaProxy already knows all local ClusterIPs, right? @hongliangl

Yes, AntreaProxy already knows a CIDR that contains all existing ClusterIPs, but it is located in pkg/agent/route, like the IPv4 Service CIDR (clusterIPv4CIDR *net.IPNet).

I think the Service CIDR can be obtained by adding a new public method to the route's type Interface, like GetServiceCIDR(isIPv6 bool) *net.IPNet.
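A minimal sketch of what such an accessor could look like, assuming the route client keeps the cached CIDR fields referenced above (the clusterIPv6CIDR counterpart and the struct shown here are assumptions for illustration):

package route

import "net"

// Interface shows only the proposed method; the existing methods are omitted.
type Interface interface {
	// GetServiceCIDR returns the Service ClusterIP CIDR currently known to the route client.
	GetServiceCIDR(isIPv6 bool) *net.IPNet
}

// client mirrors the cached Service CIDR fields mentioned in the comment above.
type client struct {
	clusterIPv4CIDR *net.IPNet
	clusterIPv6CIDR *net.IPNet
}

func (c *client) GetServiceCIDR(isIPv6 bool) *net.IPNet {
	if isIPv6 {
		return c.clusterIPv6CIDR
	}
	return c.clusterIPv4CIDR
}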

@luolanzone
Contributor Author

@hongliangl I checked the code; it seems AntreaProxy only calculates the ServiceCIDR when proxyAll is enabled. Do you know how AntreaProxy knows the ServiceCIDR when proxyAll is disabled?

@luolanzone
Contributor Author

@jianjuns I double-checked with Hongliang on the current ServiceCIDR implementation in the proxy. AntreaProxy calculates the ServiceCIDR only when proxyAll is enabled, so Hongliang's suggestion to expose the ServiceCIDR from the route package only works when proxyAll is enabled together with multicluster. I feel this is not applicable for multi-cluster to get the ServiceCIDR, considering proxyAll is disabled by default in most cases.

@tnqn @wenyingd @hongliangl please let us know if you know of any alternative ways, thanks a lot!

@hongliangl
Contributor

hongliangl commented Jan 9, 2023

Or we can calculate the Service CIDR only in AntreaProxy: when MC is enabled or when proxyAll is enabled, we calculate the Service CIDR and install the route for the CIDR.

@jianjuns
Contributor

jianjuns commented Jan 9, 2023

Yeah, I think it is good to keep a single way to discover the Service CIDR. I am not sure what the best design choice is. Probably you guys can figure it out.

@luolanzone
Contributor Author

After syncing with @tnqn and @hongliangl, we plan to extract the serviceCIDR calculation logic out of AntreaProxy and make it a standalone module, e.g. servicecidr_discover.go. Both AntreaProxy and Multi-cluster can be callers of the new module. Hongliang will help estimate the effort and submit a standalone PR. Thanks.
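As a rough illustration of what such a standalone module could provide (the type and method names below are hypothetical; the real servicecidr_discover.go may compute the CIDR differently), it could simply grow a CIDR to cover every ClusterIP it observes:

package servicecidr

import (
	"net"
	"sync"
)

// Discoverer tracks the smallest CIDR that covers all observed ClusterIPs.
// One Discoverer per IP family is assumed.
type Discoverer struct {
	mu   sync.RWMutex
	cidr *net.IPNet
}

// AddClusterIP extends the tracked CIDR so that it covers ip, typically called
// from a Service informer's add/update handlers.
func (d *Discoverer) AddClusterIP(ip net.IP) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.cidr == nil {
		bits := 32
		if ip.To4() == nil {
			bits = 128
		}
		d.cidr = &net.IPNet{IP: ip, Mask: net.CIDRMask(bits, bits)}
		return
	}
	// Widen the mask until the new ClusterIP falls inside the CIDR.
	ones, bits := d.cidr.Mask.Size()
	for !d.cidr.Contains(ip) && ones > 0 {
		ones--
		d.cidr.Mask = net.CIDRMask(ones, bits)
		d.cidr.IP = d.cidr.IP.Mask(d.cidr.Mask)
	}
}

// GetServiceCIDR returns the current estimate, or nil if no ClusterIP has been seen.
func (d *Discoverer) GetServiceCIDR() *net.IPNet {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.cidr
}

Both AntreaProxy and the Multi-cluster code could then call GetServiceCIDR instead of each maintaining its own copy of this logic.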

@luolanzone luolanzone added this to the Antrea v1.11 release milestone Jan 18, 2023
@@ -618,11 +618,11 @@ func (c *client) GetPodFlowKeys(interfaceName string) []string {
return c.getFlowKeysFromCache(c.featurePodConnectivity.podCachedFlows, interfaceName)
}

-func (c *client) InstallServiceGroup(groupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
+func (c *client) InstallServiceGroup(groupID, exportedSvcGroupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
Contributor

What is the expected behavior when a local Service that is used as an Endpoint of the global Service is deleted?

Contributor Author

When the local Service is deleted, the corresponding Endpoint in the global Service will be deleted as well. When the Endpoints change, InstallServiceGroup should be called again without exportedSvcGroupID.

@luolanzone luolanzone marked this pull request as ready for review February 13, 2023 01:23
@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone luolanzone requested a review from tnqn February 13, 2023 01:24
@luolanzone
Contributor Author

It seems the multicluster e2e takes longer than before and a few MCNP cases fail randomly. I tried a failed case locally and it works fine. Syncing with @hjiajing and @GraysonWu about the failures.

When the Endpoint of a Multi-cluster Service is a local Service ClusterIP,
refine the action to let the traffic go to the corresponding exported
Service's group for the final Endpoint selection. This avoids the case
where traffic goes out via antrea-gw0 and comes back to OVS again when a
local Pod accesses an MC Service and a local Service's Endpoint is selected.

Signed-off-by: Lan Luo <luola@vmware.com>
@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone
Contributor Author

The MC testbed has recovered and the e2e tests passed.

@luolanzone
Contributor Author

/test-all

Contributor

@hongliangl hongliangl left a comment

LGTM

@jianjuns jianjuns merged commit 6cdbca3 into antrea-io:main Mar 7, 2023
Member

@tnqn tnqn left a comment

Sorry for the late review. I found a conflict with another PR of mine, took a quick look, and have some questions below.

Comment on lines +426 to +428
if mcsLocalService != nil {
needUpdateEndpoints = true
}
Member

This means all the Endpoints of this Service will be updated as long as there is any Service/Endpoints change in the whole cluster, and whenever the periodic sync happens (every 30s).

Why must the Endpoints be updated when there is a local Service Endpoint?

Comment on lines +192 to +202
// For any Multi-cluster Service, its name is a combination of the prefix `antrea-mc-` and the
// exported Service's name, so we need to remove the prefix to look up the exported Service.
if mcsLocalService != nil && strings.HasPrefix(svcPortName.Name, mcServiceNamePrefix) {
exportedSvcPortName := svcPortName
exportedSvcPortName.Name = strings.TrimPrefix(svcPortName.Name, mcServiceNamePrefix)
if _, ok := p.serviceMap[exportedSvcPortName]; ok {
mcsLocalService.GroupID = p.groupCounter.AllocateIfNotExist(exportedSvcPortName, false)
return mcsLocalService
}
}
return nil
Member

This looks a little hacky, making AntreaProxy too specific to Multicluster. It would be hard to maintain in the long term if we add such logic to it.
And I think this approach doesn't always work: even though it can allocate an ID for this Service, the group may not exist in OVS, and installing flows with that group will fail.
I wonder if we could have more generic code to handle the case where a ClusterIP is an Endpoint IP, not specific to Multicluster.

Contributor

If flow installation fails when the group does not exist, do we then need to wait another 30s for the retry?

I don't have a good idea for decoupling AntreaProxy and Multicluster.

Member

@tnqn tnqn Mar 9, 2023

For decoupling AntreaProxy and Multicluster, I discussed this with @luolanzone offline. The proposal is:

  1. Add an attribute IsNested to ServiceInfo to indicate whether the Service may be DNATed more than once.
// ServiceInfo is the internal struct for caching service information.
type ServiceInfo struct {
	*k8sproxy.BaseServiceInfo
	// cache for performance
	OFProtocol openflow.Protocol
 +      // IsNested represents the Service's Endpoints could be another Service's ClusterIP.
 +      IsNested bool
}

IsNested = service.Annotations[MulticlusterSpecificAnnotation]
  2. For a nested Service, its Service flow will mark one reg bit of the packet to indicate "nested":
InstallServiceFlows(groupID binding.GroupIDType, 
        svcIP net.IP, 
        svcPort uint16, 
        protocol binding.Protocol, 
        affinityTimeout uint16, 
        nodeLocalExternal bool, 
+       nested bool,
        svcType v1.ServiceType) 
  3. For a non-nested Service, install one extra flow in the EndpointDNAT table, with higher priority, to resubmit packets that have the "nested" bit set to its own group.

The order of installing the multi-cluster Service and the normal Service doesn't matter, as their flows are independent. When the multi-cluster Service's local Service is not installed, the packet will just be DNATed to the local Service IP like a normal packet.
When the multi-cluster Service's local Service is installed, the packet will be resubmitted to the local Service's group to select the eventual Endpoint.

The flows would look like the below, assuming 10.96.0.1 as multicluster service IP and 10.96.0.2 as normal service IP:

table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.1,tp_dst=80 actions=load:0x1->NESTED-BIT,group:1 
table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.2,tp_dst=80 actions=group:2
table=EndpointDNAT, priority=210,tcp,NESTED-BIT=0x1,reg3=0xa600002,reg4=0x20050/0x7ffff actions=group:2
table=EndpointDNAT, priority=200,tcp,reg3=0xa600002,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=10.96.0.2:80))

Whether to do all of this could be managed by a flag in AntreaProxy like supportedNestedService, which would be assigned the value of enableMulticlusterGateway, but the approach could potentially support in-cluster nested Services in the future. What do you think?
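For completeness, a small self-contained sketch of how the proxier side could derive IsNested from a Service annotation (the annotation key below is purely illustrative, not Antrea's actual key):

package main

import "fmt"

// isNestedService reports whether a Service should be treated as "nested", i.e.
// its Endpoints may themselves be ClusterIPs that need a second round of
// Endpoint selection in the EndpointDNAT table.
func isNestedService(annotations map[string]string) bool {
	_, ok := annotations["multicluster.antrea.io/imported-service"]
	return ok
}

func main() {
	mcSvc := map[string]string{"multicluster.antrea.io/imported-service": "true"}
	fmt.Println(isNestedService(mcSvc)) // true: the ServiceLB flow sets the "nested" reg bit
	fmt.Println(isNestedService(nil))   // false: install the regular Service flows only
}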

@luolanzone luolanzone deleted the mc-local-ep branch March 9, 2023 00:53
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
When an Endpoint of a Multi-cluster Service is a local Service ClusterIP,
add a new flow for the Service in the EndpointDNAT table with a group action
to let the traffic go to the corresponding exported Service's group to select
the final Endpoint. This avoids the traffic going out of the OVS bridge via
antrea-gw0 (and being handled by kube-proxy when it is running) and coming
back again.

The proposal details can be found in the comment:
antrea-io#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
Add a new flow for the Service's ClusterIP in the EndpointDNAT table with a
group action. When an Endpoint of a Multi-cluster Service is a local Service
ClusterIP and is selected, the traffic will go to the corresponding exported
Service's group to select the final Endpoint. This avoids the traffic going
out of the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it
is running) and coming back again.

The proposal details can be found in the comment:
antrea-io#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>
tnqn pushed a commit that referenced this pull request Mar 14, 2023
* Revert "Refine Endpoint selection for multi-cluster Service (#4508)"

This reverts commit 6cdbca3.

Signed-off-by: Lan Luo <luola@vmware.com>

* Refine Endpoint selection for MC Service

Add a new flow for the Service's ClusterIP in the EndpointDNAT table with a
group action. When an Endpoint of a Multi-cluster Service is a local Service
ClusterIP and is selected, the traffic will go to the corresponding exported
Service's group to select the final Endpoint. This avoids the traffic going
out of the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it
is running) and coming back again.

The proposal details can be found in the comment:
#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>

---------

Signed-off-by: Lan Luo <luola@vmware.com>
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
When an Endpoint of a Multi-cluster Service is a local Service ClusterIP,
change the flow action to let the traffic go to the corresponding exported
Service's group to select the Endpoint. This avoids the traffic going out of
the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it is
running) and coming back again.

Signed-off-by: Lan Luo <luola@vmware.com>
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Local traffic for multi-cluster service should be handled within OVS
5 participants