Refine Endpoint selection for multi-cluster Service #4508

Merged
merged 1 commit
Mar 7, 2023

Conversation

luolanzone
Contributor

@luolanzone luolanzone commented Dec 22, 2022

When the Endpoint of a Multi-cluster Service is a local Service ClusterIP, refine the action to let the traffic go to the corresponding exported Service's group for the final Endpoint selection. This avoids the case where traffic goes out via antrea-gw0 and comes back to OVS again when a local Pod accesses an MC Service and a local Service's Endpoint is selected.

Resolve #4431
Signed-off-by: Lan Luo <luola@vmware.com>

@luolanzone luolanzone added the area/multi-cluster Issues or PRs related to multi cluster. label Dec 22, 2022
@codecov

codecov bot commented Dec 22, 2022

Codecov Report

Merging #4508 (858cb38) into main (be49b4a) will decrease coverage by 0.08%.
The diff coverage is 82.94%.

❗ Current head 858cb38 differs from pull request most recent head 8c3e22f. Consider uploading reports for the commit 8c3e22f to get more accurate results


@@            Coverage Diff             @@
##             main    #4508      +/-   ##
==========================================
- Coverage   69.83%   69.76%   -0.08%     
==========================================
  Files         401      415      +14     
  Lines       59529    58647     -882     
==========================================
- Hits        41575    40917     -658     
+ Misses      15142    14942     -200     
+ Partials     2812     2788      -24     
Flag Coverage Δ *Carryforward flag
e2e-tests 59.00% <ø> (+20.62%) ⬆️
integration-tests 34.48% <66.66%> (+0.04%) ⬆️ Carriedforward from be83891
kind-e2e-tests 47.64% <69.34%> (+0.36%) ⬆️ Carriedforward from be83891
unit-tests 59.84% <80.00%> (+0.08%) ⬆️ Carriedforward from be83891

*This pull request uses carry forward flags. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
pkg/agent/cniserver/ipam/ipam_delegator.go 68.18% <ø> (ø)
pkg/agent/cniserver/ipam/ipam_service.go 83.87% <ø> (-8.61%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 100.00% <ø> (ø)
pkg/agent/cniserver/pod_configuration_windows.go 75.00% <ø> (ø)
pkg/agent/cniserver/server_linux.go 100.00% <ø> (ø)
pkg/agent/cniserver/server_windows.go 85.18% <ø> (ø)
pkg/agent/cniserver/sriov.go 31.37% <ø> (ø)
pkg/agent/openflow/client.go 88.63% <ø> (-0.03%) ⬇️
pkg/agent/openflow/pipeline.go 90.49% <ø> (+2.16%) ⬆️
... and 90 more

LoadToRegField(EndpointPortField, uint32(portVal)).
ResubmitToTable(resubmitTableID).
Done()
if exportedSvcGroupID != 0 && f.isLocalServiceClusterIP(endpointIP) {
Contributor

@wenyingd wenyingd Dec 23, 2022
There is no need to check whether the endpointIP is a local cluster Service IP in the openflow layer; we can check it in the proxier, and in the openflow layer just leverage the expected groupID in the buckets as long as it is valid? The proxier (client) should remove the Endpoint from the []proxy.Endpoint parameter if it is a local Service IP.

@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone
Contributor Author

Hi @jianjuns @wenyingd @hongliangl, I added a new config field 'serviceCIDR' under the multicluster config section in this PR so that antrea-agent can check whether an Endpoint is a local Service's ClusterIP. Another way to figure out the Service CIDR of a cluster is to discover it automatically, as antrea-mc-controller does, but that requires extra RBAC to give antrea-agent the capability to create Services.
Personally, I would prefer to let antrea-agent figure out the Service CIDR automatically, which is the more user-friendly way. Let me know if you have any suggestions. Thanks.
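For illustration, a minimal sketch of the check such a serviceCIDR field would enable (the standalone helper below is hypothetical and only mirrors the idea behind the isLocalServiceClusterIP call in the diff, not the actual implementation):

package main

import (
	"fmt"
	"net"
)

// isLocalServiceClusterIP reports whether a selected Endpoint IP is actually a
// local Service ClusterIP, i.e. it falls within the configured Service CIDR.
func isLocalServiceClusterIP(serviceCIDR *net.IPNet, endpointIP net.IP) bool {
	return serviceCIDR != nil && serviceCIDR.Contains(endpointIP)
}

func main() {
	// Example value only; the cluster's real Service CIDR comes from the config.
	_, serviceCIDR, _ := net.ParseCIDR("10.96.0.0/12")
	fmt.Println(isLocalServiceClusterIP(serviceCIDR, net.ParseIP("10.96.0.10")))   // true: redirect to the exported Service's group
	fmt.Println(isLocalServiceClusterIP(serviceCIDR, net.ParseIP("192.168.10.2"))) // false: a regular Pod/Node Endpoint
}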

@jianjuns
Contributor

jianjuns commented Jan 6, 2023

Do we have a better way to handle this rather than checking against the serviceCIDR? AntreaProxy already knows all local ClusterIPs, right? @hongliangl

@hongliangl
Contributor

hongliangl commented Jan 9, 2023

Do we have a better way to handle this rather than checking against the serviceCIDR? AntreaProxy already knows all local ClusterIPs, right? @hongliangl

Yes, AntreaProxy already knows a CIDR that contains all existing ClusterIPs, but it is located in pkg/agent/route, like the IPv4 Service CIDR (clusterIPv4CIDR *net.IPNet).

I think the Service CIDR can be obtained by adding a new public method to the route's type Interface, like GetServiceCIDR(isIPv6 bool) *net.IPNet.
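A minimal sketch of what such an accessor could look like, assuming the route client keeps the cached CIDR fields referenced above (the clusterIPv6CIDR counterpart and the struct shown here are assumptions for illustration):

package route

import "net"

// Interface shows only the proposed method; the existing methods are omitted.
type Interface interface {
	// GetServiceCIDR returns the Service ClusterIP CIDR currently known to the route client.
	GetServiceCIDR(isIPv6 bool) *net.IPNet
}

// client mirrors the cached Service CIDR fields mentioned in the comment above.
type client struct {
	clusterIPv4CIDR *net.IPNet
	clusterIPv6CIDR *net.IPNet
}

func (c *client) GetServiceCIDR(isIPv6 bool) *net.IPNet {
	if isIPv6 {
		return c.clusterIPv6CIDR
	}
	return c.clusterIPv4CIDR
}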

@luolanzone
Contributor Author

@hongliangl I checked the code; it seems AntreaProxy only calculates the ServiceCIDR when proxyAll is enabled. Do you know how AntreaProxy knows the ServiceCIDR when proxyAll is disabled?

@luolanzone
Contributor Author

@jianjuns I double-checked with Hongliang on the current ServiceCIDR implementation in the proxy. AntreaProxy calculates the ServiceCIDR only when proxyAll is enabled, so Hongliang's suggestion to expose the ServiceCIDR from the route package only works when proxyAll is enabled together with multicluster. I feel this is not applicable for multi-cluster to get the ServiceCIDR, considering proxyAll is disabled by default in most cases.

@tnqn @wenyingd @hongliangl please let us know if you know of any alternative ways, thanks a lot!

@hongliangl
Contributor

hongliangl commented Jan 9, 2023

Or we can calculate the Service CIDR only in AntreaProxy: when MC is enabled or when proxyAll is enabled, we calculate the Service CIDR and install the route for the CIDR.

@jianjuns
Contributor

jianjuns commented Jan 9, 2023

Yeah, I think it is good to keep a single way to discover the Service CIDR. I am not sure what the best design choice is. Probably you guys can figure it out.

@luolanzone
Contributor Author

After syncing with @tnqn and @hongliangl, we plan to extract the serviceCIDR calculation logic out of AntreaProxy and make it a standalone module, e.g. servicecidr_discover.go. Both AntreaProxy and Multi-cluster can be callers of the new module. Hongliang will help estimate the effort and submit a standalone PR. Thanks.
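As a rough illustration of what such a standalone module could provide (the type and method names below are hypothetical; the real servicecidr_discover.go may compute the CIDR differently), it could simply grow a CIDR to cover every ClusterIP it observes:

package servicecidr

import (
	"net"
	"sync"
)

// Discoverer tracks the smallest CIDR that covers all observed ClusterIPs.
// One Discoverer per IP family is assumed.
type Discoverer struct {
	mu   sync.RWMutex
	cidr *net.IPNet
}

// AddClusterIP extends the tracked CIDR so that it covers ip, typically called
// from a Service informer's add/update handlers.
func (d *Discoverer) AddClusterIP(ip net.IP) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.cidr == nil {
		bits := 32
		if ip.To4() == nil {
			bits = 128
		}
		d.cidr = &net.IPNet{IP: ip, Mask: net.CIDRMask(bits, bits)}
		return
	}
	// Widen the mask until the new ClusterIP falls inside the CIDR.
	ones, bits := d.cidr.Mask.Size()
	for !d.cidr.Contains(ip) && ones > 0 {
		ones--
		d.cidr.Mask = net.CIDRMask(ones, bits)
		d.cidr.IP = d.cidr.IP.Mask(d.cidr.Mask)
	}
}

// GetServiceCIDR returns the current estimate, or nil if no ClusterIP has been seen.
func (d *Discoverer) GetServiceCIDR() *net.IPNet {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.cidr
}

Both AntreaProxy and the Multi-cluster code could then call GetServiceCIDR instead of each maintaining its own copy of this logic.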

@luolanzone luolanzone added this to the Antrea v1.11 release milestone Jan 18, 2023
@@ -618,11 +618,11 @@ func (c *client) GetPodFlowKeys(interfaceName string) []string {
return c.getFlowKeysFromCache(c.featurePodConnectivity.podCachedFlows, interfaceName)
}

-func (c *client) InstallServiceGroup(groupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
+func (c *client) InstallServiceGroup(groupID, exportedSvcGroupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
Contributor

What is the expected behavior when a local Service that is used as an Endpoint of the global Service is deleted?

Contributor Author

When the local Service is deleted, the corresponding Endpoint in the global Service will be deleted as well. When the Endpoints change, InstallServiceGroup should be called again without exportedSvcGroupID.

@luolanzone luolanzone marked this pull request as ready for review February 13, 2023 01:23
@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone luolanzone requested a review from tnqn February 13, 2023 01:24
@luolanzone
Contributor Author

It seems the multicluster e2e takes longer than before and a few MCNP cases fail randomly. I tried a failed case locally and it works fine. Syncing with @hjiajing and @GraysonWu about the failures.

When the Endpoint of a Multi-cluster Service is a local Service ClusterIP,
refine the action to let the traffic go to the corresponding exported
Service's group for the final Endpoint selection. This avoids the case
where traffic goes out via antrea-gw0 and comes back to OVS again when a
local Pod accesses an MC Service and a local Service's Endpoint is selected.

Signed-off-by: Lan Luo <luola@vmware.com>
@luolanzone
Contributor Author

/test-multicluster-e2e

@luolanzone
Contributor Author

The MC testbed has recovered and the e2e tests passed.

@luolanzone
Contributor Author

/test-all

Contributor

@hongliangl hongliangl left a comment

LGTM

@jianjuns jianjuns merged commit 6cdbca3 into antrea-io:main Mar 7, 2023
Member

@tnqn tnqn left a comment

Sorry for the late review. I found a conflict with another PR of mine, took a quick look, and have some questions below.

Comment on lines +426 to +428
if mcsLocalService != nil {
needUpdateEndpoints = true
}
Member

This means all the Endpoints of this Service will be updated as long as there is any Service/Endpoints change in the whole cluster, and whenever the periodic sync happens (every 30s).

Why must the Endpoints be updated when there is a local Service Endpoint?

Comment on lines +192 to +202
// For any Multi-cluster Service, its name is a combination of the prefix `antrea-mc-` and the
// exported Service's name, so we need to remove the prefix to look up the exported Service.
if mcsLocalService != nil && strings.HasPrefix(svcPortName.Name, mcServiceNamePrefix) {
exportedSvcPortName := svcPortName
exportedSvcPortName.Name = strings.TrimPrefix(svcPortName.Name, mcServiceNamePrefix)
if _, ok := p.serviceMap[exportedSvcPortName]; ok {
mcsLocalService.GroupID = p.groupCounter.AllocateIfNotExist(exportedSvcPortName, false)
return mcsLocalService
}
}
return nil
Member

This looks a little hacky, making AntreaProxy too specific to Multicluster. It would be hard to maintain in the long term if we add such logic to it.
And I think this approach doesn't always work: even though it can allocate an ID for this Service, the group may not exist in OVS, and installing flows with that group will fail.
I wonder if we could have more generic code to handle the case where a ClusterIP is an Endpoint IP, not specific to Multicluster.

Contributor

If flow installation fails when the group does not exist, do we then need to wait another 30s for the retry?

I don't have a good idea for decoupling AntreaProxy and Multicluster.

Member

@tnqn tnqn Mar 9, 2023

For decoupling AntreaProxy and Multicluster, I discussed this with @luolanzone offline. The proposal is:

  1. Add an attribute IsNested to ServiceInfo to indicate whether the Service may be DNATed more than once.
// ServiceInfo is the internal struct for caching service information.
type ServiceInfo struct {
	*k8sproxy.BaseServiceInfo
	// cache for performance
	OFProtocol openflow.Protocol
 +      // IsNested represents the Service's Endpoints could be another Service's ClusterIP.
 +      IsNested bool
}

IsNested = service.Annotations[MulticlusterSpecificAnnotation]
  2. For a nested Service, its Service flow will mark one reg bit of the packet to indicate "nested":
InstallServiceFlows(groupID binding.GroupIDType, 
        svcIP net.IP, 
        svcPort uint16, 
        protocol binding.Protocol, 
        affinityTimeout uint16, 
        nodeLocalExternal bool, 
+       nested bool,
        svcType v1.ServiceType) 
  3. For a non-nested Service, install one extra flow in the EndpointDNAT table, with higher priority, to resubmit packets that have the "nested" bit set to its own group.

The order of installing the multi-cluster Service and the normal Service doesn't matter, as their flows are independent. When the multi-cluster Service's local Service is not installed, the packet will just be DNATed to the local Service IP like a normal packet.
When the multi-cluster Service's local Service is installed, the packet will be resubmitted to the local Service's group to select the eventual Endpoint.

The flows would look like the below, assuming 10.96.0.1 as multicluster service IP and 10.96.0.2 as normal service IP:

table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.1,tp_dst=80 actions=load:0x1->NESTED-BIT,group:1 
table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.2,tp_dst=80 actions=group:2
table=EndpointDNAT, priority=210,tcp,NESTED-BIT=0x1,reg3=0xa600002,reg4=0x20050/0x7ffff actions=group:2
table=EndpointDNAT, priority=200,tcp,reg3=0xa600002,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=10.96.0.2:80))

Whether to do all of this could be managed by a flag in AntreaProxy like supportedNestedService, which would be assigned the value of enableMulticlusterGateway, but the approach could potentially support in-cluster nested Services in the future. What do you think?
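For completeness, a small self-contained sketch of how the proxier side could derive IsNested from a Service annotation (the annotation key below is purely illustrative, not Antrea's actual key):

package main

import "fmt"

// isNestedService reports whether a Service should be treated as "nested", i.e.
// its Endpoints may themselves be ClusterIPs that need a second round of
// Endpoint selection in the EndpointDNAT table.
func isNestedService(annotations map[string]string) bool {
	_, ok := annotations["multicluster.antrea.io/imported-service"]
	return ok
}

func main() {
	mcSvc := map[string]string{"multicluster.antrea.io/imported-service": "true"}
	fmt.Println(isNestedService(mcSvc)) // true: the ServiceLB flow sets the "nested" reg bit
	fmt.Println(isNestedService(nil))   // false: install the regular Service flows only
}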

@luolanzone luolanzone deleted the mc-local-ep branch March 9, 2023 00:53
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
When an Endpoint of a Multi-cluster Service is a local Service ClusterIP,
add a new flow for the Service in the EndpointDNAT table with a group action
to let the traffic go to the corresponding exported Service's group to select
the final Endpoint. This avoids the traffic going out of the OVS bridge via
antrea-gw0 (and being handled by kube-proxy when it is running) and coming
back again.

The proposal details can be found in the comment:
antrea-io#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>
luolanzone added a commit to luolanzone/antrea that referenced this pull request Mar 9, 2023
Add a new flow for the Service's ClusterIP in the EndpointDNAT table with a
group action. When an Endpoint of a Multi-cluster Service is a local Service
ClusterIP and is selected, the traffic will go to the corresponding exported
Service's group to select the final Endpoint. This avoids the traffic going
out of the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it
is running) and coming back again.

The proposal details can be found in the comment:
antrea-io#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>
tnqn pushed a commit that referenced this pull request Mar 14, 2023
* Revert "Refine Endpoint selection for multi-cluster Service (#4508)"

This reverts commit 6cdbca3.

Signed-off-by: Lan Luo <luola@vmware.com>

* Refine Endpoint selection for MC Service

Add a new flow for the Service's ClusterIP in the EndpointDNAT table with a
group action. When an Endpoint of a Multi-cluster Service is a local Service
ClusterIP and is selected, the traffic will go to the corresponding exported
Service's group to select the final Endpoint. This avoids the traffic going
out of the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it
is running) and coming back again.

The proposal details can be found in the comment:
#4508 (comment)

Signed-off-by: Lan Luo <luola@vmware.com>

---------

Signed-off-by: Lan Luo <luola@vmware.com>
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
When an Endpoint of a Multi-cluster Service is a local Service ClusterIP,
change the flow action to let the traffic go to the corresponding exported
Service's group to select the Endpoint. This avoids the traffic going out of
the OVS bridge via antrea-gw0 (and being handled by kube-proxy when it is
running) and coming back again.

Signed-off-by: Lan Luo <luola@vmware.com>
Labels
area/multi-cluster Issues or PRs related to multi cluster.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Local traffic for multi-cluster service should be handled within OVS
5 participants