Refine Endpoint selection for multi-cluster Service #4508
Conversation
Force-pushed from cb602bc to 34aa403
Codecov Report
@@ Coverage Diff @@
## main #4508 +/- ##
==========================================
- Coverage 69.83% 69.76% -0.08%
==========================================
Files 401 415 +14
Lines 59529 58647 -882
==========================================
- Hits 41575 40917 -658
+ Misses 15142 14942 -200
+ Partials 2812 2788 -24
*This pull request uses carry forward flags.
pkg/agent/openflow/pipeline.go
Outdated
LoadToRegField(EndpointPortField, uint32(portVal)).
	ResubmitToTable(resubmitTableID).
	Done()
if exportedSvcGroupID != 0 && f.isLocalServiceClusterIP(endpointIP) {
There is no need to check whether the endpointIP is a local cluster Service IP in the openflow layer; we can check it in the proxier, and in the openflow layer just leverage the expected groupID in the buckets as long as it is valid. The proxier (client) should remove the Endpoint from the []proxy.Endpoint parameter if it is a local Service IP.
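As an illustration of this suggestion only (not existing Antrea code), the proxier could drop Endpoints whose IP falls in the local Service CIDR before handing them to the openflow client; the Endpoint stand-in, function name, and serviceCIDR parameter below are all assumptions:

```go
package example

import "net"

// Endpoint is a minimal stand-in for Antrea's proxy.Endpoint; only the IP
// accessor used in this sketch is modeled.
type Endpoint interface {
	IP() string
}

// filterLocalServiceIPEndpoints drops Endpoints whose IP is a local Service
// ClusterIP, so the openflow layer never has to special-case them.
func filterLocalServiceIPEndpoints(endpoints []Endpoint, serviceCIDR *net.IPNet) []Endpoint {
	filtered := make([]Endpoint, 0, len(endpoints))
	for _, ep := range endpoints {
		ip := net.ParseIP(ep.IP())
		// Keep the Endpoint unless it clearly falls inside the Service CIDR.
		if ip == nil || serviceCIDR == nil || !serviceCIDR.Contains(ip) {
			filtered = append(filtered, ep)
		}
	}
	return filtered
}
```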
Force-pushed from 16f33f8 to 1f56878
Force-pushed from 1f56878 to bc54f09
/test-multicluster-e2e
Hi @jianjuns @wenyingd @hongliangl, I added a new config field 'serviceCIDR' under the multicluster config section in this PR so that antrea-agent can check whether an Endpoint is a local Service's ClusterIP or not. Another way to figure out the
Force-pushed from bc54f09 to cdaf5b3
Do we have a better way to handle this rather than checking against
Yes, AntreaProxy already knows a CIDR that can contain all existing ClusterIPs, but it is located in pkg/agent/route, like the IPv4 Service CIDR (antrea/pkg/agent/route/route_linux.go, line 122 at 114e14f).
I think the Service CIDR can be obtained by adding a new public method to this interface (antrea/pkg/agent/route/interfaces.go, line 25 at 114e14f):
GetServiceCIDR(isIPv6 bool) *net.IPNet
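For reference, a minimal sketch of the suggested accessor, assuming the route client keeps per-family Service CIDR fields; the Client struct and its field names below are stand-ins for illustration, not the actual code:

```go
package route

import "net"

// Interface sketch: the new public method suggested above.
type Interface interface {
	// GetServiceCIDR returns the Service CIDR of the requested address family,
	// or nil if it has not been discovered yet.
	GetServiceCIDR(isIPv6 bool) *net.IPNet
}

// Client is a stand-in for the real route client; the CIDR fields are assumed
// names used only for this sketch.
type Client struct {
	serviceIPv4CIDR *net.IPNet
	serviceIPv6CIDR *net.IPNet
}

func (c *Client) GetServiceCIDR(isIPv6 bool) *net.IPNet {
	if isIPv6 {
		return c.serviceIPv6CIDR
	}
	return c.serviceIPv4CIDR
}
```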
@hongliangl I checked the code; it seems AntreaProxy will only calculate the ServiceCIDR when proxyAll is enabled. Do you know how AntreaProxy knows what the ServiceCIDR is when proxyAll is disabled?
@jianjuns I double checked with Hongliang about the current ServiceCIDR implementation in the proxy. AntreaProxy will calculate the ServiceCIDR only when proxyAll is enabled, so Hongliang's suggestion to expose the ServiceCIDR from the route package can work only when we enable proxyAll with multicluster. I feel this is not applicable for multi-cluster to get the ServiceCIDR, considering proxyAll is disabled by default in most cases. @tnqn @wenyingd @hongliangl let us know if you know any alternative ways, thanks a lot!
Or we can calculate the Service CIDR only in AntreaProxy: when mc is enabled and when proxyAll is enabled, we calculate the Service CIDR and install the route for the CIDR.
Yeah, I think it is good to keep a single way to discover the Service CIDR. I am not sure what the best design choice is. Probably you guys can figure it out.
Force-pushed from cdaf5b3 to 17ff8fb
After syncing with @tnqn and @hongliangl, we plan to extract the serviceCIDR calculation logic out of AntreaProxy and make it a standalone module, e.g. servicecidr_discover.go. Both AntreaProxy and Multi-cluster can be callers of the new module. Hongliang will help estimate the effort and submit a standalone PR. Thanks.
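A minimal sketch of what such a standalone module could look like, IPv4-only for brevity; the Discoverer type, method names, and CIDR-growing logic are illustrative assumptions, not the eventual implementation:

```go
package servicecidr

import (
	"net"
	"sync"
)

// Discoverer tracks the smallest CIDR that covers every Service ClusterIP it
// has seen, so both AntreaProxy and Multi-cluster can query one place.
type Discoverer struct {
	mu       sync.RWMutex
	ipv4CIDR *net.IPNet
}

// AddClusterIP extends the tracked CIDR so that it covers the given ClusterIP.
func (d *Discoverer) AddClusterIP(ip net.IP) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.ipv4CIDR = extendCIDR(d.ipv4CIDR, ip)
}

// GetServiceCIDR returns the current best-effort Service CIDR, or nil if no
// ClusterIP has been observed yet.
func (d *Discoverer) GetServiceCIDR() *net.IPNet {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.ipv4CIDR
}

// extendCIDR returns the smallest CIDR containing both curr and ip.
func extendCIDR(curr *net.IPNet, ip net.IP) *net.IPNet {
	ip = ip.To4() // IPv4 only in this sketch.
	if ip == nil {
		return curr
	}
	if curr == nil {
		return &net.IPNet{IP: ip, Mask: net.CIDRMask(32, 32)}
	}
	if curr.Contains(ip) {
		return curr
	}
	ones, bits := curr.Mask.Size()
	// Shorten the prefix until the network also covers the new IP.
	for prefix := ones - 1; prefix >= 0; prefix-- {
		mask := net.CIDRMask(prefix, bits)
		candidate := &net.IPNet{IP: curr.IP.Mask(mask), Mask: mask}
		if candidate.Contains(ip) {
			return candidate
		}
	}
	return curr
}
```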
pkg/agent/openflow/client.go
Outdated
@@ -618,11 +618,11 @@ func (c *client) GetPodFlowKeys(interfaceName string) []string {
	return c.getFlowKeysFromCache(c.featurePodConnectivity.podCachedFlows, interfaceName)
}

-func (c *client) InstallServiceGroup(groupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
+func (c *client) InstallServiceGroup(groupID, exportedSvcGroupID binding.GroupIDType, withSessionAffinity bool, endpoints []proxy.Endpoint) error {
What is the expected behavior when a local Service that is used as an Endpoint of the global Service is deleted?
The local Service will be deleted, and then the corresponding Endpoint in the global Service will be deleted as well. When the Endpoints are changed, InstallServiceGroup should be called again without exportedSvcGroupID.
Force-pushed from 17ff8fb to 935f4f9
/test-multicluster-e2e
It seems the multicluster e2e takes a longer time than before and a few MCNP cases fail randomly. I tried a failed case locally and it works fine. I am syncing with @hjiajing and @GraysonWu regarding the failed cases.
When the Endpoint of Multi-cluster Service is a local Service ClusterIP, refine the action to let it go to the corresponding exported Service's group to do final Endpoint selection. This can avoid the case that the traffic goes out of antrea-gw0 and goes back to OVS again when a local Pod is trying to access a MC Service but a local Service's Endpoint is selected. Signed-off-by: Lan Luo <luola@vmware.com>
/test-multicluster-e2e
The MC testbed is recovered and the e2e test passed.
/test-all
LGTM
Sorry for the late review. I found a conflict with another PR of mine and took a quick look; I have some questions below.
if mcsLocalService != nil {
	needUpdateEndpoints = true
}
This means all the Endpoints of this Service will be updated as long as there is any Service/Endpoints change in the whole cluster and when the periodic sync happens (30s).
Why must it update the Endpoints when there is a local Service Endpoint?
// For any Multi-cluster Service, its name will be a combination with prefix `antrea-mc-` and
// exported Service's name. So we need to remove the prefix to look up the exported Service.
if mcsLocalService != nil && strings.HasPrefix(svcPortName.Name, mcServiceNamePrefix) {
	exportedSvcPortName := svcPortName
	exportedSvcPortName.Name = strings.TrimPrefix(svcPortName.Name, mcServiceNamePrefix)
	if _, ok := p.serviceMap[exportedSvcPortName]; ok {
		mcsLocalService.GroupID = p.groupCounter.AllocateIfNotExist(exportedSvcPortName, false)
		return mcsLocalService
	}
}
return nil
This looks a little hacky, making AntreaProxy too specific to Multicluster. It would be hard to maintain in the long term if we add such logic to it.
And I think this way doesn't always work: even if it could allocate an ID for this Service, the group may not exist in OVS, and installing flows with this group will fail.
I wonder if we could have more generic code to handle the "ClusterIP as Endpoint IP" case, not specific to Multicluster.
If flow installation fails when the group does not exist, then do we need to wait another 30s for a retry?
I have no good idea about decoupling AntreaProxy and Multicluster.
For decoupling AntreaProxy and Multicluster, I discussed this with @luolanzone offline. The proposal is:
- Add an attribute IsNested to ServiceInfo to indicate whether the Service may be DNATed more than once.
// ServiceInfo is the internal struct for caching service information.
type ServiceInfo struct {
*k8sproxy.BaseServiceInfo
// cache for performance
OFProtocol openflow.Protocol
+ // IsNested represents the Service's Endpoints could be another Service's ClusterIP.
+ IsNested bool
}
IsNested = service.Annotations[MulticlusterSpecificAnnotation]
- For a nested Service, its Service flow will mark one reg bit of the packet to indicate "nested":
InstallServiceFlows(groupID binding.GroupIDType,
svcIP net.IP,
svcPort uint16,
protocol binding.Protocol,
affinityTimeout uint16,
nodeLocalExternal bool,
+ nested bool,
svcType v1.ServiceType)
- For a non-nested Service, it will install one extra flow in the EndpointDNAT table, with higher priority, to resubmit packets that have the "nested" bit set to its own group.
The order of installing the multicluster Service and the normal Service doesn't matter as their flows are independent. When the multicluster Service's local Service is not installed, the packet will just be DNATed to the local Service IP as a normal packet.
When multicluster-service's local service is installed, the packet will be resubmitted to the local service's group to select the eventual endpoint.
The flows would look like the below, assuming 10.96.0.1 as multicluster service IP and 10.96.0.2 as normal service IP:
table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.1,tp_dst=80 actions=load:0x1->NESTED-BIT,group:1
table=ServiceLB, priority=200,tcp,nw_dst=10.96.0.2,tp_dst=80 actions=group:2
table=EndpointDNAT, priority=210,tcp,NESTED-BIT=0x1,reg3=0xa600002,reg4=0x20050/0x7ffff actions=group:2
table=EndpointDNAT, priority=200,tcp,reg3=0xa600002,reg4=0x20050/0x7ffff actions=ct(commit,table=AntreaPolicyEgressRule,zone=65520,nat(dst=10.96.0.2:80))
Whether to do all of these could be managed by a flag in AntreaProxy like supportedNestedService, which would be assigned the value of enableMulticlusterGateway, but the approach could potentially support in-cluster nested Services in the future. What do you think?
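To make the proposal above a bit more concrete, here is a rough sketch of how AntreaProxy might derive IsNested and feed it to InstallServiceFlows; the annotation key and helper name are purely illustrative, and the actual Multi-cluster annotation may differ:

```go
package proxy

import corev1 "k8s.io/api/core/v1"

// mcDerivedServiceAnnotation is a placeholder for whatever annotation marks an
// imported Multi-cluster Service; the real key may differ.
const mcDerivedServiceAnnotation = "multicluster.antrea.io/imported-service"

// isNestedService reports whether the Service's Endpoints may themselves be
// other Services' ClusterIPs, i.e. whether packets may be DNATed twice.
func isNestedService(svc *corev1.Service) bool {
	_, ok := svc.Annotations[mcDerivedServiceAnnotation]
	return ok
}
```

The result would then be cached as ServiceInfo.IsNested and passed as the new nested argument of InstallServiceFlows, so that only ServiceLB flows of nested Services set the NESTED bit, matching the flows shown above.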
Add a new flow for the Service's ClusterIP in the EndpointDNAT table with group action. When an Endpoint of a Multi-cluster Service is a local Service ClusterIP and being selected, it will go to the corresponding exported Service's group to select the final Endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. The proposal details can be found in the comment: antrea-io#4508 (comment) Signed-off-by: Lan Luo <luola@vmware.com>
* Revert "Refine Endpoint selection for multi-cluster Service (#4508)" This reverts commit 6cdbca3. Signed-off-by: Lan Luo <luola@vmware.com> * Refine Endpoint selection for MC Service Add a new flow for the Service's ClusterIP in the EndpointDNAT table with group action. When an Endpoint of a Multi-cluster Service is a local Service ClusterIP and being selected, it will go to the corresponding exported Service's group to select the final Endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. The proposal details can be found in the comment: #4508 (comment) Signed-off-by: Lan Luo <luola@vmware.com> --------- Signed-off-by: Lan Luo <luola@vmware.com>
When an Endpoint of a Multi-cluster Service is a local Service ClusterIP, change the flow action to let it go to the corresponding exported Service's group to select the endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. Signed-off-by: Lan Luo <luola@vmware.com>
* Revert "Refine Endpoint selection for multi-cluster Service (antrea-io#4508)" This reverts commit 6cdbca3. Signed-off-by: Lan Luo <luola@vmware.com> * Refine Endpoint selection for MC Service Add a new flow for the Service's ClusterIP in the EndpointDNAT table with group action. When an Endpoint of a Multi-cluster Service is a local Service ClusterIP and being selected, it will go to the corresponding exported Service's group to select the final Endpoint. This can avoid that the traffic goes out of the OVS bridge from antrea-gw0 (and handled by kube-proxy when it is running) and comes back again. The proposal details can be found in the comment: antrea-io#4508 (comment) Signed-off-by: Lan Luo <luola@vmware.com> --------- Signed-off-by: Lan Luo <luola@vmware.com>
When the Endpoint of Multi-cluster Service is a local Service ClusterIP, refine the action to let it go to the corresponding exported Service's group to do final Endpoint selection. This can avoid the case that the traffic goes out of antrea-gw0 and goes back to OVS again when a local Pod is trying to access a MC Service but a local Service's Endpoint is selected.
Resolve #4431
Signed-off-by: Lan Luo luola@vmware.com