Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

fix(cds): return local cluster endpoints per port #2478

Merged
merged 3 commits into from
Feb 10, 2021

Conversation

michelleN
Copy link
Contributor

@michelleN michelleN commented Feb 8, 2021

Description:
The getLocalServiceCluster function was returning an xds cluster with endpoints based on Kubernetes service endpoints when it should have been building local cluster endpoints based on the port specified in the service spec. i.e. 0.0.0.0:
As a result, there were 18 extra, duplicate entries in the envoy local clusters that were being programmed per Kubernetes service endpoints (or Pod replicas)

This PR fixes the getLocalServiceCluster to return a local cluster with endpoints based on the Kubernetes ports rather than the Kubernetes Service endpoints. The tests in the clusters_test.go have been converted from gingko to use the go testing framework and tests for getLocalServiceCluster has been added.

To see the issue exposed in a test, please pull down this branch and checkout the commit sha 9f95206 and run the tests. You can also check out that commit, run the demo, play with the replica count on bookbuyer and inspect the clusters in the envoy admin UI to see the duplicate entries.

Affected area:

  • New Functionality [ ]
  • Documentation [ ]
  • Install [ ]
  • Control Plane [X ]
  • CLI Tool [ ]
  • Certificate Management [ ]
  • Networking [ ]
  • Metrics [ ]
  • SMI Policy [ ]
  • Security [ ]
  • Tests [ ]
  • CI System [ ]
  • Performance [X ]
  • Other [ ]

Please answer the following questions with yes/no.

  • Does this change contain code from or inspired by another project? If so, did you notify the maintainers and provide attribution? No

@michelleN michelleN requested a review from a team as a code owner February 8, 2021 16:15
Michelle Noorali added 2 commits February 8, 2021 11:16
* convert test from using ginkgo to using built in go
testing framework

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
* This test shows that etLocalServiceCluster returns unnecessary duplicate endpoints
If there are multiple replicas of a Pod sitting behind the same service,
there should only be one xds LbEndpoint programmed for the local cluster per port
specified in the Kubernetes Service spec and it should look like the following:
0.0.0.0:<port>.
* If you increase the replica count to 2 for bookbuyer in the demo, you'll
see an additional 18 entries in the local clusters list

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
@michelleN michelleN force-pushed the cdsclusters branch 2 times, most recently from 08a1715 to e8bfa00 Compare February 8, 2021 16:34
Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change.

While the existing usage of GetEndpointsForService is not correct, using port instead of targetPort to program local clusters introduces new issues.

@@ -97,21 +96,21 @@ func getLocalServiceCluster(catalog catalog.MeshCataloger, proxyServiceName serv
Http2ProtocolOptions: &xds_core.Http2ProtocolOptions{},
}

endpoints, err := catalog.ListEndpointsForService(proxyServiceName)
ports, err := catalog.GetPortToProtocolMappingForService(proxyServiceName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we need to be using service.spec.ports[].targetPort here. We can only use service.spec.ports[].port if it is the same as the targetPort. Since we are attempting to create local clusters for the root service in a traffic split.

I would change this to the following:

if cluster == root service in split:
  ports = GetPortToProtocolMappingForService
else:
  ports = GetTargetPortToProtocolMappingForService

This will ensure that we aren't setting up the local clusters with incorrect port numbers for cases where the targetPort is different from the port for a service.

shashankram
shashankram previously approved these changes Feb 9, 2021
Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I am guessing this will need to be changed again when there is a local cluster being created for the root service referenced in an SMI traffic split policy, which does not have endpoints (ex. bookstore in our demo).

@michelleN michelleN force-pushed the cdsclusters branch 3 times, most recently from db3840d to 50ddb6e Compare February 10, 2021 16:43
@codecov-io
Copy link

codecov-io commented Feb 10, 2021

Codecov Report

Merging #2478 (418b47f) into main (c863662) will increase coverage by 0.10%.
The diff coverage is 38.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2478      +/-   ##
==========================================
+ Coverage   58.55%   58.65%   +0.10%     
==========================================
  Files         153      154       +1     
  Lines        6906     7005      +99     
==========================================
+ Hits         4044     4109      +65     
- Misses       2847     2880      +33     
- Partials       15       16       +1     
Flag Coverage Δ
unittests 58.65% <38.88%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/service/types.go 77.77% <0.00%> (-4.58%) ⬇️
pkg/envoy/cds/cluster.go 93.00% <33.33%> (-3.81%) ⬇️
pkg/envoy/cds/response.go 63.26% <50.00%> (-1.18%) ⬇️
pkg/catalog/fake.go 82.00% <0.00%> (-6.89%) ⬇️
pkg/injector/patch.go 72.09% <0.00%> (-1.25%) ⬇️
pkg/envoy/route/config.go 95.20% <0.00%> (-0.80%) ⬇️
pkg/catalog/mock_catalog.go 0.00% <0.00%> (ø)
pkg/kubernetes/mock_controller.go 0.00% <0.00%> (ø)
pkg/utils/serviceaccount.go 100.00% <0.00%> (ø)
pkg/debugger/proxy.go 5.71% <0.00%> (+0.15%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c863662...418b47f. Read the comment docs.

@shashankram shashankram dismissed their stale review February 10, 2021 16:54

Needs to be reviewed again with latest changes.

Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be creating local clusters for synthetic services. Considering this is just an emulated service in the controller, we should be able to simply skip creating local clusters if the service name matches the naming syntax for the synthetic service.

Comment on lines 71 to 78
for _, con := range pod.Spec.Containers {
if con.Name != constants.EnvoyContainerName {
if len(con.Ports) > 0 {
syntheticService.SyntheticTargetPort = uint32(con.Ports[0].ContainerPort)
break
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really like to remove the need for synthetic services via #2064. Handling synthetic service in different parts of the code like this looks pretty hacky :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. I didn't want to go this route but I did want to see what could potentially work.

Comment on lines 101 to 106
if proxyServiceName.SyntheticTargetPort != 0 {
ports = map[uint32]string{proxyServiceName.SyntheticTargetPort: ""}
} else {
log.Error().Err(err).Msgf("Failed to get ports for service %s", proxyServiceName)
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of handling synthetic services, I don't see a reason why we need to create local clusters for services that do not really exist.

Can we just check if the service name matches the syntax of a synthetic service and skip creating local clusters instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we skip creating a local cluster, the tcp e2e fails where there is no service for the client

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the classic RDS issue which #2064 is blocked on.

@shashankram shashankram dismissed their stale review February 10, 2021 18:12

Discussed a simpler alternative offline. Looking forward to the changes!

* getLocalServiceCluster was returning an xds cluster with endpoints
based on Kubernetes service endpoints instead of building local cluster
endpoints based on the target port specified in the service spec.
i.e. 0.0.0.0:<port>
As a result, there were 18 extra, duplicate  entries in the envoy local
clusters that were being programmed per Kubernetes service endpoints
(or Pod replicas)
* This fixes the getLocalServiceCluster to return a local cluster with endpoints
based on the Kubernetes target port specified on the service spec rather than
the Kubernetes Service endpoints.
* It also handles cases of synthetic services. If a service is synthetic, a
static cluster with no endpoints will be programmed as the local cluster

Signed-off-by: Michelle Noorali <minooral@microsoft.com>
Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@michelleN michelleN merged commit 9e7b53f into openservicemesh:main Feb 10, 2021
@michelleN michelleN deleted the cdsclusters branch February 10, 2021 20:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants