fix(cds): return local cluster endpoints per port #2478
* convert test from using ginkgo to using built-in go testing framework
  Signed-off-by: Michelle Noorali <minooral@microsoft.com>
* This test shows that getLocalServiceCluster returns unnecessary duplicate endpoints
  If there are multiple replicas of a Pod sitting behind the same service, there should only be one xds LbEndpoint programmed for the local cluster per port specified in the Kubernetes Service spec, and it should look like the following: 0.0.0.0:<port>.
* If you increase the replica count to 2 for bookbuyer in the demo, you'll see an additional 18 entries in the local clusters list
  Signed-off-by: Michelle Noorali <minooral@microsoft.com>
08a1715 to e8bfa00
Thanks for the change.
While the existing usage of GetEndpointsForService is not correct, using port instead of targetPort to program local clusters introduces new issues.
pkg/envoy/cds/cluster.go
Outdated
@@ -97,21 +96,21 @@ func getLocalServiceCluster(catalog catalog.MeshCataloger, proxyServiceName serv
	Http2ProtocolOptions: &xds_core.Http2ProtocolOptions{},
}

endpoints, err := catalog.ListEndpointsForService(proxyServiceName)
ports, err := catalog.GetPortToProtocolMappingForService(proxyServiceName)
Ideally we need to be using service.spec.ports[].targetPort here. We can only use service.spec.ports[].port if it is the same as the targetPort.
Since we are attempting to create local clusters for the root service in a traffic split, I would change this to the following:
if cluster == root service in split:
    ports = GetPortToProtocolMappingForService
else:
    ports = GetTargetPortToProtocolMappingForService
This will ensure that we aren't setting up the local clusters with incorrect port numbers for cases where the targetPort is different from the port for a service.
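The selection described in this comment could be sketched roughly as below. This is only an illustration under stated assumptions: `servicePort` is a simplified stand-in for one entry of `service.spec.ports[]`, and `localClusterPorts` and its `isRootService` flag are hypothetical names, not functions from the OSM codebase.

```go
package main

import "fmt"

// servicePort is a simplified, hypothetical stand-in for one entry of
// service.spec.ports[] in a Kubernetes Service.
type servicePort struct {
	Port       uint32 // port exposed by the Service
	TargetPort uint32 // port the Pod's container listens on
	Protocol   string // application protocol, e.g. "http" or "tcp"
}

// localClusterPorts returns the port-to-protocol mapping used to build a
// local cluster. For the root service of a traffic split it keys on Port;
// for every other service it keys on TargetPort, so the local cluster never
// points at a port the application is not listening on.
func localClusterPorts(ports []servicePort, isRootService bool) map[uint32]string {
	out := make(map[uint32]string, len(ports))
	for _, p := range ports {
		if isRootService {
			out[p.Port] = p.Protocol
		} else {
			out[p.TargetPort] = p.Protocol
		}
	}
	return out
}

func main() {
	// A Service that exposes port 80 but forwards to container port 8080.
	ports := []servicePort{{Port: 80, TargetPort: 8080, Protocol: "http"}}
	fmt.Println("regular service:", localClusterPorts(ports, false))
	fmt.Println("root service:   ", localClusterPorts(ports, true))
}
```

When port and targetPort are equal, both branches produce the same mapping, which is why the difference only shows up for services that remap ports.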
e8bfa00 to 3150eff
This looks good. I am guessing this will need to be changed again when a local cluster is created for the root service referenced in an SMI traffic split policy, which does not have endpoints (e.g. bookstore in our demo).
db3840d to 50ddb6e
Codecov Report
@@ Coverage Diff @@
## main #2478 +/- ##
==========================================
+ Coverage 58.55% 58.65% +0.10%
==========================================
Files 153 154 +1
Lines 6906 7005 +99
==========================================
+ Hits 4044 4109 +65
- Misses 2847 2880 +33
- Partials 15 16 +1
Needs to be reviewed again with latest changes.
I don't think we should be creating local clusters for synthetic services. Considering this is just an emulated service in the controller, we should be able to simply skip creating local clusters if the service name matches the naming syntax for the synthetic service.
pkg/catalog/xds_certificates.go
Outdated
for _, con := range pod.Spec.Containers {
	if con.Name != constants.EnvoyContainerName {
		if len(con.Ports) > 0 {
			syntheticService.SyntheticTargetPort = uint32(con.Ports[0].ContainerPort)
			break
		}
	}
}
I'd really like to remove the need for synthetic services via #2064. Handling synthetic service in different parts of the code like this looks pretty hacky :-(
agreed. I didn't want to go this route but I did want to see what could potentially work.
pkg/envoy/cds/cluster.go
Outdated
if proxyServiceName.SyntheticTargetPort != 0 {
	ports = map[uint32]string{proxyServiceName.SyntheticTargetPort: ""}
} else {
	log.Error().Err(err).Msgf("Failed to get ports for service %s", proxyServiceName)
	return nil, err
}
Instead of handling synthetic services, I don't see a reason why we need to create local clusters for services that do not really exist.
Can we just check if the service name matches the syntax of a synthetic service and skip creating local clusters instead?
If we skip creating a local cluster, the TCP e2e test fails in the case where there is no service for the client.
Ah, the classic RDS issue which #2064 is blocked on.
Discussed a simpler alternative offline. Looking forward to the changes!
50ddb6e to 9bb7209
* getLocalServiceCluster was returning an xds cluster with endpoints based on Kubernetes service endpoints instead of building local cluster endpoints based on the target port specified in the service spec, i.e. 0.0.0.0:<port>. As a result, there were 18 extra, duplicate entries in the envoy local clusters that were being programmed per Kubernetes service endpoints (or Pod replicas).
* This fixes the getLocalServiceCluster to return a local cluster with endpoints based on the Kubernetes target port specified on the service spec rather than the Kubernetes Service endpoints.
* It also handles cases of synthetic services. If a service is synthetic, a static cluster with no endpoints will be programmed as the local cluster.
Signed-off-by: Michelle Noorali <minooral@microsoft.com>
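As a rough illustration of the behavior this commit message describes, the sketch below builds one local endpoint per target port at 0.0.0.0, independent of how many Pod replicas back the service, and returns no endpoints for a synthetic service. It deliberately uses plain Go types instead of the Envoy xDS types; `localEndpoint` and `localClusterEndpoints` are hypothetical names and this is not the code from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// localEndpoint is a simplified, hypothetical stand-in for the socket
// address carried by an xDS LbEndpoint.
type localEndpoint struct {
	Address string
	Port    uint32
}

// localClusterEndpoints returns exactly one endpoint per target port, bound
// to 0.0.0.0. The replica count never enters the computation, so scaling a
// Deployment cannot add duplicate entries. A synthetic service gets no
// endpoints at all.
func localClusterEndpoints(targetPorts map[uint32]string, synthetic bool) []localEndpoint {
	if synthetic {
		return nil
	}
	// Sort the ports so the output order is deterministic.
	ports := make([]uint32, 0, len(targetPorts))
	for p := range targetPorts {
		ports = append(ports, p)
	}
	sort.Slice(ports, func(i, j int) bool { return ports[i] < ports[j] })

	endpoints := make([]localEndpoint, 0, len(ports))
	for _, p := range ports {
		endpoints = append(endpoints, localEndpoint{Address: "0.0.0.0", Port: p})
	}
	return endpoints
}

func main() {
	targetPorts := map[uint32]string{8080: "http", 9090: "tcp"}
	for _, ep := range localClusterEndpoints(targetPorts, false) {
		fmt.Printf("%s:%d\n", ep.Address, ep.Port)
	}
	fmt.Println("synthetic endpoints:", len(localClusterEndpoints(targetPorts, true)))
}
```

Because the input is the set of ports rather than the list of Kubernetes endpoints, the number of entries depends only on the service spec.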
9bb7209 to 418b47f
Looks great!
Description:
The getLocalServiceCluster function was returning an xds cluster with endpoints based on Kubernetes service endpoints when it should have been building local cluster endpoints based on the port specified in the service spec, i.e. 0.0.0.0:<port>. As a result, there were 18 extra, duplicate entries in the envoy local clusters that were being programmed per Kubernetes service endpoints (or Pod replicas).
This PR fixes getLocalServiceCluster to return a local cluster with endpoints based on the Kubernetes ports rather than the Kubernetes Service endpoints. The tests in clusters_test.go have been converted from ginkgo to the go testing framework, and tests for getLocalServiceCluster have been added.
To see the issue exposed in a test, please pull down this branch, check out the commit sha 9f95206, and run the tests. You can also check out that commit, run the demo, play with the replica count on bookbuyer, and inspect the clusters in the envoy admin UI to see the duplicate entries.
Affected area:
Please answer the following questions with yes/no.