Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NET-6581] perf: Fetch services once rather than per-node on deregister #3322

Merged

Conversation

zalimeni
Copy link
Member

@zalimeni zalimeni commented Dec 7, 2023

Rather than fetching all nodes in a cluster then listing services per-node, fetch all service instances directly by name.

Diff best viewed with ?w=1

Rationale

This change is motivated by customers with larger K8s clusters (node counts) seeing performance issues (slow to execute, high CPU util on Consul server) with reconciles.

This should generally reduce the cost of deregistration in endpoints controller reconciles (in terms of network calls) from N calls (N=node count) to 3-6 calls regardless of K8s cluster size:

  • 1-2 calls to fetch service names matching meta (default NS + non-default NS if applicable)
  • For the service and its proxy (2 catalog entries), 1-2 calls to fetch service instances (default NS + non-default NS if applicable)

This does trade the cost of node list and per-node instance fetching for a bulk fetch of service instances. However, a bulk fetch was the previous behavior prior to the introduction of Consul node mirroring in consul-k8s, and in the majority of real-world use cases, should be cheaper than listing all nodes and fetching services individually per node.

This also trades the use of the wildcard NS (which was giving us a more-global "GC" of service instances across namespaces for free, and not necessarily by intent) for targeted NS lookups; this is due to constraints of the endpoints we now want to target in the Catalog API. After discussion and considering likely use cases, we've elected to make this change and document that switching from one Consul NS or mirror prefix to another when already using non-default namespaces will require manual cleanup in the old Consul NS.

If needed in the future, we can further reduce network calls, improve Consul performance, and reintroduce wildcard NS search by implementing an internal or modified Catalog API endpoint fit to purpose for searching service instances by meta (rather than name).

Examples

Nodes: 400, Instances per node: 1, Occupied nodes per service: 100
  2x services fetch + 4x instances fetch                   = 6 network requests
  Max results per request: service instances fetch         = 100 results
    vs.
  1x nodes fetch + (nodes)x instances fetch                = 401 network requests
  Max results per request: nodes fetch                     = 400 results

Nodes: 400, Instances per node: 5, Occupied nodes per service: 20
  2x services fetch + 4x instances fetch                   = 6 network requests
  Max results per request: service instances fetch         = 100 results
    vs.
  1x nodes fetch + (nodes)x instances fetch                = 401 network requests
  Max results per request: nodes fetch                     = 400 results

Nodes: 10, Instances per node: 10, Occupied nodes per service: 10
  2x services fetch + 4x instances fetch                   = 6 network requests
  Max results per request: service instances fetch         = 100 results
    vs.
  1x nodes fetch + (nodes)x instances fetch                = 11 network requests
  Max results per request: nodes fetch                     = 10 results

Changes proposed in this PR

  • Fetch service instances for deregister once per service name + NS
  • Update tests to reflect alternate fetch path

How I've tested this PR

  • Unit tests pass
  • Manual local test on multi-node cluster to verify no regression (instances and ACL tokens removed on pod kill, service deletion)

How I expect reviewers to test this PR

👀 + 🧠

Checklist

@zalimeni zalimeni force-pushed the zalimeni/net-6581-sg-endpoints-perf-improvement-main branch from b1ced66 to c6c501d Compare December 7, 2023 18:12
@komapa
Copy link

komapa commented Dec 8, 2023

What are the chances of this change making it in the next 1.3.x release in a week or so?

Also, THANK YOU!

@komapa
Copy link

komapa commented Dec 8, 2023

I tried to run this in our (testing) cluster and got this error:

{"level":"error","ts":1702069866.9229004,"logger":"controller.endpoints","msg":"failed to deregister endpoints","name":"
hackathon-32-photonic-phone","ns":"hackathon-32","error":"Unexpected response code: 400 (Cannot use a wildcard namespace
 with this endpoint)","stacktrace":"github.com/hashicorp/consul-k8s/control-plane/connect-inject/controllers/endpoints.(
*Controller).Reconcile\n\t/Users/kirilangov/Workspace/github/consul-k8s/control-plane/connect-inject/controllers/endpoin
ts/endpoints_controller.go:264\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/Users
/kirilangov/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:122\nsigs.k8s.io/con
troller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/Users/kirilangov/go/pkg/mod/sigs.k8s.io/contr
oller-runtime@v0.14.6/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.
(*Controller).processNextWorkItem\n\t/Users/kirilangov/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/co
ntroller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/Users
/kirilangov/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235"}

Consul 1.15.7

@zalimeni
Copy link
Member Author

zalimeni commented Dec 8, 2023

@komapa thanks for the early look at this - I'd not expected it to be noticed yet while it was still in draft!

I'm still working through acceptance tests and I think you've run into an issue on Enterprise that I'll need to address before this change is ready. Once we've got something that can be tried again in your test cluster, I'd be happy to work with you to get it vetted.

While I'd love to get this in for the next patch release, I'm not confident it will be done in time - but we will keep you in the loop if that changes. cc @david-yu

@zalimeni zalimeni force-pushed the zalimeni/net-6581-sg-endpoints-perf-improvement-main branch 6 times, most recently from 61aeb0b to ef37650 Compare December 14, 2023 20:30
@zalimeni zalimeni added backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x labels Dec 14, 2023
@zalimeni zalimeni force-pushed the zalimeni/net-6581-sg-endpoints-perf-improvement-main branch 3 times, most recently from dd2b8c9 to b16de89 Compare December 15, 2023 14:39
err = r.deregisterNode(apiClient, svc.Node)
if err != nil {
r.Log.Error(err, "failed to deregister node", "svc", svc.ServiceName)
errs = multierror.Append(errs, err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the above code is unchanged. The GH diff is easier to read with ?w=1.

The changes are:

  • Field swaps between the old and new API response objects (see conversion fn here, called in the old endpoint implementation here)
    • CatalogNodeServiceList.Node.Node -> ServiceNode.Node
    • NodeService.ID -> ServiceNode.ServiceID
    • NodeService.Service -> ServiceNode.ServiceName
    • NodeService.Meta -> ServiceNode.ServiceMeta
    • NodeService.Address -> ServiceNode.ServiceAddress
  • Raising the inner loop / removing the per-node outer loop

Here's a visual of the diff of the two models read locally from a multi-node kind cluster (the sidecar proxy is similar and carries identical NodeMeta and ServiceMeta, which is what we filter on):
image

ReleaseName: "consul",
ReleaseNamespace: "default",
EnableConsulNamespaces: true,
ConsulDestinationNamespace: ts.ConsulNS,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last line addition of ConsulDestinationNamespace was needed so that the replacement of the wildcard NS search w/ a specific NS would simulate deregistration for the non-default NS in use.

@@ -2216,6 +2217,7 @@ func TestReconcileDeleteGatewayWithNamespaces(t *testing.T) {
}

token, _, err = consulClient.ACL().TokenRead(token.AccessorID, queryOpts)
require.Error(t, err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panic fix when test fails on no error

@@ -2089,7 +2089,7 @@ func TestReconcileDeleteGatewayWithNamespaces(t *testing.T) {
{
ID: "ingress-gateway",
Kind: api.ServiceKindIngressGateway,
Service: "ingress-gateway",
Service: consulSvcName,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and above: this test was specifying the wrong service name vs. the one checked in assertions that instances are deregistered. This turns the false negatives into true negatives.

Comment on lines -4602 to +4616
[]*api.AgentService{
[]*api.CatalogService{
{
ID: "foo1",
Service: "foo",
Meta: map[string]string{"k8s-service-name": k8sSvc, "k8s-namespace": k8sNS},
ID: "foo1",
ServiceName: "foo",
ServiceMeta: map[string]string{"k8s-service-name": k8sSvc, "k8s-namespace": k8sNS},
},
{
Kind: api.ServiceKindConnectProxy,
ID: "foo1-proxy",
Service: "foo-sidecar-proxy",
Port: 20000,
Proxy: &api.AgentServiceConnectProxyConfig{
ID: "foo1-proxy",
ServiceName: "foo-sidecar-proxy",
ServicePort: 20000,
ServiceProxy: &api.AgentServiceConnectProxyConfig{
DestinationServiceName: "foo",
DestinationServiceID: "foo1",
},
Meta: map[string]string{"k8s-service-name": k8sSvc, "k8s-namespace": k8sNS},
ServiceMeta: map[string]string{"k8s-service-name": k8sSvc, "k8s-namespace": k8sNS},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff is a helpful visualization of what fields were swapped between the two API models.

Comment on lines -1200 to +1224
filter := fmt.Sprintf(`Meta[%q] == %q and Meta[%q] == %q and Meta[%q] == %q`,
filter := fmt.Sprintf(`ServiceMeta[%q] == %q and ServiceMeta[%q] == %q and ServiceMeta[%q] == %q`,
Copy link
Member Author

@zalimeni zalimeni Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same filter, different model field. I checked agent/consul/catalog_endpoint.go and Meta is not currently used in an index lookup in the previously used /nodes and /node-services/{node} Catalog API endpoints; both are a bexp post-filter. This means that we are not trading an indexed query for a non-indexed one when it comes to the synthetic node or service name filtering.

If we find the new queries are not performant enough, we have the option of adding an index for ServiceMeta or a new purpose-fit endpoint.

Comment on lines +4689 to +4692
require.NotNil(t, svcs[0], c.expected[0])
require.Equal(t, c.expected[0].ServiceName, svcs[0].ServiceName)
require.NotNil(t, svcs[1], c.expected[1])
require.Equal(t, c.expected[1].ServiceName, svcs[1].ServiceName)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swapped the nil checks here since in that call, the first argument is what should be verified, not the expected value (informational)

@zalimeni zalimeni marked this pull request as ready for review December 15, 2023 14:57
@zalimeni zalimeni marked this pull request as draft December 15, 2023 17:18
@zalimeni zalimeni marked this pull request as ready for review December 15, 2023 20:55
Copy link
Contributor

@thisisnotashwin thisisnotashwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chef-kiss

@zalimeni zalimeni force-pushed the zalimeni/net-6581-sg-endpoints-perf-improvement-main branch from b16de89 to 1171d16 Compare December 21, 2023 15:09
Rather than fetching all nodes in a cluster then listing services
per-node, fetch all service instances directly by name.

This should generally reduce the cost of deregistration in endpoints
controller reconciles (in terms of network calls) from N calls (N=node
count) to 3-6 calls regardless of K8s cluster size.

This does trade the cost of node list and per-node instance fetching for
a bulk fetch of service instances. However, a bulk fetch was the
previous behavior prior to the introduction of Consul node mirroring in
`consul-k8s`, and in the majority of real-world use cases, should be
cheaper than listing all nodes and fetching services individually per
node in the vast majority of cases.

This change is motivated by customers with larger K8s clusters (node
counts) seeing performance issues with reconciles.
@zalimeni zalimeni force-pushed the zalimeni/net-6581-sg-endpoints-perf-improvement-main branch from 1171d16 to ad067a8 Compare January 9, 2024 18:00
@zalimeni
Copy link
Member Author

zalimeni commented Jan 9, 2024

Rebased for merge

@zalimeni zalimeni merged commit 2fc22e3 into main Jan 9, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1.x Backport to release/1.1.x branch backport/1.2.x This release branch is no longer active. backport/1.3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants