Autoscaling Elasticsearch: Introduce a dedicated custom resource #5978

Merged
merged 36 commits into from
Sep 23, 2022

Conversation

Contributor

@barkbay barkbay commented Aug 29, 2022

This PR introduces a dedicated Kubernetes Custom Resource Definition (CRD), and an associated controller, to configure Elasticsearch autoscaling with ECK.

Naming and group

This PR introduces a new resource named ElasticsearchAutoscaler. It lives in a group named autoscaling.k8s.elastic.co:

apiVersion: autoscaling.k8s.elastic.co/v1alpha1
kind: ElasticsearchAutoscaler

This is mostly to leave the door open to any additional autoscaling resources that we may want to add in the future.

ElasticsearchAutoscaler specification

The new CRD is very similar to the current autoscaling annotation, with the obvious difference that an elasticsearchRef must be provided by the user:

apiVersion: autoscaling.k8s.elastic.co/v1alpha1
kind: ElasticsearchAutoscaler
metadata:
  name: autoscaling-sample
spec:
  elasticsearchRef:
    name: elasticsearch-sample
  policies:
    - name: di
      roles: ["data", "ingest" , "transform"]
      ## Optional: the section below can be used if fine-grained tuning of the Elasticsearch deciders is required.
      #deciders:
      #  proactive_storage:
      #    forecast_window: 5m
      resources:
        nodeCount:
          min: 3
          max: 8
        cpu:
          min: 2
          max: 8
        memory:
          min: 2Gi
          max: 16Gi
        storage:
          min: 64Gi
          max: 512Gi
    - name: ml
      roles:
        - ml
      resources:
        nodeCount:
          min: 1
          max: 9
        cpu:
          min: 1
          max: 4
        memory:
          min: 2Gi
          max: 8Gi
        storage:
          min: 1Gi
          max: 1Gi

Only one cluster can be managed by a given ElasticsearchAutoscaler. This is similar to the Kubernetes HorizontalPodAutoscaler and VerticalPodAutoscaler. It also makes it easier to understand the autoscaler status and the relationship between the autoscaler and the Elasticsearch cluster.

Status

The status consists of two main elements:

  1. conditions provides an overall view of the reconciliation state.
  2. policies holds the calculated resources and any error or other important messages.
status:
  conditions:
  - lastTransitionTime: "2022-08-29T11:11:43Z"
    status: "False"
    type: Limited
  - lastTransitionTime: "2022-08-27T17:02:32Z"
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-08-27T17:01:41Z"
    status: "True"
    type: Active
  - lastTransitionTime: "2022-08-27T17:02:32Z"
    message: Elasticsearch is available
    status: "True"
    type: Online
  observedGeneration: 3
  policies:
  - lastModificationTime: "2022-08-29T11:19:07Z"
    name: di
    nodeSets:
    - name: di
      nodeCount: 7
    resources:
      limits:
        cpu: "2"
        memory: 8Gi
      requests:
        cpu: "2"
        memory: 8Gi
        storage: 5Gi
  - lastModificationTime: "2022-08-29T11:19:07Z"
    name: ml
    nodeSets:
    - name: ml
      nodeCount: 0
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: "1"
        memory: 2Gi
        storage: 1Gi

Printed columns

The conditions Limited, Active and Healthy are printed as part of the output of kubectl get elasticsearchautoscaler.autoscaling.k8s.elastic.co/<autoscaler_name>:

NAME                                                                    TARGET                 ACTIVE   HEALTHY   LIMITED
elasticsearchautoscaler.autoscaling.k8s.elastic.co/autoscaling-sample   elasticsearch-sample   True     True      True
  • TARGET: the name of the Elasticsearch cluster being autoscaled.
  • ACTIVE: True when the ElasticsearchAutoscaler resource is managed by the operator and the target Elasticsearch cluster exists.
  • HEALTHY: True if resources have been calculated for all the autoscaling policies and no error has been encountered during the reconciliation process.
  • LIMITED: True when a resource limit is reached.

Each printed condition carries an additional message in the corresponding status condition.
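To make the mapping between conditions and printed columns concrete, here is a minimal Go sketch. The Condition type, statusOf and row are hypothetical names used only for illustration; kubectl itself presumably derives the columns from the CRD's printer column definitions rather than from code like this. Returning "Unknown" for a missing condition is also an assumption.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical mirror of the fields
// visible in the status excerpt above.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// statusOf returns the status of the condition with the given type,
// or "Unknown" when the condition is absent (an assumption of this sketch).
func statusOf(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

// row formats one output line with the TARGET, ACTIVE, HEALTHY and LIMITED columns.
func row(name, target string, conds []Condition) string {
	return fmt.Sprintf("%-20s %-22s %-8s %-9s %s",
		name, target,
		statusOf(conds, "Active"),
		statusOf(conds, "Healthy"),
		statusOf(conds, "Limited"))
}

func main() {
	conds := []Condition{
		{Type: "Limited", Status: "False"},
		{Type: "Healthy", Status: "True"},
		{Type: "Active", Status: "True"},
		{Type: "Online", Status: "True", Message: "Elasticsearch is available"},
	}
	fmt.Println(row("autoscaling-sample", "elasticsearch-sample", conds))
}
```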

Noteworthy change

By default there is now a ratio of 1:1 between the CPU limit and the request. This is to comply with the current desired nodes API implementation.
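As a rough illustration of what a 1:1 ratio means, the sketch below derives a CPU limit from a request expressed in millicores. The function name cpuLimitMilli and the ratio parameter are hypothetical; the operator's actual computation is not shown in this PR description.

```go
package main

import (
	"fmt"
	"math"
)

// cpuLimitMilli derives a CPU limit (in millicores) from a CPU request
// and a limit-to-request ratio. Hypothetical helper for illustration only.
func cpuLimitMilli(requestMilli int64, ratio float64) int64 {
	return int64(math.Round(float64(requestMilli) * ratio))
}

func main() {
	// With the default ratio of 1, the limit equals the request,
	// as in the status excerpt above (cpu: "2" for both).
	fmt.Println(cpuLimitMilli(2000, 1.0))
}
```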

Testing

  • Elasticsearch autoscaling is an Enterprise feature, both at the ECK and the Elasticsearch level. In dev mode you can start a trial.
  • Autoscaling events can be generated with the fixed decider, as done in the e2e test, for example:
    // Use the fixed decider to trigger a scale up of the data tier up to its max memory limit and 3 nodes.
    esaScaleUpStorageBuilder := autoscalingBuilder.DeepCopy().WithFixedDecider("data-ingest", map[string]string{"storage": "19gb", "nodes": "3"})

TODO:

@barkbay barkbay added the >feature, autoscaling and v2.5.0 labels Aug 29, 2022
@barkbay barkbay marked this pull request as draft August 29, 2022 15:12
Contributor Author

barkbay commented Aug 31, 2022

Last commit enables advanced validation:

  • Either by using the admission controller:
for: "config/recipes/autoscaling/elasticsearch.yaml": admission webhook "elastic-esa-validation-v1alpha1.k8s.elastic.co" denied the request:
ElasticsearchAutoscaler.autoscaling.k8s.elastic.co "autoscaling-sample" is invalid:
spec.policies[0].resources.nodeCount.min: Invalid value: -1: min count must be equal or greater than 0
  • Or at the conditions level if the webhook is disabled:
status:
  conditions:
  - lastTransitionTime: "2022-08-31T09:58:03Z"
    message: Autoscaler is unhealthy
    status: "True"
    type: Active
  - lastTransitionTime: "2022-08-31T09:58:03Z"
    message: 'ElasticsearchAutoscaler.autoscaling.k8s.elastic.co "autoscaling-sample"
      is invalid: spec.policies[0].resources.nodeCount.min: Invalid value: -1: min
      count must be equal or greater than 0'
    status: "False"
    type: Healthy
  - lastTransitionTime: "2022-08-31T09:58:03Z"
    message: Autoscaler is unhealthy
    status: "False"
    type: Online
  - lastTransitionTime: "2022-08-30T11:53:45Z"
    status: "False"
    type: Limited
  observedGeneration: 2

Note that the custom resource is considered "unhealthy" in that case:

NAME                                                                    TARGET                 ACTIVE   HEALTHY   LIMITED
elasticsearchautoscaler.autoscaling.k8s.elastic.co/autoscaling-sample   elasticsearch-sample   True     False     False
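The kind of check behind the rejected manifest can be sketched in Go as follows. This is a minimal illustration, not the validation code from this PR: CountRange and validateNodeCount are hypothetical names, the first message copies the wording of the webhook output above, and the max-versus-min check with its message is an assumption.

```go
package main

import "fmt"

// CountRange is a hypothetical stand-in for the nodeCount section of a policy.
type CountRange struct {
	Min int32
	Max int32
}

// validateNodeCount rejects a negative minimum and a maximum below the minimum.
// The second rule and its message are assumptions made for this sketch.
func validateNodeCount(path string, r CountRange) error {
	if r.Min < 0 {
		return fmt.Errorf("%s.min: Invalid value: %d: min count must be equal or greater than 0", path, r.Min)
	}
	if r.Max < r.Min {
		return fmt.Errorf("%s.max: Invalid value: %d: max count must be equal or greater than min", path, r.Max)
	}
	return nil
}

func main() {
	err := validateNodeCount("spec.policies[0].resources.nodeCount", CountRange{Min: -1, Max: 8})
	fmt.Println(err)
}
```

The same function could be called both from an admission webhook and from the reconciliation loop, which matches the two reporting paths described above.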

@barkbay barkbay marked this pull request as ready for review September 5, 2022 06:36
Contributor Author

barkbay commented Sep 12, 2022

@barkbay the initial part that caught me when beginning to look at this, is that the cpu validation seems to differ from standard k8s resources.*.cpu validation.

  • : Invalid value: "": "spec.policies.resources.cpu.min" must validate at least one schema (anyOf)
  • spec.policies.resources.cpu.min: Invalid value: "number": spec.policies.resources.cpu.min in body must be of type integer: "number"

Is this intentional?

Unless I'm missing something, there is nothing in this PR that is specific to the generation of the OpenAPI v3 schema for Quantity values.
Note that my IDE expects either an integer or a string (using the m suffix):

image

With the m suffix:

image

With quotes:

image

Contributor Author

barkbay commented Sep 12, 2022

@naemono see also:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
    args:
    - -cpus
    - "2"

I don't understand how the example you mentioned is not rejected by the API server tbh.

Contributor

naemono commented Sep 12, 2022

I don't understand how the example you mentioned is not rejected by the API server tbh.

I think I'm simply assuming that the Elasticsearch.spec.nodeSets[].podTemplate.spec.containers[].resources follows the same validation rules as Pod.spec.containers[].resources. Strange, when I apply the following, it works without failure, even without quotes.

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: 1.0
      requests:
        cpu: 0.5
    args:
    - -cpus
    - "2"

It's probably not worth digging into why the validation is slightly different, but it certainly seems to be.
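The two error messages quoted earlier are consistent with a schema that accepts either an integer or a string for the CPU value. The Go sketch below mimics such a check on raw JSON values; it is only an approximation under that assumption, checkIntOrString is a hypothetical name, and it does not reproduce the API server's actual validation code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
)

// checkIntOrString reports whether a raw JSON value is an integer or a string.
// JSON numbers decode to float64; whole values are treated as integers here,
// which is an assumption of this sketch.
func checkIntOrString(raw string) error {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return err
	}
	switch x := v.(type) {
	case string:
		return nil
	case float64:
		if x == math.Trunc(x) {
			return nil
		}
		return errors.New(`must be of type integer: "number"`)
	default:
		return fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	// An unquoted 0.5 is a number that is not an integer; the quoted forms are strings.
	for _, raw := range []string{`0.5`, `2`, `"0.5"`, `"500m"`} {
		fmt.Printf("%s -> %v\n", raw, checkIntOrString(raw))
	}
}
```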

@pebrc pebrc self-assigned this Sep 19, 2022
Collaborator

@pebrc pebrc left a comment

I tried to look through the source code and it looks really good; for me it is ready to merge. I only found a few nits here and there and one incorrect usage of the log API. I also ran some tests and found some issues. However, I am not sure whether they are related to this PR or whether we already have the same with annotation-based autoscaling.

What I am seeing is that the Elasticsearch resource never fully reconciles (or at least only briefly). This is because the desired nodes API integration throws errors like the following:

2022-09-19T20:45:16.341+0200	ERROR	manager.eck-operator	Reconciler error	{"service.version": "2.5.0-SNAPSHOT+8831df33", "controller": "elasticsearch-controller", "object": {"name":"es","namespace":"default"}, "namespace": "default", "name": "es", "reconcileID": "896949ed-99d7-4f71-a02f-c839d853a9aa", "error": "elasticsearch client failed for https://es-es-internal-http.default.svc:9200/_internal/desired_nodes/651bb9ea-360b-462f-87a1-fe5415d78ac5/8?error_trace=true: 400 Bad Request: {Status:400 Error:{CausedBy:{Reason: Type:} Reason:Desired nodes with history [651bb9ea-360b-462f-87a1-fe5415d78ac5] and version [8] already exists with a different definition Type:

It seems that the same generation of the ES resource is reconciled multiple times (expected, I would say) but that the desired nodes differ within one generation. I have not yet dug deeper into why exactly this is. It seems likely that this is related to changing resources for the autoscaled node sets (the part that confuses me here is that if the resources change, I would also expect the generation to change).

<       "storage": "3221225472b",
---
>       "storage": "2147483648b",

I think this is because we take the desired storage from the volume claims, which are being resized one by one while the Elasticsearch resource is not changing.

Because the nodes reconciliation has already happened at this point, the autoscaling itself is not negatively affected and the cluster keeps scaling correctly. But the Elasticsearch status is stuck in applying changes and the log is full of reconciler errors.

pkg/controller/autoscaling/elasticsearch/controller.go (outdated, resolved)
truncated := ""
count := 0
for _, char := range s {
truncated += string(char)
Collaborator

Nit: could you not just re-slice the underlying byte array once you have reached n, instead of concatenating the runes?

Contributor Author

Let me know if this is what you had in mind: fe77306
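For readers without access to the referenced commit, the re-slicing idea from the nit can be sketched like this. It is an illustration only, not the code from fe77306; truncateRunes is a hypothetical name.

```go
package main

import "fmt"

// truncateRunes returns s cut to at most n runes. Instead of appending
// rune by rune, it finds the byte offset of rune number n and re-slices
// the string once. Hypothetical helper for illustration.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	// Ranging over a string yields the byte offset of each rune start.
	for i := range s {
		if count == n {
			return s[:i]
		}
		count++
	}
	// s has n runes or fewer: nothing to cut.
	return s
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 5)) // prints "héllo"
}
```

Re-slicing avoids one allocation per rune and never cuts a multi-byte character in half, because the cut always lands on a rune boundary.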

pkg/telemetry/telemetry.go (resolved)
pkg/apis/autoscaling/v1alpha1/elasticsearch_types.go (outdated, resolved)
pkg/controller/autoscaling/elasticsearch/controller.go (outdated, resolved)
pkg/controller/autoscaling/elasticsearch/controller.go (outdated, resolved)
},
},
checker: yesCheck,
},
Collaborator

Missing the wantValidationError here?

Contributor Author

We actually support the case where the user is not relying on the default volume claim. I added a comment in the unit test here.

pkg/controller/common/autoscaling/association.go (outdated, resolved)
@barkbay barkbay requested a review from naemono September 21, 2022 05:28
Contributor Author

barkbay commented Sep 21, 2022

This is because the desired nodes API integration throws errors like the following:
[...]
But the Elasticsearch status is stuck in applying changes and the log is full of the reconciler errors.

I hit a similar issue in #5979. I think we need to revisit the way we use the desired nodes API for the next release 😕

Makefile (outdated, resolved)
pkg/apis/common/v1alpha1/resources.go (outdated, resolved)
Contributor

@thbkrkr thbkrkr left a comment

Very nice work!

Nit: almost all k8s resources displayed via kubectl have an "AGE" column, not this one. No big deal but my eyes aren't used to it.

I spotted a small limitation, I think: we can't create an autoscaler for an ES cluster that has a single nodeSet without explicit node.roles, which seems acceptable to me.

  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false

test/e2e/test/elasticsearch/autoscaling/builder.go (outdated, resolved)
pkg/controller/autoscaling/elasticsearch.go (outdated, resolved)
pkg/controller/autoscaling/elasticsearch/policy.go (outdated, resolved)
pkg/controller/autoscaling/elasticsearch/reconcile.go (outdated, resolved)
pkg/apis/elasticsearch/v1/status.go (outdated, resolved)
Collaborator

@pebrc pebrc left a comment

LGTM!

Contributor Author

barkbay commented Sep 22, 2022

@naemono I think I addressed your comments, please let me know if I missed something 🙇

Contributor

@naemono naemono left a comment

This looks great. Nice work @barkbay

pkg/apis/common/v1alpha1/autoscaling_status.go (outdated, resolved)
@barkbay barkbay merged commit e7bd34c into elastic:main Sep 23, 2022
barkbay added a commit that referenced this pull request Sep 23, 2022
Follow up of #5978 which has been merged into main with the wrong controller tools version.
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
…stic#5978)

This (huge) commit introduces a dedicated Kubernetes Resource Definition, and an associated controller, to configure Elasticsearch autoscaling with ECK.
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
Follow up of elastic#5978 which has been merged into main with the wrong controller tools version.