cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation prevents GKE nodepool from scaling down #13984
Comments
@paoloyx I closed the operator issue as a dupe - and pulled in the body. We can sort things out here, and if this really is a conflicting config with the operator PDB then we can transfer the issue over to that repo
This was added long ago (#4329) - so this might be a case where they aren't necessary when there is a Pod Disruption Budget. Can you elaborate on what you expect to see when a node is being drained?
@dprotaso thanks for having managed the duplicate, I was not sure where I was expected to open the issue. For sure I'll try to elaborate, but please bear in mind that I'm not much of an expert on Knative jobs/services - that's to say what are the duties of

A premise about the used configuration

We're using

It probably seems a cumbersome configuration.

Scale down is still prevented

As of now I assume that the PDB configuration is correct and allows me to drain nodes, so the nodepool can be scaled down by the GKE autoscaler. The autoscaler

Conclusion

I do not know if the

Hope that I gave all the necessary details, let me know @dprotaso, thank you very much!
Just wanted to add some other details about another cluster where we're experiencing the same "issue"...there are other pods outside of the ones I reported before, namely:
So, in general, it seems (or at least that is my suspicion) that it is all linked to that annotation.

Thanks, again, for looking at this one
@dprotaso Hi, just to know if the provided input is enough or if you still need more info on my side. I'll be more than glad to provide it if there is something more specific that you need, thank you so much.
The original intent is that the activator, autoscaler, webhook etc. should always be running, otherwise you'll have a disruption in traffic and in creating Knative Services. It seems like these annotations were added prior to PDBs being stable in Kubernetes. The use of
Based on what I've been reading [1,2] it seems like a PDB is the better way to control pod counts when there is a disruption (scaling down nodes), by safely reshuffling pods onto ready nodes. I think the Knative operator should be removing

I'm inclined to remove

[1] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
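For illustration, a PodDisruptionBudget of the kind being discussed could look like the following. This is a minimal sketch: the name, selector label, and minAvailable value are placeholders, not Knative's shipped defaults.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: activator-pdb            # hypothetical name
  namespace: knative-serving
spec:
  # Keep at least one activator pod running during voluntary disruptions
  # such as a node drain triggered by the cluster autoscaler.
  minAvailable: 1
  selector:
    matchLabels:
      app: activator             # label assumed to match the activator pods
```

With a budget like this in place, the autoscaler can evict activator pods one at a time while draining a node, rather than being blocked outright by the safe-to-evict annotation.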
Looks like Tekton removed the annotation as well - tektoncd/pipeline#4124 - to unblock node scale down. But they removed their PodDisruptionBudgets - tektoncd/pipeline#3787 - because it was preventing draining in non-HA configurations.
Yes, I can understand why those annotations were introduced then
Exactly, the whole nodepool upgrade process eventually stalls and a manual operation is needed to roll out the stuck Knative component
Let's wait for some feedback, as you suggest. It's perfectly reasonable. So for example we could do
Could it work, in your opinion, with
##EDIT## The "solution" explained below actually does not prevent cluster to tear down knative-workload anymore, so it is not probably a good one as could cause service disruption. I think that we need a proper PDB support as outlined in previous posts So, after a quick test about the usage of annotations in
The outcome is that
[1] Override replicas, labels and annotations
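For context, the override mentioned in [1] is driven from the KnativeServing custom resource. A sketch of the kind of override that appears to have been tested might look like the following; the annotation value and the assumption that the operator propagates it to the pod template are mine, not confirmed in this thread.

```yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  deployments:
    - name: activator
      annotations:
        # Assumption: overriding the shipped "false" value with "true"
        # lets the cluster autoscaler evict the activator pods.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```

As the edit above notes, flipping the annotation this way trades a blocked scale-down for possible service disruption, which is why proper PDB support is the preferred direction.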
Note that cluster-autoscaler has supported checking for PDBs when it evicts pods from a node (in the process of draining a node to scale it down) since Kubernetes 1.6.x (ref). Here's how cluster-autoscaler behaves:
Code refs:

File: simulator/drain.go
095: pods, daemonSetPods, blockingPod, err = drain.GetPodsForDeletionOnNodeDrain(
096: pods,
097: pdbs,
098: deleteOptions.SkipNodesWithSystemPods,
099: deleteOptions.SkipNodesWithLocalStorage,
100: deleteOptions.SkipNodesWithCustomControllerPods,
101: listers,
102: int32(deleteOptions.MinReplicaCount),
103: timestamp)
104: ...
109: if pdbBlockingPod, err := checkPdbs(pods, pdbs); err != nil {
110: return []*apiv1.Pod{}, []*apiv1.Pod{}, pdbBlockingPod, err
111: }

If you check

File: utils/drain/drain.go
110: safeToEvict := hasSafeToEvictAnnotation(pod)
...
129: if !safeToEvict && !terminal {
...
145: if hasNotSafeToEvictAnnotation(pod) {
146: return []*apiv1.Pod{}, []*apiv1.Pod{}, &BlockingPod{Pod: pod, Reason: NotSafeToEvictAnnotation}, fmt.Errorf("pod annotated as not safe to evict present: %s", pod.Name)
147: }
148: }

Here's the definition for hasSafeToEvictAnnotation and hasNotSafeToEvictAnnotation:

File: utils/drain/drain.go
315: // This checks if pod has PodSafeToEvictKey annotation
316: func hasSafeToEvictAnnotation(pod *apiv1.Pod) bool {
317: return pod.GetAnnotations()[PodSafeToEvictKey] == "true"
318: }
319:
320: // This checks if pod has PodSafeToEvictKey annotation set to false
321: func hasNotSafeToEvictAnnotation(pod *apiv1.Pod) bool {
322: return pod.GetAnnotations()[PodSafeToEvictKey] == "false"
323: }

Links are based on d1740df93b76d99bf9302ad0c62978deb0ec1d5b, which is the latest commit on
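To make the checks above concrete, the annotation the autoscaler inspects lives in pod metadata. A hypothetical pod fragment (names and image are placeholders) carrying the blocking value would look like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: activator-example
  namespace: knative-serving
  annotations:
    # hasNotSafeToEvictAnnotation() returns true for this value, so the
    # autoscaler reports the pod as blocking the node drain.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
    - name: activator
      image: example.invalid/activator:latest
```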
/assign @dprotaso

I'm going to drop the annotation
@dprotaso thank you so much. Just one last question, and that is because I'm not familiar with Knative releasing and backporting to older versions...in order to get this update, should we upgrade our

Or is it expected that there will be an (eventual) manual action to backport the "fix" to
I'm inclined to backport these changes - but I might wait to include some other fixes. We only support the last two releases - so we would only backport to 1.9 & 1.10
I'll leave this issue open to track the backport
Point releases are out - I'm going to close this out: https://github.com/knative/serving/releases/tag/knative-v1.10.2

Thanks for surfacing this problem @paoloyx
Thanks to you @dprotaso for your work
Describe the bug
It's not a bug but a support request.
The GKE autoscaler tells me that the
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
annotation is preventing the nodepool from being scaled down. The annotation is found on these pods:

Expected behavior
I'm running pods in multi-replica with a PDB configured with minAvailability - made possible by knative/operator#1125 - and I would expect the node to be drained. For sure I'm doing it for the activator pod.

Is the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation a mandatory one? Is there a way to configure it differently so the nodepool can properly scale down?

To Reproduce
N/A
Knative release version
Knative operator running in cluster at version 1.7.1. Also Knative Serving is at version 1.7.1.
Additional context
The GKE cluster is a standard one, running K8s version 1.23.17-gke.1700.