[BUG] - AWS EKS user and worker node groups fail to scale to 0 nodes #1811

Closed
Adam-D-Lewis opened this issue May 17, 2023 · 5 comments
Labels: needs: PR 📬 This item has been scoped and needs to be worked on · provider: AWS · type: bug 🐛 Something isn't working
Comments

@Adam-D-Lewis
Member

Adam-D-Lewis commented May 17, 2023

Description

AWS EKS user and worker node groups fail to scale to 0 nodes

@Adam-D-Lewis added labels: type: bug 🐛 Something isn't working · needs: triage 🚦 Someone needs to have a look at this issue and triage (May 17, 2023)
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented May 17, 2023

I think the cluster autoscaler won't scale a node down if a pod from a deployment is scheduled on it. AWS EKS includes a few add-ons by default (specifically the coredns and ebs-csi-controller deployments), and pods from those deployments were scheduled on the worker and user nodes. This may be part of the reason the worker and user node groups didn't scale down. I saw a similar issue with kbatch and the user-scheduler in JupyterHub, but those can be disabled if needed.

As far as I can tell, you can't disable these default add-ons. You can, however, patch the deployments after they are deployed, which may be a solution.

As a workaround, adding the following node affinity to the problematic deployment manifests (under spec.template.spec.affinity) after deployment can resolve the problem.

    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/nodegroup
            operator: In
            values:
            - general

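For example, a patch along these lines could apply that affinity after the add-on is installed. This is a sketch only: it assumes the default coredns deployment in the kube-system namespace and a node group named general, and the same idea would apply to ebs-csi-controller.

    # Merge-patch the coredns Deployment so its pods only schedule on the general node group
    kubectl -n kube-system patch deployment coredns --type merge -p '{
      "spec": {
        "template": {
          "spec": {
            "affinity": {
              "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                  "nodeSelectorTerms": [
                    {
                      "matchExpressions": [
                        {
                          "key": "eks.amazonaws.com/nodegroup",
                          "operator": "In",
                          "values": ["general"]
                        }
                      ]
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }'
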
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented May 17, 2023

We could patch the problematic add-on deployments in the nebari/template/stages/03-kubernetes-initialize module.

@pavithraes added labels: provider: AWS · needs: PR 📬 This item has been scoped and needs to be worked on, and removed: needs: triage 🚦 Someone needs to have a look at this issue and triage (May 22, 2023)
@iameskild
Member

@Adam-D-Lewis is this a priority for you? Should we assign someone to work on this?

@Adam-D-Lewis
Member Author

Not a priority, because of the workaround.

@viniciusdc
Contributor

Hi @Adam-D-Lewis, I assume #2353 fixed this issue, right? Can we mark it as completed?

github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management (Mar 23, 2024)