[BUG] - AWS EKS user and worker node groups fail to scale to 0 nodes #1811

Closed
Adam-D-Lewis opened this issue May 17, 2023 · 5 comments
Labels: needs: PR 📬 This item has been scoped and needs to be worked on · provider: AWS · type: bug 🐛 Something isn't working
Comments

@Adam-D-Lewis
Member

Adam-D-Lewis commented May 17, 2023

Description

AWS EKS user and worker node groups fail to scale to 0 nodes

@Adam-D-Lewis added labels: type: bug 🐛 Something isn't working · needs: triage 🚦 Someone needs to have a look at this issue and triage (May 17, 2023)
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented May 17, 2023

I think the cluster autoscaler won't scale a node down if a pod from a deployment is scheduled on it. AWS EKS includes a few add-ons by default (specifically the coredns and ebs-csi-controller deployments), and pods from those deployments were scheduled on the worker and user nodes. This may be part of the reason the worker and user node groups didn't scale down. I saw a similar issue with kbatch and the user-scheduler in JupyterHub, but those can be disabled if needed.

As far as I can tell, you can't disable these default add-ons. You can, however, patch the deployments after they are deployed, which may be a solution.

As a workaround, adding the following node affinity to the problematic deployment manifests (under spec.template.spec.affinity) after deployment can resolve the problem.

    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/nodegroup
            operator: In
            values:
            - general

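For example, a patch along these lines could apply that affinity after the add-on is installed. This is a sketch only: it assumes the default coredns deployment in the kube-system namespace and a node group named general, and the same idea would apply to ebs-csi-controller.

    # Merge-patch the coredns Deployment so its pods only schedule on the general node group
    kubectl -n kube-system patch deployment coredns --type merge -p '{
      "spec": {
        "template": {
          "spec": {
            "affinity": {
              "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                  "nodeSelectorTerms": [
                    {
                      "matchExpressions": [
                        {
                          "key": "eks.amazonaws.com/nodegroup",
                          "operator": "In",
                          "values": ["general"]
                        }
                      ]
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }'
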
@Adam-D-Lewis
Member Author

Adam-D-Lewis commented May 17, 2023

We could patch the problematic add-on deployments in the nebari/template/stages/03-kubernetes-initialize module.

@pavithraes added labels: provider: AWS · needs: PR 📬 This item has been scoped and needs to be worked on, and removed: needs: triage 🚦 Someone needs to have a look at this issue and triage (May 22, 2023)
@iameskild
Member

@Adam-D-Lewis is this a priority for you? Should we assign someone to work on this?

@Adam-D-Lewis
Member Author

Not a priority, because of the workaround.

@viniciusdc
Contributor

Hi @Adam-D-Lewis, I assume #2353 fixed this issue, right? Can we mark it as completed?

github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management (Mar 23, 2024)