feat(eks): add neuron device plugin for Inferentia managed node groups #27427
The pull request linter has failed. See the aws-cdk-automation comment below for failure reasons. If you believe this pull request should receive an exemption, please comment and provide a justification.
A comment requesting an exemption should contain the text Exemption Request. Additionally, if clarification is needed, add Clarification Request to a comment.
I'm not sure what is going on here re: addNeuronDevicePlugin, and there's not really any additional information for me to figure it out. Is there a linked issue that can provide more background? Can you state the intention of this PR more clearly in the PR description? Lastly, we will definitely need at least one unit test.
You can find a description of the Neuron device plugin here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-setup.html (tap on Deploy Neuron...). The change replicates the logic already present in addAutoScalingGroupCapacity, see here:
Note that eksctl, for example, already takes care of this (see https://docs.aws.amazon.com/eks/latest/userguide/inferentia-support.html). The CDK needs to implement the same behaviour; otherwise, managed node groups with Inferentia and Trainium devices will not be fully supported.
@freschri I put the |
This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week.
This PR has been deemed to be abandoned, and will be automatically closed. Please create a new PR for these changes if you think this decision has been made in error.
The pull request linter fails with the following errors:
PRs must pass status checks before we can provide a meaningful review. If you would like to request an exemption from the status checks or clarification on feedback, please leave a comment on this PR containing |
In an EKS cluster with Inferentia instances, the Neuron device plugin is currently installed only when capacity is added through an Auto Scaling group.
This change extends the same logic to managed node groups.
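The intended behaviour can be illustrated with a minimal, self-contained sketch. It does not use the real aws-cdk-lib classes: `SketchCluster`, `usesNeuronDevices`, and the `NEURON_FAMILIES` list are hypothetical stand-ins introduced for illustration, and the set of instance families is an assumption. Only the method names `addNeuronDevicePlugin` and `addAutoScalingGroupCapacity` come from the discussion in this PR; `addNodegroupCapacity` is assumed here to be the managed node group entry point.

```typescript
// Hypothetical sketch only; these are NOT the aws-cdk-lib types or signatures.

// Assumption: instance families backed by Neuron devices (Inferentia / Trainium).
const NEURON_FAMILIES: string[] = ['inf1', 'inf2', 'trn1', 'trn1n'];

// Returns true when the instance type (e.g. "inf1.xlarge") belongs to a Neuron family.
function usesNeuronDevices(instanceType: string): boolean {
  const family = instanceType.split('.')[0].toLowerCase();
  return NEURON_FAMILIES.includes(family);
}

class SketchCluster {
  // Names of the manifests that would be applied to the cluster.
  public readonly manifests: string[] = [];
  private neuronPluginInstalled = false;

  // Installs the plugin at most once per cluster.
  public addNeuronDevicePlugin(): void {
    if (this.neuronPluginInstalled) {
      return;
    }
    this.manifests.push('NeuronDevicePlugin');
    this.neuronPluginInstalled = true;
  }

  // Existing behaviour described in the PR: Auto Scaling group capacity
  // triggers the plugin installation for Neuron-backed instances.
  public addAutoScalingGroupCapacity(instanceType: string): void {
    if (usesNeuronDevices(instanceType)) {
      this.addNeuronDevicePlugin();
    }
  }

  // Proposed behaviour: managed node groups apply the same check
  // to every instance type they are configured with.
  public addNodegroupCapacity(instanceTypes: string[]): void {
    if (instanceTypes.some((t) => usesNeuronDevices(t))) {
      this.addNeuronDevicePlugin();
    }
  }
}
```

With this sketch, adding a managed node group that includes an Inferentia instance type registers the plugin once, and repeated calls do not register it again. The at-most-once guard is a design choice of the sketch, not something the PR description states.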
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license