Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address issue with AWS instance type schema #2787

Merged
merged 10 commits into from
Oct 29, 2024
Merged

Conversation

viniciusdc
Copy link
Contributor

@viniciusdc viniciusdc commented Oct 22, 2024

Reference Issues or PRs

closes #2782

What does this implement/fix?

Put a x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

How to test this PR?

As the previous error happened only during terraform plan you will need to mock the deployment of nebari, I suggest mofiying the source code to only plan the code instead of deploying (this should work for the second stage, since there are no pre-variable depency AFAIK) -- stage 1 can be skipped by using the state local instead of remote.

  • Init an AWS deployment, e.g, nebari init aws --project testing-gpu
  • add a GPU node block in your config, I have an example in the linked issue;
  • with the changes mentioned above, trigger nebari deploy so that it only plans the stage,
  • You need to validate that the actual instance type is respected when the GPU block is passed, i.e, if the node group does not have gpu: enabled you should see AL2_X86_64 if gpu is passed, then its variant should show up ``AL2_X86_64_GPU`
  • As a final validation, make sure to check the node_template block and test if by passing an AMI the intance_type also changes to CUSTOM.

Any other comments?

I made a deployment on AWS from this branch and GPUs were working as expected

@viniciusdc viniciusdc added the needs: review 👀 This PR is complete and ready for reviewing label Oct 23, 2024
@viniciusdc viniciusdc added this to the 2024.9.2 milestone Oct 24, 2024
@dcmcand
Copy link
Contributor

dcmcand commented Oct 24, 2024

@viniciusdc can you fill out the how to test section here?

@viniciusdc
Copy link
Contributor Author

viniciusdc commented Oct 24, 2024

done. Thanks for catching it!

@viniciusdc
Copy link
Contributor Author

@dcmcand This will address the issue you saw. Could you have another look? (Let me know if you see any other pydantic errors; they might show up again.)

Copy link
Contributor

@dcmcand dcmcand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu: false returns AL2_x86_64
gpu: true returns AL2_X86_64_GPU
launch template with ami_id specified returns CUSTOM whether gpu is true or false

@dcmcand dcmcand added status: approved 💪🏾 This PR has been reviewed and approved for merge and removed needs: review 👀 This PR is complete and ready for reviewing labels Oct 28, 2024
@viniciusdc viniciusdc merged commit c384b06 into main Oct 29, 2024
28 checks passed
@viniciusdc viniciusdc deleted the fix-node-group-ami-type branch October 29, 2024 13:51
viniciusdc added a commit that referenced this pull request Oct 30, 2024
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: approved 💪🏾 This PR has been reviewed and approved for merge
Projects
Status: Done 💪🏾
Development

Successfully merging this pull request may close these issues.

[BUG] - AWS instance type not properly respected when gpu are enabled
2 participants