Comprehensive example task running training workload on GPUs using JobSet #429
Comments
/assign
@uroy-personal are you still working on these? If not, I am going to unassign them so someone else can work on them.
Hi @danielvegamyhre, could you also advise on what content (example YAML) to put there? I hope to finish all the open tasks assigned to me by this weekend.
Yes, you can reference some examples in the
Also note it would be nice in the provisioning step to show example commands for all 3 major cloud providers (AWS, GCP, Azure).
Thanks. I am working on it. I hope to raise the PR in the next few days.
@uroy-personal Just following up, are you still working on this?
Yes @danielvegamyhre, I am on it. I made the changes but found that the above README page has been removed. I will complete it within this week for sure! Thanks.
Good morning @danielvegamyhre,
It seems this issue needs GPU access. Is there a way to get GPU access, @danielvegamyhre?
/unassign
@uroy-personal To make this easier, let's not include the steps to provision GPU nodes on each cloud provider. Instead, let's just use a generic/placeholder nodeSelector (e.g.
Thanks @danielvegamyhre, I will have a look and get back to you as soon as possible!
/assign I currently have a GPU environment. The GPU card is not up to date, but I can try it and see.
/assign
What would you like to be added:
A comprehensive example showing how to run a training workload on GPUs using JobSet. We could have one example per major cloud provider.
Why is this needed:
We need more concrete examples to reduce friction in user onboarding. Right now we mostly have toy examples that use sleep containers to demonstrate the functionality of different features.