Nvidia GPU Operator fixes to get it working properly #27
@compendius thanks a lot for your contribution 🎉
I'm also curious about what was not working in the jupyterhub-gpu example, because it has been successfully tested on quite a few platforms. Thanks!
@remche thanks for considering the PR. I will try to rebase etc. In the meantime, the reason I had to alter the GPU Operator is that, once deployed, it looks like it is running OK but does not work properly. For example, the recommended vectoradd test fails: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#cuda-vectoradd
But when you add the custom containerd runtime config and the extra environment variables, it works.
Similar to this approach: https://thenewstack.io/install-a-nvidia-gpu-operator-on-rke2-kubernetes-cluster/
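For readers following along, here is a rough sketch of what "custom containerd runtime config and extra environment variables" usually means for the GPU Operator on RKE2. It is a Helm values fragment for the container toolkit and is not the exact change from this PR. The paths are the commonly documented RKE2 defaults and are an assumption here, so check them against your own nodes.

```yaml
# Sketch of GPU Operator Helm values for RKE2 (assumed defaults, not taken from this PR).
toolkit:
  env:
    # RKE2 generates its containerd config from this template file (assumed default path).
    - name: CONTAINERD_CONFIG
      value: /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl
    # RKE2 runs its own embedded containerd, so the socket is not at the stock location.
    - name: CONTAINERD_SOCKET
      value: /run/k3s/containerd/containerd.sock
    # Name of the runtime class the toolkit registers with containerd.
    - name: CONTAINERD_RUNTIME_CLASS
      value: nvidia
    # Make the nvidia runtime the default so GPU pods need no explicit runtimeClassName.
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "true"
```

Without settings like these, the toolkit would presumably patch the stock containerd config path, which RKE2 does not read. That would fit the symptom described above, where the operator pods look healthy but the vectoradd test fails.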
Just checked again and you're right that jupyter-gpu does not work out of the box, we were using an air-gapped image 🤣 Could you please update the jupyterhub-gpu example at the same time? 😬 Thanks again!
I have attempted to rebase and pushed the changes. I need to clean up the history next.
Thanks for rebasing/cleaning. Could you please Few things left:
Thanks again for making this module better!
Thanks a lot!!
No problem, glad to be of help. We are using this GPU stack for production workloads, so we wanted to share how we did it.
Just realised the source needs updating away from my fork in my Nvidia GPU example main.tf. Do you want another PR?
Added