[Bug]: GPUs cannot be used with Tainted Nodes #633
Comments
As it is, tolerations added manually to a Notebook definition are "erased" when you start the Notebook through the UI. This is incompatible with setups that have multiple GPU pools you have to taint to schedule specific workloads properly, or just with standard management of workload placement through taints.
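To illustrate the behaviour the comment above asks for, here is a minimal sketch of merging tolerations instead of overwriting them. It is not the dashboard's actual code: `merge_tolerations`, the `gpu-pool` taint key, and the `nvidia.com/gpu` toleration are hypothetical names chosen only for the example.

```python
from typing import Dict, List, Optional, Tuple

# A toleration as it appears in a pod spec, e.g.
# {"key": "gpu-pool", "operator": "Equal", "value": "a100", "effect": "NoSchedule"}
Toleration = Dict[str, str]


def _identity(t: Toleration) -> Tuple[Optional[str], str, Optional[str], Optional[str]]:
    """Fields used to decide whether two tolerations are duplicates."""
    return (t.get("key"), t.get("operator", "Equal"), t.get("value"), t.get("effect"))


def merge_tolerations(existing: List[Toleration], managed: List[Toleration]) -> List[Toleration]:
    """Keep every manually added toleration and append the UI-managed ones.

    Hypothetical helper: `existing` is what the user put on the Notebook,
    `managed` is what the UI would set on start. Nothing in `existing` is dropped.
    """
    merged = list(existing)
    seen = {_identity(t) for t in existing}
    for t in managed:
        if _identity(t) not in seen:
            merged.append(t)
            seen.add(_identity(t))
    return merged


# Example values are assumptions, not taken from the issue.
manual = [{"key": "gpu-pool", "operator": "Equal", "value": "a100", "effect": "NoSchedule"}]
from_ui = [{"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]
result = merge_tolerations(manual, from_ui)
```

With this approach the manually added `gpu-pool` toleration stays on the Notebook after the UI applies its own, and applying the merge twice does not duplicate entries.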
So today we have a toleration for [...]. I would imagine this is handled already to some degree 🤔 In the accelerator work, I imagine what is being tainted/tolerated will be handled more flexibly and seamlessly... effectively this should be done, no?
This is no longer valid due to Habana work - moving to closed after talking to Gage. Please reopen with questions if there are any misunderstandings.
Is there an existing issue for this?
Current Behavior
From the downstream issue https://issues.redhat.com/browse/RHODS-4769
Expected Behavior
GPUs should work for both using tolerations & not using tolerations.
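As a reference for what "working with tolerations" means at the scheduling level, the following is a rough sketch of the standard Kubernetes matching rule between tolerations and taints. The function names and the example taint are hypothetical; this is a simplification of what the scheduler does, not its implementation.

```python
from typing import Dict, List

Taint = Dict[str, str]       # e.g. {"key": "gpu-pool", "value": "a100", "effect": "NoSchedule"}
Toleration = Dict[str, str]  # e.g. {"key": "gpu-pool", "operator": "Exists"}


def toleration_matches(toleration: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration tolerates a single taint."""
    # An empty effect on the toleration matches every effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with Exists tolerates every taint key.
        return not key or key == taint.get("key")
    # Default operator is Equal: key and value must both match.
    return key == taint.get("key") and (toleration.get("value") or "") == (taint.get("value") or "")


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node if every hard taint is tolerated.

    PreferNoSchedule is only a preference, so it is ignored here.
    """
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(toleration_matches(tol, taint) for tol in tolerations) for taint in hard)


# Hypothetical tainted GPU node.
node_taints = [{"key": "gpu-pool", "value": "a100", "effect": "NoSchedule"}]
with_toleration = can_schedule(
    [{"key": "gpu-pool", "operator": "Equal", "value": "a100", "effect": "NoSchedule"}],
    node_taints,
)
without_toleration = can_schedule([], node_taints)
untainted_node = can_schedule([], [])
```

In this sketch a Notebook without tolerations still fits an untainted GPU node, while a tainted node requires the matching toleration to survive on the Notebook.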
Steps To Reproduce
See the current behaviour / downstream ticket.
This is likely related to the Prometheus query omitting results, but that is just a guess. It might also be impacted by #573 and support for scaling, which could change the "show the dropdown" logic (it may not work with unscalable nodes that have tolerations).
Workaround (if any)
No response
OpenShift Infrastructure Version
No response
OpenShift Version
No response
What browsers are you seeing the problem on?
No response
Open Data Hub Version
2.3.0
Relevant log output
No response