Is there a way to limit the time Karpenter keeps the nodes that aren't initialized?
And add logs for visibility to understand why these nodes aren't removed?
In my case, a misconfiguration on my side kept the nodes up for over 24 hours. This kind of configuration issue can be pretty costly if Karpenter keeps these nodes forever 😅
We ran into a similar issue: we blew the bill on a large GPU node where the device plugin got stuck for multiple days, for whatever reason.
I understand the reasoning for not removing these nodes, so that the root cause can be debugged, but an option to choose how long they are kept would be nice.
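To make the request concrete, here is a rough sketch of the kind of check being asked for. It is not Karpenter code and does not call any Karpenter or Kubernetes API. `NodeInfo` and `stale_uninitialized` are hypothetical names made up for illustration, and it assumes that initialized nodes carry the label `karpenter.sh/initialized` set to `"true"`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Assumption: initialized nodes carry this label with the value "true".
INITIALIZED_LABEL = "karpenter.sh/initialized"


@dataclass
class NodeInfo:
    """Hypothetical, simplified view of a node (not a Karpenter or Kubernetes type)."""
    name: str
    created_at: datetime
    labels: dict = field(default_factory=dict)


def stale_uninitialized(nodes, max_age, now=None):
    """Return the names of nodes that are still uninitialized after max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for node in nodes:
        initialized = node.labels.get(INITIALIZED_LABEL) == "true"
        age = now - node.created_at
        if not initialized and age > max_age:
            # Print the reason so it is visible why the node was flagged.
            print(f"{node.name}: uninitialized for {age}, limit is {max_age}")
            stale.append(node.name)
    return stale
```

With a limit of, say, `timedelta(hours=1)`, a GPU node that has been stuck uninitialized for 30 hours would be flagged, while an initialized node of the same age or a node created five minutes ago would not.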
chiragjn changed the title from "[Feature Request] Limit the time Karpenter keeps the gpu nodes that aren't initialized" to "[Feature Request] Limit the time Karpenter keeps the nodes that aren't initialized" on May 1, 2024
Originally posted by @itaibenyishai in #1914 (comment)