Add option to apply caps only on alive pods #252
Conversation
Force-pushed 5ee08a9 to dcb8b00
The problem I see is that the errored pods fail for a reason (i.e. wrong image), and this would just enter an infinite loop of spawning new pods.
Well, the option is an opt-in. Perhaps the plugin should optionally also provide a strategy (like exponential back-off) to delay the launch of new pods?
@carlossg Should I add a warning in the description of the checkbox that this can lead to an infinite loop of spawning new pods?
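The exponential back-off idea floated above could look roughly like the sketch below. This is purely illustrative: nothing like `Backoff` or `delayMillis` exists in the plugin, and the base/maximum delays are made-up values.

```java
// Hypothetical sketch of an exponential back-off for re-spawning pods.
// Not part of the kubernetes-plugin; names and values are invented.
public class Backoff {
    /** Delay doubles with each failed attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long d = baseMillis << Math.min(attempt, 20); // bound the shift
        return Math.min(d, maxMillis);
    }

    public static void main(String[] args) {
        // First five retry delays with a 1s base and a 60s cap.
        for (int i = 0; i < 5; i++) {
            System.out.println(delayMillis(i, 1000, 60000));
        }
    }
}
```

With a cap like this, a pod that keeps failing (e.g. a wrong image) would still be retried, but progressively less often, instead of looping as fast as the provisioner runs.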
I think it looks mostly OK, see my comment, but capOnlyOnUnDeadPods sounds really weird to me; wouldn't capOnlyOnAlivePods be more intuitive?
@@ -424,6 +436,22 @@ private boolean addProvisionedSlave(@Nonnull PodTemplate template, @CheckForNull
        PodList namedList = client.pods().inNamespace(templateNamespace).withLabels(labelsMap).list();
        List<Pod> namedListItems = namedList.getItems();

        if (this.isCapOnlyOnUnDeadPods()) {
Instead of getting the full list and then filtering here, can't it be done using something like withFields("status.phase", "running|pending")? Not sure what the right syntax would be, but it should be possible based on kubernetes/kubernetes#49387.
I can do
./kubectl get pods --field-selector=status.phase='Succeeded'
I can also do:
./kubectl get pods --field-selector=status.phase='Failed'
But I can't do:
./kubectl get pods --field-selector=status.phase='Succeeded|Failed'
Neither can I do:
./kubectl get pods --field-selector='status.phase in (Succeeded, Failed)'
I couldn't find any official documentation on field selectors 😢 https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#
I also looked at the implementation of withFields and getFieldQueryParam, and it doesn't seem like a general selector has been implemented yet. 😭
However, I can do:
./kubectl get pods --field-selector='status.phase!=Failed,status.phase!=Succeeded,status.phase!=Unknown'
But again, we really need a function withFieldSelector()
at https://github.com/fabric8io/kubernetes-client/blob/v3.1.0/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/Filterable.java
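Since the fabric8 client offered no general withFieldSelector() at the time, the practical fallback discussed in this thread is to filter the listed pods client-side by status.phase. A minimal sketch of that idea follows; the nested `Pod` class is a stand-in for io.fabric8.kubernetes.api.model.Pod (carrying only the one field used here), and `filterAlive` is an illustrative name, not the plugin's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AlivePodFilter {
    // Minimal stand-in for io.fabric8.kubernetes.api.model.Pod,
    // keeping only the status.phase value this sketch needs.
    static class Pod {
        final String phase;
        Pod(String phase) { this.phase = phase; }
        String getPhase() { return phase; }
    }

    // The two phases in which a pod is "alive" (consuming resources).
    private static final Set<String> ALIVE_PHASES =
            new HashSet<>(Arrays.asList("Running", "Pending"));

    /** Keep only pods whose status.phase is Running or Pending. */
    static List<Pod> filterAlive(List<Pod> pods) {
        return pods.stream()
                .filter(p -> ALIVE_PHASES.contains(p.getPhase()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pod> pods = Arrays.asList(
                new Pod("Running"), new Pod("Failed"),
                new Pod("Pending"), new Pod("Succeeded"));
        // Only the Running and Pending pods survive the filter.
        System.out.println(filterAlive(pods).size());
    }
}
```

Client-side filtering trades an extra pass over the listed pods for not depending on server-side field-selector support, which (as noted above) could not express the needed OR / set-based match anyway.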
Force-pushed dcb8b00 to 020eaf6
A pod can be in one of five phases: running, pending, succeeded, failed or unknown. Pods that are not in the running/pending phase do not consume any CPU/memory resources; they just need to be GC'ed. If a cap has been specified on a k8s cloud (single namespace) and, say, 40% of the pods are in the 'failed' phase while 60% are 'running/pending', newer pods will not be spawned in that namespace: the plugin counts all of those pods against the instance/pod cap, so even though it could have spawned 40% more pods, it won't, and jobs will starve. This patch adds an option to calculate the cap only over pods in the 'running' or 'pending' phase, at the cloud level and at the pod template level, so that the cap applies only to those pods which are alive or about to be. Change-Id: Id77e837aa9a42742618cd3c543e7b99f9d38a10a
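The cap arithmetic described in the commit message can be illustrated with a tiny sketch. The numbers and the `slotsAvailable` helper are hypothetical, chosen to mirror the 40%/60% scenario above.

```java
public class CapExample {
    /** Slots left under the cap, given how many pods are counted against it. */
    static int slotsAvailable(int cap, int countedPods) {
        return Math.max(0, cap - countedPods);
    }

    public static void main(String[] args) {
        int cap = 10;      // instance/pod cap on the cloud
        int failed = 4;    // pods in the Failed phase (40%)
        int alive = 6;     // pods Running or Pending (60%)

        // Counting every pod: the cap is exhausted, jobs starve.
        System.out.println(slotsAvailable(cap, failed + alive));
        // Counting only alive pods (this PR's option): 4 more can spawn.
        System.out.println(slotsAvailable(cap, alive));
    }
}
```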
Force-pushed 020eaf6 to a615156
I've updated the PR with a less weird name 😉
Finally, after 5 months! 🎉
sorry, thanks!