Fast admission annotation #3189
Skipping CI for Draft Pull Request.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
d098d50 to 1a08859
/ok-to-test
1a08859 to c7e7447
/test pull-kueue-test-integration-main
/cc @trasc
c7e7447 to fcf0084
* Simplify constructGroupPodSetsFast
/lgtm
LGTM label has been added. Git tree hash: 6dd114a245a01672959984db22bb09f7a4c1d280
/lgtm
/approve
/hold
in case @tenzen-y has some extra comments
LGTM label has been added. Git tree hash: 7ef7fa94527d727a61d9dcb6990f6c1b1d8c25b9
@mimowo The use case sounds reasonable, but shouldn't we implement the StatefulSet integration as a dedicated Job?
Yes, we are considering it, but it seems to be significantly more work. The idea is to provide MVP functionality so that Kueue can be used for inference in 0.9. I consider this "experimental" support to gain user feedback. I would go for a dedicated integration once we hit use cases that cannot be achieved this way. Maybe it is needed for efficient scaling, but I'm not sure. Maybe that is also achievable through scaling support for PodGroups.
That makes sense. In that case, we may want to avoid using "StatefulSet" as the integration name and use an alternative name, something like "StatefulSetWithPodGroup", since we are planning to implement a dedicated integration for StatefulSet and we can assume "StatefulSetWithPodGroup" will be deprecated in a future release. Anyway, this PR can move forward.
@vladikkuzn Could you open a PR to update the PodGroup documentation? https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/#running-a-group-of-pods-to-be-admitted-together
Oh, sorry. I accidentally closed this PR.
/lgtm
/approve
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mimowo, tenzen-y, vladikkuzn
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
TBH I would prefer to use "StatefulSet" and treat the underlying mechanism as an implementation detail. Users creating StatefulSets don't need to be aware of it. The "fast-admission" annotation will be added by a webhook. One particularly challenging aspect is that StatefulSet, like Deployment, does not support a "suspend" operation. Adding such a field to upstream k8s will be challenging, because the usage is rather niche for now. So, we will likely stay with PodGroups as the underlying integration mechanism for a while.
I discussed this with @mimowo offline. Through that discussion, we found that an integration migration could potentially be needed for any integration, not only StatefulSet. So, we decided to use "StatefulSet" as the integration name for today's StatefulSet integration. Later, we will look into how we can support migration to a new integration.
This seems to be a new user-facing feature, since batch users can use it as well. /release-note-edit
@tenzen-y are you sure about this release note^? "kueue.x-k8s.io/pod-group-fast-admission" annotation was added, not "kueue.x-k8s.io/retriable-in-group" |
Oops, sorry. You're right. /release-note-edit
* Fast admission annotation
* Fast admission annotation
* Simplify constructGroupPodSetsFast
* Fast admission annotation
* Restructure integration test
What type of PR is this?
/kind feature
What this PR does / why we need it:
The PodGroup integration waits until all pods of the group are created, but a StatefulSet creates each subsequent pod only after all previous pods are running, so there was a deadlock. The solution is to introduce the kueue.x-k8s.io/pod-group-fast-admission annotation: when it is set to true, the pod group does not need to wait for all pods to be created before its pods are ungated. Part of #2717
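The deadlock and the effect of the annotation can be sketched with a small simulation. This is a simplified model under stated assumptions, not Kueue's implementation: `podGroup`, `readyForAdmission`, and `simulate` are hypothetical names, the StatefulSet is assumed to create pods strictly in order, and a pod is assumed to start running as soon as its group is admitted. The annotation key is the one quoted in the review thread.

```go
package main

import "fmt"

// Annotation key as quoted in the review thread.
const fastAdmissionAnnotation = "kueue.x-k8s.io/pod-group-fast-admission"

// podGroup is a hypothetical, simplified model of a pod group.
type podGroup struct {
	totalCount  int               // declared size of the group
	created     int               // pods created so far
	annotations map[string]string // group-level annotations
}

// readyForAdmission reports whether the group may be admitted.
// Without the annotation it waits for every pod to exist; with the
// annotation set to "true" a single created pod is enough.
func readyForAdmission(g podGroup) bool {
	if g.annotations[fastAdmissionAnnotation] == "true" {
		return g.created > 0
	}
	return g.created == g.totalCount
}

// simulate models ordered pod creation: the next pod is created only
// after all earlier pods are running, and pods run only once the
// group is admitted. It returns how many pods end up running.
func simulate(g podGroup) int {
	running := 0
	g.created = 1 // the controller always creates the first pod
	for running < g.totalCount {
		if !readyForAdmission(g) {
			return running // stuck: admission waits for pods that are never created
		}
		running = g.created
		if g.created < g.totalCount {
			g.created++
		}
	}
	return running
}

func main() {
	plain := podGroup{totalCount: 3}
	fast := podGroup{totalCount: 3, annotations: map[string]string{fastAdmissionAnnotation: "true"}}
	fmt.Println(simulate(plain)) // prints: 0
	fmt.Println(simulate(fast))  // prints: 3
}
```

In this model the group without the annotation never gets past the first gated pod, while the annotated group is admitted as soon as one pod exists and the remaining pods follow one by one.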
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?