Create PodTemplate before ProvisioningRequest #4086

Conversation

mbobrovskyi
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Create PodTemplate before ProvisioningRequest.

Which issue(s) this PR fixes:

Fixes #3957

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a bug that occurs when a PodTemplate has not been created yet, but the Cluster Autoscaler attempts to process the ProvisioningRequest and marks it as failed.
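The ordering problem in the release note can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `fakeCluster`, `requestFirst`, `templatesFirst`, and the `"Accepted"`/`"Failed"` state strings are simplified stand-ins invented for illustration, not Kueue or Cluster Autoscaler code. The sketch assumes an autoscaler that inspects a ProvisioningRequest as soon as it exists and fails it when a referenced PodTemplate is missing.

```go
package main

import "fmt"

// fakeCluster is a hypothetical in-memory stand-in for the API server.
type fakeCluster struct {
	podTemplates map[string]bool   // PodTemplate name -> exists
	requests     map[string]string // ProvisioningRequest name -> simplified state
}

func newFakeCluster() *fakeCluster {
	return &fakeCluster{
		podTemplates: map[string]bool{},
		requests:     map[string]string{},
	}
}

// createProvisioningRequest stores the request and mimics an autoscaler that
// inspects it immediately: any missing PodTemplate marks the request "Failed".
func (c *fakeCluster) createProvisioningRequest(name string, templates []string) {
	state := "Accepted"
	for _, t := range templates {
		if !c.podTemplates[t] {
			state = "Failed"
			break
		}
	}
	c.requests[name] = state
}

// requestFirst reproduces the problematic order: request, then templates.
func requestFirst(c *fakeCluster, name string, templates []string) {
	c.createProvisioningRequest(name, templates)
	for _, t := range templates {
		c.podTemplates[t] = true
	}
}

// templatesFirst uses the order this PR describes: templates, then request.
func templatesFirst(c *fakeCluster, name string, templates []string) {
	for _, t := range templates {
		c.podTemplates[t] = true
	}
	c.createProvisioningRequest(name, templates)
}

func main() {
	templates := []string{"ppt-main", "ppt-workers"} // hypothetical names

	buggy := newFakeCluster()
	requestFirst(buggy, "prov-req", templates)
	fmt.Println("request first:", buggy.requests["prov-req"])

	fixed := newFakeCluster()
	templatesFirst(fixed, "prov-req", templates)
	fmt.Println("templates first:", fixed.requests["prov-req"])
}
```

In a real cluster the failure is a race rather than a certainty, since the autoscaler may or may not observe the request before the templates appear. Creating the templates first removes the window entirely.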

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Jan 29, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2025
netlify bot commented Jan 29, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name | Link
🔨 Latest commit | cba64f4
🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67a0e6a925e45c000823b508

@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from f66cc05 to 3aa0a49 Compare January 29, 2025 13:32
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2025
@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 3aa0a49 to 0c0efc0 Compare January 29, 2025 13:44
@mbobrovskyi mbobrovskyi marked this pull request as ready for review January 29, 2025 13:44
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 29, 2025
@mbobrovskyi
Contributor Author

/cc @PBundyra

@k8s-ci-robot k8s-ci-robot requested a review from PBundyra January 29, 2025 13:45
@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 0c0efc0 to 08ad38a Compare January 29, 2025 13:46
Comment on lines 393 to 429
if err != nil {
	// it's a not found, so create it
	newPt := &corev1.PodTemplate{
		ObjectMeta: metav1.ObjectMeta{
			Name:      ptKey.Name,
			Namespace: ptKey.Namespace,
			Labels: map[string]string{
				constants.ManagedByKueueLabel: "true",
			},
		},
		Template: ps.Template,
	}

	// apply the admission node selectors to the Template
	psi, err := podset.FromAssignment(ctx, c.client, psaMap[psName], reqPS.Count)
	if err != nil {
		return err
	}

	err = podset.Merge(&newPt.Template.ObjectMeta, &newPt.Template.Spec, psi)
	if err != nil {
		return err
	}

	// copy limits to requests if needed
	workload.UseLimitsAsMissingRequestsInPod(&newPt.Template.Spec)
Contributor

Since we moved this part to a different place, can we name the function accordingly?

Contributor Author

I believe the function name is correct because we still need to sync controllerReference in PodTemplate. Do you have a different name in mind?

Contributor

I think both functions createPodTemplate and UseLimitsAsMissingRequestsInPod are named ok.

I'm not sure these names are the best possible, but they don't strike me as misleading.

@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch 3 times, most recently from 42002bd to 4b86d34 Compare February 3, 2025 09:13
Contributor

@mimowo mimowo left a comment

LGTM overall, but it makes me wonder: what happens if the PodTemplate creation succeeds, but the ProvisioningRequest creation fails? Are we garbage collecting / deleting PodTemplates in some way?

@mbobrovskyi
Contributor Author

> LGTM overall, but it makes me wonder: what happens if the PodTemplate creation succeeds, but the ProvisioningRequest creation fails? Are we garbage collecting / deleting PodTemplates in some way?

That's a good question. I've been thinking about it too. I'm not sure how we can garbage collect it effectively. Maybe we should temporarily set the ControllerReference to the Workload before we set it to the ProvisioningRequest? WDYT?

@mimowo
Contributor

mimowo commented Feb 3, 2025

sounds reasonable. if we do so they get deleted when the workload is deleted.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 3, 2025
@mbobrovskyi
Contributor Author

> sounds reasonable. if we do so they get deleted when the workload is deleted.

Done
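
The ownership idea discussed above (own the PodTemplate by the Workload first, then hand it over to the ProvisioningRequest) can be sketched roughly as follows. This is not the code from the PR: `fakeStore`, `podTemplate`, `provision`, `deleteOwner`, and `errCreateFailed` are hypothetical names, and the garbage collector is reduced to "delete everything controlled by the deleted owner". The point it shows is that a PodTemplate left behind by a failed ProvisioningRequest creation is still cleaned up when the Workload goes away.

```go
package main

import (
	"errors"
	"fmt"
)

// errCreateFailed simulates a failed ProvisioningRequest creation.
var errCreateFailed = errors.New("simulated ProvisioningRequest creation failure")

// podTemplate is a hypothetical, simplified object with one controlling owner.
type podTemplate struct {
	name       string
	controller string // name of the controlling owner
}

// fakeStore is a hypothetical in-memory object store.
type fakeStore struct {
	templates map[string]*podTemplate
}

func newFakeStore() *fakeStore {
	return &fakeStore{templates: map[string]*podTemplate{}}
}

// deleteOwner mimics owner-based garbage collection: every template
// controlled by the deleted owner is removed with it.
func (s *fakeStore) deleteOwner(owner string) {
	for name, pt := range s.templates {
		if pt.controller == owner {
			delete(s.templates, name)
		}
	}
}

// provision creates the template owned by the workload, then tries to create
// the request. Ownership moves to the request only after that succeeds.
func provision(s *fakeStore, workload, request, template string, createRequest func() error) error {
	pt := &podTemplate{name: template, controller: workload}
	s.templates[template] = pt

	if err := createRequest(); err != nil {
		// The template stays owned by the workload, so it is not orphaned.
		return fmt.Errorf("creating request %q: %w", request, err)
	}
	pt.controller = request
	return nil
}

func main() {
	s := newFakeStore()
	err := provision(s, "workload-a", "prov-req-a", "ppt-a", func() error {
		return errCreateFailed
	})
	fmt.Println("error:", err)
	fmt.Println("owner after failure:", s.templates["ppt-a"].controller)

	s.deleteOwner("workload-a")
	fmt.Println("templates left:", len(s.templates))
}
```

In a real controller the owner would presumably be recorded as an owner reference on the object's metadata and the cleanup left to the Kubernetes garbage collector; the sketch only models the order of the two steps.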

@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 051dd3b to 2098272 Compare February 3, 2025 14:59
Contributor

@mimowo mimowo left a comment

LGTM. Small comments only, to make it clearer why the PodTemplate is first owned by the Workload and the ownership is then transferred.

@mimowo
Contributor

mimowo commented Feb 3, 2025

@mbobrovskyi please squash the commits, I find the manual cherry-picking works better in that case

@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from 1ebd894 to e001823 Compare February 3, 2025 15:49
@mbobrovskyi mbobrovskyi force-pushed the fix/create-pod-template-before-provision-request branch from e001823 to cba64f4 Compare February 3, 2025 15:54
Contributor

@mimowo mimowo left a comment

/lgtm
/approve
/cherry-pick release-0.10 release-0.9

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 3, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: fea2566b7e3ad48d2a686b0556af8884834b8243

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mbobrovskyi, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 3, 2025
@k8s-ci-robot k8s-ci-robot merged commit d02a764 into kubernetes-sigs:main Feb 3, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Feb 3, 2025
@mbobrovskyi mbobrovskyi deleted the fix/create-pod-template-before-provision-request branch February 3, 2025 16:38
FillZpp pushed a commit to leptonai/kueue that referenced this pull request Feb 5, 2025
Successfully merging this pull request may close these issues.

ProvisioningRequest is created before its PodTemplates, what may cause Cluster Autoscaler to mark it as failed