
BookCapacity for ProvisioningRequest pods #6880

Merged

Conversation

yaroslava-serdiuk
Contributor

@yaroslava-serdiuk commented May 29, 2024

What type of PR is this?

/kind feature

This is needed to prevent scale-down of ProvisioningRequest capacity during the booking time.
Fixes #6517
Completes the implementation started in #6815

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 29, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2024
@yaroslava-serdiuk yaroslava-serdiuk changed the title WIP: BookCapacity for ProvisioningRequest pods BookCapacity for ProvisioningRequest pods Jun 12, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2024
@yaroslava-serdiuk yaroslava-serdiuk force-pushed the provreq-scale-down branch 3 times, most recently from 7ac217c to 09ba963 Compare June 12, 2024 14:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2024
@kisieland
Contributor

/lgtm

@k8s-ci-robot
Contributor

@kisieland: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


// BookCapacity schedules fake pods for ProvisioningRequests that should have reserved capacity
// in the cluster.
func (p *provReqProcessor) BookCapacity(ctx *context.AutoscalingContext) error {
Contributor

I think this should be implemented as PodListProcessor.Process() rather than introducing a new call to StaticAutoscaler:

  • It's literally doing what PLP is meant to do - changing the list of pods to be processed by CA (by injecting new ones).
  • Scheduling pods on existing nodes is also generally done in PLP - FilterOutSchedulable is the main place we do it.
  • Injecting pods in PLP in a similar way is a pretty well established pattern in CA forks - I know you have access to GKE fork, you can see how CapacityRequests do pretty much the same thing in a PLP.

Contributor Author

PodListProcessor processes the list of unschedulable pods and applies changes to it, while booking capacity does nothing to the unschedulable pods list and only modifies the cluster snapshot.
Sure, I can implement capacity booking as part of a PodListProcessor and simply leave the unschedulable list untouched, if that is what you are asking. However, I don't see the advantage of this approach.

)

// CapacityReservation is interface to reserve capacity in the cluster.
type CapacityReservation interface {
Contributor

See my other comment - this is already generally done by PodListProcessors and I think PLP are better suited to the job: in most use-cases where you book capacity you also want to add any pods that don't fit to the list of pending pods in order to trigger scale-up.

Contributor Author

in most use-cases where you book capacity you also want to add any pods that don't fit to the list of pending pods in order to trigger scale-up

I explicitly don't want to add the pods that don't fit to the list of pending pods, because scale-up for the ProvisioningRequest was already triggered; we just reserve capacity in a simple way by creating fake pods for the ProvReq. In fact, the real pods may already have been created, so the fake pods won't fit in the cluster, and that is fine.

Contributor

Right, what I mean is that injecting in-memory pods is already a common pattern. This includes both pods that end up in the list of unschedulable pods and pods added directly to the snapshot.
In most cases when you inject pods, you expect whatever fits in the snapshot to go there and whatever is left over to trigger scale-up. Your use-case only involves modifying the snapshot and not the list of pending pods, which makes it slightly unusual. But I think it's better to still do it in a PLP, even if it doesn't modify the list of pods:

  • It is consistent with other implementations.
  • What would be the expectation for future pod-injecting features that follow the pattern of "scheduling" as much as possible in the cluster snapshot and adding the leftover to the list of unschedulable pods? Should those be implemented as a PLP or as the new processor? They both modify the list of pods and book capacity, and arguably so does FilterOutSchedulable, which is a well-established part of core CA logic.

Contributor Author

Implemented PLP

@yaroslava-serdiuk
Contributor Author

/pony

@k8s-ci-robot
Contributor

@yaroslava-serdiuk: pony image

In response to this:

/pony


@MaciekPytel
Contributor

/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 12, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, yaroslava-serdiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2024
@yaroslava-serdiuk
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2024
@k8s-ci-robot k8s-ci-robot merged commit 68a757c into kubernetes:master Jul 12, 2024
6 checks passed
@yaroslava-serdiuk
Contributor Author

/cherry-pick cluster-autoscaler-release-1.30

@k8s-infra-cherrypick-robot

@yaroslava-serdiuk: new pull request created: #7057

In response to this:

/cherry-pick cluster-autoscaler-release-1.30


Successfully merging this pull request may close these issues.

ProvisioningRequest booked capacity is scaled down (and shouldn't be)
5 participants