Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid queueing workloads that don't match CQ namespaceSelector #322

Merged
merged 1 commit into from
Aug 12, 2022

Conversation

ahg-g
Copy link
Contributor

@ahg-g ahg-g commented Aug 10, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR updates the scheduler to avoid re-queueing workloads that don't match the cq NamespaceSelector.

The initial idea was to not add those workloads when first observed, but this is not enough to address this issue since a namespace label or CQ namespaceSelector could change by the time the workload that was initially accepted into the queue. Those changes could make the workload inadmissible due to not matching namespaceSelector, and so we still need a code path that handles the case during re-queueing.

The consequence is that such workloads will get evaluated, but at most once. To optimize away this wasted cycle, we will need to avoid adding the workload from the beginning, but this can be done as a followup because this PR is too long (I already made the changes to the workload controller to update the workload status for this case on another branch).

Which issue(s) this PR fixes:

Fixes #301

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 10, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from kerthcet August 10, 2022 20:16
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 10, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 10, 2022
@ahg-g ahg-g force-pushed the ahg-ns4 branch 2 times, most recently from 071a456 to 057724b Compare August 10, 2022 20:58
@ahg-g
Copy link
Contributor Author

ahg-g commented Aug 10, 2022

/assign @alculquicondor

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if it's worth simply not adding a Workload to the queue system at all if the namespace doesn't match.
Then, when there is a namespace update, we could use the goclient to list all the workloads in the informer's cache.
Then these workloads wouldn't show up in the "pending" metric. Although that might be undesired?

return c.pushIfNotPresent(wInfo)
// QueueInadmissibleWorkloads moves all workloads from inadmissibleWorkloads to heap.
// If at least one workload is moved, returns true. Otherwise returns false.
func (c *ClusterQueueImpl) QueueInadmissibleWorkloads(client client.Client) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass a context.

Although... (could be in a follow up), we already know which namespace was updated in cqNamespaceHandler. So we could pass the name to this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the only place where this is called though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass metav1.NamespaceAll when the namespace is not important (it's the empty string)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will make the change in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started to do it, but halfway through I felt the change not really worth it, at least not now.

Unless we use an options pattern with a default of metav1.NamespaceAll, I am afraid we may make a mistake somewhere calling this function incorrectly while restricting it to a namespace, also the only place that it currently makes sense to use it is when a namespace changes its labels, which is rather infrequent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't like that we call into the client so much. Although it's cached. But sure, it can be a follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this optimization wouldn't save us much because in the vast majority of the cases we are calling the function with NamespaceAll.

@alculquicondor
Copy link
Contributor

/hold
to prevent premature merge

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 11, 2022
@ahg-g
Copy link
Contributor Author

ahg-g commented Aug 11, 2022

I'm thinking if it's worth simply not adding a Workload to the queue system at all if the namespace doesn't match. Then, when there is a namespace update, we could use the goclient to list all the workloads in the informer's cache. Then these workloads wouldn't show up in the "pending" metric. Although that might be undesired?

I thought about this approach, but my conclusion was that it might be better to unify how we deal with inadmissible workloads. While the PR is large, most of it is a refactor that is agnostic to this specific issue, and I think can help us long term when deciding to be selective on re-queueing.

We can still do the optimization related to not adding the workload on add, but I think we still need to track those workloads and report them via a metric somewhere. May be we can change inadmissibleWorkloads into a map to to breakdown by requeue reason, and use that in the pending metric as well.

return c.pushIfNotPresent(wInfo)
// QueueInadmissibleWorkloads moves all workloads from inadmissibleWorkloads to heap.
// If at least one workload is moved, returns true. Otherwise returns false.
func (c *ClusterQueueImpl) QueueInadmissibleWorkloads(client client.Client) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass metav1.NamespaceAll when the namespace is not important (it's the empty string)

@ahg-g
Copy link
Contributor Author

ahg-g commented Aug 12, 2022

All comments should now be addressed now.

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
with a nit

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2022
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2022
@ahg-g
Copy link
Contributor Author

ahg-g commented Aug 12, 2022

/lgtm with a nit

thanks, fixed and squashed.

@alculquicondor
Copy link
Contributor

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 12, 2022
@k8s-ci-robot k8s-ci-robot merged commit e571d42 into kubernetes-sigs:main Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid queueing workloads that don't match CQ namespaceSelector
4 participants