Avoid queueing workloads that don't match CQ namespaceSelector #322
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ahg-g. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 071a456 to 057724b.
/assign @alculquicondor
I'm wondering whether it's worth simply not adding a Workload to the queue system at all if the namespace doesn't match.
Then, when there is a namespace update, we could use the go client to list all the workloads in the informer's cache.
These workloads then wouldn't show up in the "pending" metric. Although that might be undesired?
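A minimal sketch of that first part, under stated assumptions: the `namespaceMatches` helper below is hypothetical and not kueue code; it fetches the Namespace through the cached client and evaluates a ClusterQueue namespaceSelector against its labels, so a workload from a non-matching namespace could be rejected before it ever reaches the heap.

```go
package queue

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// namespaceMatches is a hypothetical helper (not kueue code): it reads the
// Namespace object from the cached client and checks whether its labels
// satisfy the ClusterQueue's namespaceSelector.
func namespaceMatches(ctx context.Context, c client.Client, nsSelector *metav1.LabelSelector, nsName string) (bool, error) {
	sel, err := metav1.LabelSelectorAsSelector(nsSelector)
	if err != nil {
		return false, err
	}
	var ns corev1.Namespace
	if err := c.Get(ctx, client.ObjectKey{Name: nsName}, &ns); err != nil {
		return false, err
	}
	// A workload from a non-matching namespace would simply not be pushed,
	// and therefore would never be counted in the "pending" metric.
	return sel.Matches(labels.Set(ns.Labels)), nil
}
```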
pkg/queue/cluster_queue_impl.go (outdated diff)
return c.pushIfNotPresent(wInfo)
// QueueInadmissibleWorkloads moves all workloads from inadmissibleWorkloads to heap.
// If at least one workload is moved, returns true. Otherwise returns false.
func (c *ClusterQueueImpl) QueueInadmissibleWorkloads(client client.Client) bool {
Pass a context.
Although (this could be in a follow-up), we already know which namespace was updated in cqNamespaceHandler, so we could pass its name to this function.
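For illustration only, the shape this suggestion points at might look like the fragment below. It assumes the surrounding file's imports plus the `inadmissibleWorkloads` map and `pushIfNotPresent` helper visible in the diff, and it is not the code that was actually merged.

```go
// Illustrative sketch: take a context and the namespace whose labels changed,
// and only requeue parked workloads from that namespace.
func (c *ClusterQueueImpl) QueueInadmissibleWorkloads(ctx context.Context, cl client.Client, namespace string) bool {
	moved := false
	for key, wInfo := range c.inadmissibleWorkloads {
		// metav1.NamespaceAll (the empty string) means "don't filter".
		if namespace != metav1.NamespaceAll && wInfo.Obj.Namespace != namespace {
			continue
		}
		delete(c.inadmissibleWorkloads, key)
		c.pushIfNotPresent(wInfo)
		// Simplified: treat every attempted push as movement.
		moved = true
	}
	return moved
}
```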
not the only place where this is called though.
You can pass metav1.NamespaceAll when the namespace is not important (it's the empty string).
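A small, self-contained sketch of the convention being referenced; the `listInNamespace` helper is hypothetical. `metav1.NamespaceAll` is just the empty string, and passing it to `client.InNamespace` yields a cluster-wide list.

```go
package queue

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listInNamespace is a hypothetical helper: callers that don't care about the
// namespace pass metav1.NamespaceAll (""), which makes the List call cluster-wide.
func listInNamespace(ctx context.Context, c client.Client, list client.ObjectList, namespace string) error {
	return c.List(ctx, list, client.InNamespace(namespace))
}
```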
ok, I will make the change in this PR
Started to do it, but halfway through I felt the change isn't really worth it, at least not now.
Unless we use an options pattern with a default of metav1.NamespaceAll, I am afraid we may make a mistake somewhere and call this function incorrectly while restricting it to a namespace. Also, the only place where it currently makes sense to use it is when a namespace changes its labels, which is rather infrequent.
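A sketch of the options pattern mentioned here, with all names hypothetical: the default is `metav1.NamespaceAll`, so a caller can only narrow the scope by opting in explicitly.

```go
package queue

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// requeueOptions and its helpers are hypothetical, sketching the pattern only.
type requeueOptions struct {
	namespace string
}

type RequeueOption func(*requeueOptions)

// WithNamespace restricts requeueing to a single namespace.
func WithNamespace(ns string) RequeueOption {
	return func(o *requeueOptions) { o.namespace = ns }
}

// resolveRequeueOptions applies the options over the NamespaceAll default.
func resolveRequeueOptions(opts ...RequeueOption) requeueOptions {
	o := requeueOptions{namespace: metav1.NamespaceAll}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```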
I just don't like that we call into the client so much. Although it's cached. But sure, it can be a follow-up.
Yes, but this optimization wouldn't save us much because in the vast majority of the cases we are calling the function with NamespaceAll.
/hold
I thought about this approach, but my conclusion was that it might be better to unify how we deal with inadmissible workloads. While the PR is large, most of it is a refactor that is agnostic to this specific issue, and I think it can help us long term when deciding to be selective about re-queueing. We can still do the optimization of not adding the workload on add, but I think we still need to track those workloads and report them via a metric somewhere. Maybe we can change inadmissibleWorkloads into a map to break them down by requeue reason, and use that in the pending metric as well.
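A rough sketch of that last idea, with hypothetical names and assuming the `workload.Info` type already used in this package: keep the parked workloads keyed by the reason they were requeued, so the pending metric could be broken down per reason as well as reported in total.

```go
package queue

import (
	"sigs.k8s.io/kueue/pkg/workload"
)

// RequeueReason and the reason constants below are hypothetical placeholders.
type RequeueReason string

const (
	ReasonNamespaceMismatch RequeueReason = "NamespaceMismatch"
	ReasonCouldNotFit       RequeueReason = "CouldNotFit"
)

// inadmissibleByReason: reason -> workload key -> workload info.
type inadmissibleByReason map[RequeueReason]map[string]*workload.Info

// pendingByReason returns per-reason counts that a pending metric could report
// alongside the total.
func pendingByReason(iw inadmissibleByReason) map[RequeueReason]int {
	counts := make(map[RequeueReason]int, len(iw))
	for reason, ws := range iw {
		counts[reason] = len(ws)
	}
	return counts
}
```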
All comments should be addressed now.
/lgtm
with a nit
thanks, fixed and squashed.
/lgtm
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR updates the scheduler to avoid re-queueing workloads that don't match the CQ namespaceSelector.
The initial idea was to not add those workloads when first observed, but this is not enough to address the issue, since a namespace label or the CQ namespaceSelector could change after the workload was initially accepted into the queue. Those changes could make the workload inadmissible because it no longer matches the namespaceSelector, so we still need a code path that handles this case during re-queueing.
The consequence is that such workloads will get evaluated, but at most once. To optimize away this wasted cycle, we will need to avoid adding the workload from the beginning, but this can be done as a follow-up because this PR is already long (I already made the changes to the workload controller to update the workload status for this case on another branch).
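As a sketch of the re-queue path described above (the helper is hypothetical, not the PR's actual code): list the namespaces that currently match the ClusterQueue's selector, so parked workloads from namespaces that no longer match can be skipped when requeueing.

```go
package queue

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// matchingNamespaces is a hypothetical helper: it returns the set of namespaces
// whose current labels satisfy the ClusterQueue's namespaceSelector, so the
// re-queue path can skip workloads whose namespace no longer matches.
func matchingNamespaces(ctx context.Context, c client.Client, nsSelector *metav1.LabelSelector) (map[string]bool, error) {
	sel, err := metav1.LabelSelectorAsSelector(nsSelector)
	if err != nil {
		return nil, err
	}
	var nsList corev1.NamespaceList
	if err := c.List(ctx, &nsList, client.MatchingLabelsSelector{Selector: sel}); err != nil {
		return nil, err
	}
	matching := make(map[string]bool, len(nsList.Items))
	for i := range nsList.Items {
		matching[nsList.Items[i].Name] = true
	}
	return matching, nil
}
```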
Which issue(s) this PR fixes:
Fixes #301
Special notes for your reviewer: