Add PriorityClass in Workload api #104

Merged: 2 commits, Mar 17, 2022

Member

@denkensk denkensk commented Mar 9, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

The first part of #82

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 9, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 9, 2022
@denkensk
Member Author

denkensk commented Mar 9, 2022

/assign @ahg-g

@denkensk
Member Author

denkensk commented Mar 9, 2022

/test pull-kueue-test-integration-main

func ConstructWorkloadFor(ctx context.Context, client client.Client,
job *batchv1.Job, scheme *runtime.Scheme) (w *kueue.QueuedWorkload, err error) {
var p int32
pcName := job.Spec.Template.Spec.PriorityClassName
Contributor

What will we do for workloads that have multiple pod specs?

Should we do this in the queued_workload controller instead?

Member Author

I rarely see people set different priorities within a single workload. But for that case we need a fixed rule that users can clearly understand, such as choosing the highest one as the priority of the workload.
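
A minimal sketch of the "choose the highest one" rule floated here, assuming a hypothetical `podSet` type that stands in for one pod template of a workload. Neither the type nor `highestPriority` comes from this PR; the PR itself reads the priority class from the Job's single pod template.

```go
package main

import "fmt"

// podSet is a hypothetical stand-in for one pod template in a workload.
type podSet struct {
	Name     string
	Priority int32
}

// highestPriority returns the largest priority among the pod sets,
// or 0 when the workload has no pod sets at all.
func highestPriority(sets []podSet) int32 {
	if len(sets) == 0 {
		return 0
	}
	highest := sets[0].Priority
	for _, s := range sets[1:] {
		if s.Priority > highest {
			highest = s.Priority
		}
	}
	return highest
}

func main() {
	sets := []podSet{
		{Name: "driver", Priority: 1000},
		{Name: "workers", Priority: 100},
	}
	fmt.Println(highestPriority(sets))
}
```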

Contributor

@ahg-g ahg-g Mar 11, 2022

> What will we do for workloads that have multiple pod specs?

It is up to the custom workloads to decide where to set it from (e.g., based on the driver).

> Should we do this in the queued_workload controller instead?

How? queuedworkload_controller is not aware of the custom workload CRD.

That makes me think: since ordering is done at the ClusterQueue level and a Queue is simply a pointer to it, what if we put the priority on the Queue and have workloads simply inherit it from the Queue they are submitted to? Users then wouldn't need to set it on the custom workload directly, and we would avoid those issues altogether.

Contributor

> > Should we do this in the queued_workload controller instead?

> how? queuedworkload_controller is not aware of the custom workload CRD.

I'm thinking that workloads should have a priority independent of the pod priority, although it might be confusing. Also that would mean that users have to set an annotation for it in the Job or something like that. Having it in the Queue is certainly a cleaner option.

Member Author

I also agree that workloads should have a separate priority independent of the pod priority. In fact, Kubeflow already has this:
https://github.com/kubeflow/common/blob/2b40c8f8991e302920ee5536c0ad49dec6724c66/pkg/apis/common/v1/types.go#L208

Contributor

I see. Do you know what they do with it? Perhaps insert it in the pod specs?

@denkensk denkensk changed the title Add Priority in Workload api IAdd Priority in Workload api Mar 11, 2022
@denkensk denkensk changed the title IAdd Priority in Workload api wip Add Priority in Workload api Mar 11, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2022
@denkensk denkensk changed the title wip Add Priority in Workload api wip Add PriorityClass in Workload api Mar 15, 2022
@denkensk
Member Author

/test pull-kueue-test-integration-main

@denkensk denkensk changed the title wip Add PriorityClass in Workload api Add PriorityClass in Workload api Mar 15, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 15, 2022
@denkensk
Member Author

/test pull-kueue-test-integration-main

@denkensk
Member Author

/test pull-kueue-test-integration-main

@denkensk denkensk requested a review from ahg-g March 15, 2022 16:18
Contributor

@alculquicondor alculquicondor left a comment

what is populating the PriorityClassName now?

// keywords which indicate the highest priorities with the former being
// the highest priority. Any other name must be defined by creating a
// PriorityClass object with that name. If not specified, the queuedWorkload
// priority will be default or zero if there is no default.
Contributor

where is this default defined?

Member Author

in the constant

DefaultPriority = 0

Contributor

Since it wouldn't change by external factors, you should just say that the default priority is zero.

Contributor

btw, I'm referring to "If not specified, the queuedWorkload priority will be default".

Is this referring to the default priority class defined for the cluster?

Member Author

Yes, it's the default priority class defined for the cluster. If there is no default, it will be 0.
I kept the comment consistent with:
https://github.com/kubernetes/kubernetes/blob/ca2cd3b18ef145c34311ba7fd9d389fe8233fae8/pkg/apis/core/types.go#L2902
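
The fallback order described in this thread (named class, then the cluster's default class, then zero) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `priorityClass` is a simplified stand-in for the Kubernetes PriorityClass object, the classes are passed in as a slice instead of being listed through a client, and `resolvePriority` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// priorityClass is a simplified, hypothetical stand-in for the
// scheduling.k8s.io/v1 PriorityClass object.
type priorityClass struct {
	Name          string
	Value         int32
	GlobalDefault bool
}

// defaultPriority is used when no class is named and no global default exists.
const defaultPriority int32 = 0

var errNotFound = errors.New("priority class not found")

// resolvePriority returns the class name and value to record on a workload.
// A non-empty name must match an existing class; an empty name falls back to
// the global default class, and finally to defaultPriority.
func resolvePriority(name string, classes []priorityClass) (string, int32, error) {
	if name != "" {
		for _, pc := range classes {
			if pc.Name == name {
				return pc.Name, pc.Value, nil
			}
		}
		return "", 0, fmt.Errorf("%w: %q", errNotFound, name)
	}
	for _, pc := range classes {
		if pc.GlobalDefault {
			return pc.Name, pc.Value, nil
		}
	}
	return "", defaultPriority, nil
}

func main() {
	classes := []priorityClass{
		{Name: "high", Value: 1000},
		{Name: "base", Value: 10, GlobalDefault: true},
	}
	name, value, err := resolvePriority("", classes)
	fmt.Println(name, value, err)
}
```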

heap.Push(&cq.heap, *info)
return true
}

-func (cq *ClusterQueue) PushOrUpdate(w *kueue.QueuedWorkload) {
+func (cq *ClusterQueue) PushOrUpdate(w *kueue.QueuedWorkload) bool {
item := cq.heap.items[workload.Key(w)]
info := *workload.NewInfo(w)
Contributor

why not calculate the value in NewInfo?

Member Author

I was initially hoping to calculate it in NewInfo. But WorkloadInfo is a basic data struct that is constructed in other places, such as the cache. It would be hard and unclean to pass the log/ctx/client everywhere I want to call NewInfo.

@denkensk
Member Author

/retest

@ahg-g
Contributor

ahg-g commented Mar 16, 2022

I am sorry for the back and forth, but this plumbing doesn't look "right". I am pretty sure there will be places where we forget to set info.Priority, so it will be dangerous to continue like this unless the priority lookup is done inside NewInfo. I don't think that is going to be pretty, because it means NewInfo will need to receive a client/ctx/logger, and it is used in multiple places, including the cache.

I am coming back to having the priority int on the QueuedWorkload and having it populated by the job-controller. Thinking more about how to prevent users from circumventing priority classes: users shouldn't be creating QueuedWorkload instances directly, and so they really can't set the QueuedWorkload priority (at least in our MVP).

Alex, I am pretty sure this is frustrating to you, sorry about that, I am happy to make the change myself if you don't have the time.

@denkensk
Member Author

> I am coming back to having the priority int on the QueuedWorkload and having it populated by the job-controller. Thinking more about how to prevent users from circumventing priority classes: users shouldn't be creating QueuedWorkload instances directly, and so they really can't set the QueuedWorkload priority (at least in our MVP).

Shall we have only Priority in QueuedWorkload, or both PriorityClass and Priority?

> Alex, I am pretty sure this is frustrating to you, sorry about that, I am happy to make the change myself if you don't have the time.

It doesn't matter. It's worth putting in some effort to reach the right decision. I will try to finish refactoring the code today and tomorrow.

@ahg-g
Contributor

ahg-g commented Mar 16, 2022

> Shall we only have Priority in QueuedWorkload or both PriorityClass and Priority?

Both; the job-controller populates it just like we do for pods. Initially I was concerned that there was no path for admins to block the use of higher priority classes or to prevent users from directly setting a high priority on the workload. But that is actually not true, because admins could prevent users from creating QW directly and only allow them to create v1.Job.

I still think we want to have a priority class on the Queue object, but we can leave that as a follow-up.

> It doesn't matter. It's worth putting in some effort before making the right decision. I will try to finish refactor the code today and tomorrow.

Thank you, and I am sorry again for the back and forth, but indeed this was an informative exercise, at least for me.
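
A rough sketch of the design settled on here: the job-controller resolves the priority once and stores both the class name and the integer on the QueuedWorkload, so NewInfo stays a plain constructor with no client, context, or logger. All types below are simplified, hypothetical stand-ins for the kueue API, and `priorityLookup` is an assumed abstraction over whatever API call the controller would really make.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the kueue API types.
type queuedWorkloadSpec struct {
	PriorityClassName string
	Priority          *int32
}

type queuedWorkload struct {
	Name string
	Spec queuedWorkloadSpec
}

// info mirrors the idea of workload.Info: plain data shared by queue and cache.
type info struct {
	Obj *queuedWorkload
}

// newInfo needs no client because the priority is already on the object.
func newInfo(w *queuedWorkload) *info {
	return &info{Obj: w}
}

// priorityLookup resolves a class name to its value. In a real controller
// this would query the API server; here it is an injected function.
type priorityLookup func(className string) (int32, error)

// constructWorkloadFor plays the role of the job-controller: it resolves the
// priority once and records both the class name and the value.
func constructWorkloadFor(jobName, pcName string, lookup priorityLookup) (*queuedWorkload, error) {
	p, err := lookup(pcName)
	if err != nil {
		return nil, fmt.Errorf("resolving priority class %q: %w", pcName, err)
	}
	return &queuedWorkload{
		Name: jobName,
		Spec: queuedWorkloadSpec{PriorityClassName: pcName, Priority: &p},
	}, nil
}

func main() {
	lookup := func(name string) (int32, error) {
		if name == "high" {
			return 1000, nil
		}
		return 0, nil
	}
	w, err := constructWorkloadFor("job-a", "high", lookup)
	if err != nil {
		panic(err)
	}
	i := newInfo(w)
	fmt.Println(i.Obj.Spec.PriorityClassName, *i.Obj.Spec.Priority)
}
```

Because users are expected to create only v1.Job objects, the integer on the workload is written by the controller alone, which is what lets admins keep control through priority classes.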

@denkensk denkensk force-pushed the add-workload-priority branch 2 times, most recently from edacea0 to 9215ca6 Compare March 16, 2022 13:11
@denkensk
Member Author

Since this is very similar to my first version of the implementation, I finished it quickly 😄 Please take a look again. Thanks. @ahg-g

Contributor

@ahg-g ahg-g left a comment

thanks, good for squash

@ahg-g
Contributor

ahg-g commented Mar 16, 2022

It would be nice if we can add a scheduler integration test in a separate ginkgo.It

@@ -35,6 +35,7 @@ import (

kueue "sigs.k8s.io/kueue/api/v1alpha1"
"sigs.k8s.io/kueue/pkg/constants"
utilpriority "sigs.k8s.io/kueue/pkg/util/priority"
Contributor

nit: remove the alias. Use just priority

Member Author

utilpriority was added automatically by auto-import. I suggest keeping it as utilpriority.

@@ -24,6 +24,10 @@ import (
kueue "sigs.k8s.io/kueue/api/v1alpha1"
)

var (
lowPriority, highPriority = int32(0), int32(1000)
Contributor

make it 2 lines

@@ -83,3 +87,98 @@ func TestFIFOClusterQueue(t *testing.T) {
t.Errorf("Queue is not empty, poped workload %q", got.Obj.Name)
}
}

func TestStrictFIFO(t *testing.T) {
Contributor

For a follow up, we should find a way to merge the 2 tests #72
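
For orientation, here is a sketch of the ordering that a test like TestStrictFIFO would exercise, assuming the queue pops higher priority first and breaks ties by earlier creation time. That ordering is an assumption for illustration; the PR's actual comparison is not shown in this conversation. The `item` type, the Unix-seconds `Created` field, and `popOrder` are hypothetical.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item is a hypothetical, simplified queue entry.
type item struct {
	Name     string
	Priority int32
	Created  int64 // creation time as Unix seconds
}

// workloadHeap implements heap.Interface over items.
type workloadHeap []item

func (h workloadHeap) Len() int { return len(h) }

// Less puts higher priority first; equal priorities keep FIFO order.
func (h workloadHeap) Less(i, j int) bool {
	if h[i].Priority != h[j].Priority {
		return h[i].Priority > h[j].Priority
	}
	return h[i].Created < h[j].Created
}

func (h workloadHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }

func (h *workloadHeap) Push(x interface{}) { *h = append(*h, x.(item)) }

func (h *workloadHeap) Pop() interface{} {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// popOrder pushes all items and returns their names in pop order.
func popOrder(items []item) []string {
	h := &workloadHeap{}
	for _, it := range items {
		heap.Push(h, it)
	}
	var names []string
	for h.Len() > 0 {
		names = append(names, heap.Pop(h).(item).Name)
	}
	return names
}

func main() {
	lowPriority := int32(0)
	highPriority := int32(1000)
	fmt.Println(popOrder([]item{
		{Name: "low-old", Priority: lowPriority, Created: 1},
		{Name: "high-new", Priority: highPriority, Created: 3},
		{Name: "high-old", Priority: highPriority, Created: 2},
	}))
}
```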

)

// Priority returns priority of the given workload.
func Priority(w workload.Info) int32 {
Contributor

why not make this a method of workload.Info?

Member Author

No special reason; I just kept it the same as in k/k.
I also think it's not suitable as a method. But I can change the input to kueue.workload so that it can be used in other places in the future.
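
A minimal sketch of such a free-standing helper, taking the workload object itself as proposed above. The struct definitions are simplified, hypothetical stand-ins for the kueue types, and the nil check with a zero fallback is an assumption based on the `DefaultPriority = 0` constant mentioned earlier in the review.

```go
package main

import "fmt"

// DefaultPriority is used when the workload has no priority set.
const DefaultPriority int32 = 0

// Hypothetical, simplified stand-ins for the kueue API types.
type QueuedWorkloadSpec struct {
	PriorityClassName string
	Priority          *int32
}

type QueuedWorkload struct {
	Spec QueuedWorkloadSpec
}

// Priority returns the priority of the given workload, or DefaultPriority
// when the field has not been populated.
func Priority(w *QueuedWorkload) int32 {
	if w.Spec.Priority != nil {
		return *w.Spec.Priority
	}
	return DefaultPriority
}

func main() {
	p := int32(1000)
	fmt.Println(Priority(&QueuedWorkload{}))
	fmt.Println(Priority(&QueuedWorkload{Spec: QueuedWorkloadSpec{Priority: &p}}))
}
```

Keeping it a plain function over the API object means callers that hold only a QueuedWorkload, and not a workload.Info, can use it too.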

@@ -1,3 +1,19 @@
/*
Contributor

Oops, I wonder if we can make a verify step for this.

Signed-off-by: Alex Wang <wangqingcan1990@gmail.com>
@denkensk
Member Author

> It would be nice if we can add a scheduler integration test in a separate ginkgo.It

Added integration tests for the scheduler and job-controller in the second commit. @ahg-g

Contributor

@ahg-g ahg-g left a comment

Thanks! this is looking great!

@ahg-g
Contributor

ahg-g commented Mar 17, 2022

/require-retest

@ArangoGutierrez
Contributor

/retest-required

Signed-off-by: Alex Wang <wangqingcan1990@gmail.com>
@denkensk
Member Author

Updated based on the review comments. @ahg-g

@ahg-g
Contributor

ahg-g commented Mar 17, 2022

/retest-required

@ahg-g
Contributor

ahg-g commented Mar 17, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 17, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, denkensk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2022
@k8s-ci-robot k8s-ci-robot merged commit 32a2d30 into kubernetes-sigs:main Mar 17, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.