[Hierarchical Cohorts] Define Cohort API #2693

Merged
merged 1 commit into from
Jul 25, 2024
12 changes: 12 additions & 0 deletions PROJECT
@@ -1,3 +1,7 @@
# Code generated by tool. DO NOT EDIT.
# This file is used to track the info used to scaffold your project
# and allow the plugins properly work.
# More info: https://book.kubebuilder.io/reference/project-config.html
domain: x-k8s.io
layout:
- go.kubebuilder.io/v3
@@ -61,4 +65,12 @@ resources:
  kind: AdmissionCheck
  path: sigs.k8s.io/kueue/apis/kueue/v1beta1
  version: v1beta1
- api:
    crdVersion: v1
    namespaced: true
  domain: x-k8s.io
  group: kueue
  kind: Cohort
  path: sigs.k8s.io/kueue/apis/kueue/v1alpha1
  version: v1alpha1
version: "3"
105 changes: 105 additions & 0 deletions apis/kueue/v1alpha1/cohort_types.go
@@ -0,0 +1,105 @@
/*
Copyright The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kueuebeta "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// CohortSpec defines the desired state of Cohort
type CohortSpec struct {
	// Parent references the name of the Cohort's parent, if
	// any. It satisfies one of three cases:
Contributor

What about a cycle? What happens in that case?

Contributor Author

We disable all members of the Cohort graph. I updated documentation.
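
As a rough illustration of the cycle case discussed above, the following sketch follows parent links and reports whether a Cohort is revisited. It is not the Kueue implementation: the `parents` map representation and the `hasCycle` name are assumptions made only for this example.

```go
package main

import "fmt"

// hasCycle reports whether following parent links from start ever
// revisits a Cohort. parents maps a Cohort name to its parent name;
// an empty or missing entry means "no parent" (the root case).
// Both the map representation and the function name are hypothetical.
func hasCycle(parents map[string]string, start string) bool {
	seen := map[string]bool{}
	for cur := start; cur != ""; cur = parents[cur] {
		if seen[cur] {
			return true
		}
		seen[cur] = true
	}
	return false
}

func main() {
	parents := map[string]string{
		"team-a": "org",
		"org":    "root",
		"x":      "y",
		"y":      "x",
	}
	fmt.Println(hasCycle(parents, "team-a")) // false
	fmt.Println(hasCycle(parents, "x"))      // true
}
```

A controller following the documented behavior would presumably mark every Cohort that reaches such a cycle as inactive and stop admitting workloads for its ClusterQueues until the cycle is removed.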

	// 1) Unset. This Cohort is the root of its Cohort tree.
	// 2) References a non-existent Cohort. We use default Cohort (no borrowing/lending limits).
	// 3) References an existent Cohort.
	//
	// If a cycle is created, we disable all members of the
	// Cohort, including ClusterQueues, until the cycle is
	// removed. We prevent further admission while the cycle
	// exists.
	//
	//+kubebuilder:validation:MaxLength=253
	//+kubebuilder:validation:Pattern="^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$"
	//
	Parent string `json:"parent,omitempty"`
Contributor

This probably should be a pointer, right?

Contributor Author

ClusterQueue also defines it like this:

Cohort string `json:"cohort,omitempty"`

Contributor

I see, an empty string plays the role of no parent. That's fine, consistency is important.
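
To make the "empty string means no parent" convention from this thread concrete, here is a small sketch using only the standard library. The `spec` type is a hypothetical stand-in that carries just the `Parent` field; `encode` and `isRoot` are helper names invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spec is a hypothetical stand-in for CohortSpec with only the Parent field.
type spec struct {
	Parent string `json:"parent,omitempty"`
}

// encode marshals a spec to its JSON text.
func encode(s spec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// isRoot treats the empty string as "no parent".
func isRoot(s spec) bool { return s.Parent == "" }

func main() {
	fmt.Println(encode(spec{}))              // {}
	fmt.Println(encode(spec{Parent: "org"})) // {"parent":"org"}
	fmt.Println(isRoot(spec{}))              // true
}
```

With `omitempty`, an unset parent is dropped from the serialized object, so a plain string can express "no parent" without a pointer, at the cost of not distinguishing "unset" from "explicitly empty".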


	// ResourceGroups describes groupings of Resources and
	// Flavors. Each ResourceGroup defines a list of Resources
	// and a list of Flavors which provide quotas for these
	// Resources. Each Resource and each Flavor may only form part
	// of one ResourceGroup. There may be up to 16 ResourceGroups
	// within a Cohort.
	//
	// BorrowingLimit limits how much members of this Cohort
	// subtree can borrow from the parent subtree.
	//
	// LendingLimit limits how much members of this Cohort subtree
	// can lend to the parent subtree.
	//
	// Borrowing and Lending limits must only be set when the
	// Cohort has a parent. Otherwise, the Cohort create/update
	// will be rejected by the webhook.
	//
	//+listType=atomic
	//+kubebuilder:validation:MaxItems=16
	ResourceGroups []kueuebeta.ResourceGroup `json:"resourceGroups,omitempty"`
}
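
The rule in the comment above (borrowing and lending limits are only allowed when the Cohort has a parent) could be checked along the following lines. This is a sketch under simplifying assumptions, not the webhook from this PR series: `resourceQuota` flattens the real ResourceGroup/flavor/resource nesting into one level, uses `int64` instead of Kubernetes quantities, and `validateLimits` is a hypothetical name.

```go
package main

import "fmt"

// resourceQuota is a simplified, hypothetical stand-in for the quota
// entries nested inside kueuebeta.ResourceGroup.
type resourceQuota struct {
	Name           string
	NominalQuota   int64
	BorrowingLimit *int64
	LendingLimit   *int64
}

// cohortSpec is a hypothetical, flattened stand-in for CohortSpec.
type cohortSpec struct {
	Parent string
	Quotas []resourceQuota
}

// validateLimits rejects borrowing or lending limits on a Cohort that
// has no parent, mirroring the rule stated in the API comment.
func validateLimits(s cohortSpec) error {
	if s.Parent != "" {
		return nil
	}
	for _, q := range s.Quotas {
		if q.BorrowingLimit != nil {
			return fmt.Errorf("resource %q: borrowingLimit must not be set without a parent", q.Name)
		}
		if q.LendingLimit != nil {
			return fmt.Errorf("resource %q: lendingLimit must not be set without a parent", q.Name)
		}
	}
	return nil
}

func main() {
	limit := int64(5)
	root := cohortSpec{Quotas: []resourceQuota{{Name: "cpu", NominalQuota: 10, BorrowingLimit: &limit}}}
	child := cohortSpec{Parent: "org", Quotas: root.Quotas}
	fmt.Println(validateLimits(root) != nil) // true: rejected
	fmt.Println(validateLimits(child))       // <nil>: accepted
}
```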

const (
	// Condition indicating that a Cohort is correctly configured,
	// for example, there is no cycle.
	CohortActive = "Active"
)

// CohortStatus defines the observed state of Cohort
type CohortStatus struct {
Contributor

Do you expect it to be left empty? If yes - drop the struct for now. If not - please add the expected content.

Contributor Author

Added the Conditions field and the CohortActive condition type. One note: I modified the condition from the KEP slightly, to match ClusterQueue:

KEP

CohortActive = "CohortActive"

This PR

CohortActive = "Active"


	//+listType=map
	//+listMapKey=type
	//+patchStrategy=merge
	//+patchMergeKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
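
The `listType=map` and `listMapKey=type` markers mean the list holds at most one condition per `type`. A minimal sketch of that upsert behavior follows, using a trimmed local `condition` type instead of `metav1.Condition` so it runs with the standard library alone. The `setCondition` helper and the `Ready` and `CycleDetected` reasons are hypothetical; in real controllers, `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta` is the usual helper for this.

```go
package main

import "fmt"

// cohortActive mirrors the CohortActive condition type defined above.
const cohortActive = "Active"

// condition is a trimmed, hypothetical stand-in for metav1.Condition.
type condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// setCondition inserts c, or overwrites the existing entry with the
// same Type, so the slice keeps at most one condition per Type.
func setCondition(conds []condition, c condition) []condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var conds []condition
	conds = setCondition(conds, condition{Type: cohortActive, Status: "True", Reason: "Ready"})
	conds = setCondition(conds, condition{
		Type:    cohortActive,
		Status:  "False",
		Reason:  "CycleDetected",
		Message: "the Cohort is part of a cycle",
	})
	fmt.Println(len(conds), conds[0].Status) // 1 False
}
```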

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:resource:scope=Cluster

// Cohort is the Schema for the cohorts API
type Cohort struct {
Contributor

I think I would prefer to have some prototype, or at least minimal functionality, implemented before merging the API. It would increase confidence in the API. We can still merge it as a separate PR, but it would be good to see the implementation closer on the horizon. WDYT @tenzen-y?

Contributor Author

That is possible, but my intention was to keep the PRs as small as possible for reviewability. Also, this is in v1alpha1, and not yet cut into a minor release, so I'd argue we can change it freely for some time.

Member

I would like to select one of these approaches:

  1. a single big-bang PR containing all implementations and APIs.
  2. multiple small PRs, with the API change PRs merged in the final phase.

TBH, I would prefer option 2, since the option 1 PR would be challenging to review.
With option 2, we should expose the APIs in the last phase, since it is better not to expose unimplemented APIs.

Contributor Author

I will need this type for development of the rest of the features - in this case, should I just define it in pkg for now, and then move it to api later?

Contributor

> ... my intention was to keep the PRs as small as possible for reviewability.

That's for sure. I was also thinking about merging this PR separately, but only after seeing some PoC implementation, to increase confidence that the API can be released.

> Also, this is in v1alpha1

Good point, and the first iteration of the API looks quite minimal.

Contributor

The API was discussed and merged in https://github.com/kubernetes-sigs/kueue/tree/main/keps/79-hierarchical-cohorts. I'm not sure that adding a basic implementation to this PR (which has very little to do with the API; it is mostly about scheduling and fitting the workloads) would make it more convincing.

Contributor

> I guess that you can learn how we should go here: #1714

Yeah, with the caveat that fair sharing only introduced fields rather than a new API. For a feature that requires a new API, it might be harder to develop its logic without that API merged.

Contributor

> The API was discussed and merged in https://github.com/kubernetes-sigs/kueue/tree/main/keps/79-hierarchical-cohorts.

Indeed. Together with the fact that this is still an alpha API, and to reduce the burden of rebases, I would lean toward merging it.

The only downside is that if we need to release 0.9 urgently, we will release a hollow alpha API. Are we good with this @mwielgus @tenzen-y?

Member

> The only downside is that if we need to release 0.9 urgently, we will release a hollow alpha API. Are we good with this @mwielgus @tenzen-y?

That is my primary concern, as I mentioned above.

> With option 2, we should expose the APIs in the last phase, since it is better not to expose unimplemented APIs.

In the case of an urgent minor release, let's revert all PRs related to Hierarchical Cohorts...
I hope that we never face that situation...

Contributor

> In the case of an urgent minor release, let's revert all PRs related to Hierarchical Cohorts... I hope that we never face that situation...

I guess it will depend on the completeness of the feature at the moment of releasing 0.9.

I synced with @gabesaba and we are OK to roll back the PRs related to the new API if the feature is still vastly unfinished when doing 0.9.

	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CohortSpec   `json:"spec,omitempty"`
	Status CohortStatus `json:"status,omitempty"`
}

//+kubebuilder:object:root=true

// CohortList contains a list of Cohort
type CohortList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []Cohort `json:"items"`
}

func init() {
	SchemeBuilder.Register(&Cohort{}, &CohortList{})
}
104 changes: 104 additions & 0 deletions apis/kueue/v1alpha1/zz_generated.deepcopy.go