Release v0.9.0 #3302

Closed
34 tasks done
mimowo opened this issue Oct 24, 2024 · 6 comments
mimowo commented Oct 24, 2024

Release Checklist

  • OWNERS must LGTM the release proposal.
    At least two for minor or major releases. At least one for a patch release.
  • Verify that the changelog in this issue and the CHANGELOG folder is up-to-date
  • For major or minor releases (v$MAJ.$MIN.0), create a new release branch.
    • An OWNER creates a vanilla release branch with
      git branch release-$MAJ.$MIN main
    • An OWNER pushes the new release branch with
      git push upstream release-$MAJ.$MIN
  • Update the release branch:
    • Update RELEASE_BRANCH and RELEASE_VERSION in Makefile and run make prepare-release-branch
    • Update the CHANGELOG
    • Submit a pull request with the changes: Prepare release v0.9.0 #3458
  • An OWNER creates a signed tag running
    git tag -s $VERSION
    and inserts the changelog into the tag description.
    To perform this step, you need a PGP key registered on github.
  • An OWNER pushes the tag with
    git push upstream $VERSION
    • Triggers prow to build and publish a staging container image
      us-central1-docker.pkg.dev/k8s-staging-images/kueue/kueue:$VERSION
  • An OWNER prepares a draft release
    • Create the draft release pointing to the created tag.
    • Write the change log into the draft release.
    • Run
      make artifacts IMAGE_REGISTRY=registry.k8s.io/kueue GIT_TAG=$VERSION
      to generate the artifacts in the artifacts folder.
    • Upload the files in the artifacts folder to the draft release - either
      via UI or gh release --repo kubernetes-sigs/kueue upload <tag> artifacts/*.
  • Submit a PR against k8s.io,
    updating registry.k8s.io/images/k8s-staging-kueue/images.yaml to
    promote the container images
    to production: Promote kueue v0.9.0 kubernetes/k8s.io#7486
  • Wait for the PR to be merged and verify that the image registry.k8s.io/kueue/kueue:$VERSION is available.
  • Publish the draft release prepared at the GitHub releases page.
    Link: https://github.com/kubernetes-sigs/kueue/releases/tag/v0.9.0
  • Run the openvex action to generate openvex data. The action will add the file to the release artifacts.
  • Update the main branch:
    • Update RELEASE_VERSION in Makefile and run make prepare-release-branch
    • Release notes in the CHANGELOG
    • SECURITY-INSIGHTS.yaml values by running make update-security-insights GIT_TAG=$VERSION
    • Submit a pull request with the changes: Update latest kueue v0.9.0 #3461
    • Cherry-pick the pull request onto the website branch
  • Run the SBOM action to generate the SBOM and add it to the release.
  • For major or minor releases, merge the main branch into the website branch to publish the updated documentation.
  • Send an announcement email to sig-scheduling@kubernetes.io and wg-batch@kubernetes.io with the subject [ANNOUNCE] kueue $VERSION is released.
  • For a major or minor release, prepare the repo for the next version:
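
The branch and tag steps in the checklist above can be rehearsed before anything is pushed. The following is a minimal dry-run sketch, not part of the Kueue tooling: the helper names `release_branch` and `print_release_commands` are hypothetical, and the script only prints the commands quoted in the checklist instead of running them.

```shell
#!/bin/sh
# Derive the release branch name (release-$MAJ.$MIN) from a version tag such as v0.9.0.
# Hypothetical helper, shown only to illustrate the naming convention in the checklist.
release_branch() {
  version="${1#v}"       # strip the leading "v": 0.9.0
  maj="${version%%.*}"   # text before the first dot: 0
  rest="${version#*.}"   # text after the first dot: 9.0
  min="${rest%%.*}"      # text before the next dot: 9
  printf 'release-%s.%s\n' "$maj" "$min"
}

# Print (do not run) the branch and tag commands from the checklist for one version.
print_release_commands() {
  version="$1"
  branch="$(release_branch "$version")"
  echo "git branch $branch main"
  echo "git tag -s $version"
  echo "git push upstream $version"
}

print_release_commands v0.9.0
```

Printing rather than executing keeps the sketch safe to run anywhere; an OWNER would still run the real commands by hand, with a PGP key registered on GitHub for the signed tag.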

Changelog

Changes since `v0.8.0`:

## Urgent Upgrade Notes 

### (No, really, you MUST read this before you upgrade)

- Changed the `type` of `Pending` events, emitted when a Workload can't be admitted, from `Normal` to `Warning`.
  
  Update tools that process this event if they depend on the event `type`. (#3264, @kebe7jun)
- Deprecated SingleInstanceInClusterQueue and FlavorIndependent status conditions.
  
  The AdmissionCheck status conditions “FlavorIndependent” and “SingleInstanceInClusterQueue” are no longer supported by default.
  If you were using any of these conditions for your external AdmissionCheck, you need to enable the `AdmissionCheckValidationRules` feature gate.
  In future releases you will need to provide validation by an external controller. (#3254, @mszadkow)
- Promote MultiKueue API and feature gate to Beta. The MultiKueue feature gate is now beta and enabled by default.
  
  The MultiKueue-specific types are now part of Kueue's `v1beta1` API. `v1alpha` types are no longer supported. (#3230, @trasc)
- Promoted VisibilityOnDemand to Beta and enabled by default.
  
  The v1alpha1 Visibility API is deprecated and will be removed in the next release. Please use v1beta1 instead. (#3008, @mbobrovskyi)
- Provides more details on the reasons for ClusterQueues being inactive.
  If you were watching for the reason `CheckNotFoundOrInactive` in the ClusterQueue condition, watch `AdmissionCheckNotFound` and `AdmissionCheckInactive` instead. (#3127, @trasc)
- The QueueVisibility feature and its corresponding API were deprecated.
  
  The QueueVisibility feature and its corresponding API are deprecated and will be removed in v1beta2. Please use VisibilityOnDemand (https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/) instead. (#3110, @mbobrovskyi)
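
For tooling that watched the old `CheckNotFoundOrInactive` reason, the rename noted above amounts to matching two reasons instead of one. The following is a minimal sketch of such a check; the function name `is_admission_check_reason` is hypothetical, and how the reason string is obtained from the ClusterQueue condition is left out.

```shell
#!/bin/sh
# Succeed when a ClusterQueue condition reason is one of the two reasons
# that replace CheckNotFoundOrInactive in v0.9.0. Hypothetical helper.
is_admission_check_reason() {
  case "$1" in
    AdmissionCheckNotFound|AdmissionCheckInactive) return 0 ;;
    *) return 1 ;;
  esac
}

# Show how the old and new reasons are classified.
for reason in AdmissionCheckNotFound AdmissionCheckInactive CheckNotFoundOrInactive; do
  if is_admission_check_reason "$reason"; then
    echo "$reason: handled"
  else
    echo "$reason: not matched"
  fi
done
```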
 
## Changes by Kind

### Feature

- Add gauge metric admission_cycle_preemption_skips that reports the number of Workloads in a ClusterQueue
  that got preemption candidates, but had to be skipped in the last cycle. (#2919, @alculquicondor)
- Add integration for Deployment, where each Pod is treated as a separate Workload. (#2813, @vladikkuzn)
- Add integration for StatefulSet where Pods are managed by the pod-group integration. (#3001, @vladikkuzn)
- Added FlowSchema and PriorityLevelConfiguration for Visibility API. (#3043, @mbobrovskyi)
- Added a new optional `resource.transformations` section to the `Configuration` API that enables limited customization
  of how the resource requirements of a Workload are computed from the resource requests and limits of a Job. (#3026, @dgrove-oss)
- Added a way to specify dependencies between job integrations. (#2768, @trasc)
- Best effort support for scenarios when the Job is created at the same time as prebuilt workload or momentarily before the workload. In that case an error is logged to indicate that creating a Job before prebuilt-workload is outside of the intended use. (#3255, @mbobrovskyi)
- CLI: Added EXEC TIME column on kueuectl list workload command. (#2977, @mbobrovskyi)
- CLI: Added list pods for a job command. (#2280, @Kavinraja-G)
- CLI: Use protobuf encoding for core K8s APIs in kueuectl. (#3077, @tosi3k)
- Calculate AllocatableResourceGeneration more accurately. This fixes a bug where a workload might not have the Flavors it was assigned in a previous scheduling cycle invalidated, when the resources in the Cohort had changed. This bug could occur when other ClusterQueues were deleted from the Cohort. (#2984, @gabesaba)
- Detect and enable support for job CRDs installed after Kueue starts. (#2574, @ChristianZaccaria)
- Exposed available ResourceFlavors from the ClusterQueue in the LocalQueue status. (#3143, @mbobrovskyi)
- Graduated LendingLimit to Beta and enabled by default. (#2909, @macsko)
- Graduated MultiplePreemptions to Beta and enabled by default. (#2864, @macsko)
- Helm: Support the topologySpreadConstraints and PodDisruptionBudget (#3295, @woehrl01)
- Hierarchical Cohorts, introduced with the v1alpha1 Cohorts API, allow users to group resources in an arbitrary tree structure. Additionally, quotas and limits can now be defined directly at the Cohort level. See #79 for more details. (#2693, @gabesaba)
- Included visibility-api.yaml as a part of main.yaml (#3084, @mbobrovskyi)
- Introduce the "kueue.x-k8s.io/pod-group-fast-admission" annotation to Plain Pod integration.
  
  If the Plain Pod has the annotation and is part of a Plain PodGroup, Kueue will admit the Plain Pod regardless of whether all PodGroup Pods are created. (#3189, @vladikkuzn)
- Introduce the new PodTemplate annotation kueue.x-k8s.io/workload, and label kueue.x-k8s.io/podset.
  The annotation and label are alpha-level and gated by the new TopologyAwareScheduling feature gate. (#3228, @mimowo)
- Label `kueue.x-k8s.io/managed` is now added to PodTemplates created via ProvisioningRequest by Kueue (#2877, @PBundyra)
- MultiKueue: Add support for MPIJob `spec.runPolicy.managedBy` field (#3289, @mszadkow)
- MultiKueue: Support for the Kubeflow MPIJob (#2880, @mszadkow)
- MultiKueue: Support for the Kubeflow PaddleJob (#2744, @mszadkow)
- MultiKueue: Support for the Kubeflow PyTorchJob (#2735, @mszadkow)
- MultiKueue: Support for the Kubeflow TFJob (#2626, @mszadkow)
- MultiKueue: Support for the Kubeflow XGBoostJob (#2746, @mszadkow)
- ProvisioningRequest: Record the ProvisioningRequest creation errors to event and ProvisioningRequest status. (#3056, @IrvingMg)
- ProvisioningRequestConfig API now has a `RetryStrategy` field that allows users to configure retries per ProvisioningRequest class. By default, a retry releases the allocated quota in Kueue. (#3375, @PBundyra)
- Publish images via artifact registry (#2476, @IrvingMg)
- Support Topology Aware Scheduling (TAS) in Kueue in the Alpha version, along with the new Topology API
  to specify the ordered list of node labels corresponding to the different levels of hierarchy in data-centers
  (like racks or blocks).
  
  Additionally, we introduce a pair of Job-level annotations: `kueue.x-k8s.io/podset-required-topology`
  and `kueue.x-k8s.io/podset-preferred-topology` which users can use to indicate their preference for the
  Jobs to run all their Pods within a topology domain at the indicated level. (#3235, @mimowo)
- Support for JobSet 0.6 (#3034, @kannon92)
- Support for Kubernetes 1.31 (#2402, @mbobrovskyi)
- Support the Job-level API label, called `kueue.x-k8s.io/max-exec-time-seconds`, that users
  can use to enforce the maximum execution time for their job. The execution time is only
  accumulated when the Job is running (the corresponding Workload is admitted). 
  The corresponding Workload is deactivated after the time is exceeded. (#3191, @trasc)
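
As an illustration of the `kueue.x-k8s.io/max-exec-time-seconds` label described above, the following sketch prints a Job manifest that carries it. The function name `write_job_manifest`, the Job name, the image, and the queue name are hypothetical, and the `kueue.x-k8s.io/queue-name` label is an assumption that does not come from this changelog.

```shell
#!/bin/sh
# Print a hypothetical batch/v1 Job manifest that sets the max-exec-time label.
# $1 is the maximum execution time in seconds.
write_job_manifest() {
  max_seconds="$1"
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job                      # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # assumed label and hypothetical LocalQueue
    kueue.x-k8s.io/max-exec-time-seconds: "${max_seconds}"
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox                  # hypothetical image
        command: ["sleep", "600"]
EOF
}

write_job_manifest 120
```

The label value is quoted because Kubernetes label values are strings. With a 120-second limit and a 600-second command, the corresponding Workload would be expected to be deactivated before the Job finishes, per the changelog entry.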

### Documentation

- Adds a guide for installing the kubectl-kueue plugin via Krew. (#2666, @mbobrovskyi)
- Documentation on how to use Kueue for Deployments is added (#2698, @vladikkuzn)

### Bug or Regression

- CLI: Delete the corresponding Job when deleting a Workload. (#2992, @mbobrovskyi)
- CLI: Support `-` and `.` in the resource flavor name on `create cq` (#2703, @trasc)
- Fix a bug that could delay the election of a new leader in Kueue environments with multiple replicas. (#3093, @tenzen-y)
- Fix over-admission after deleting resources from borrowing ClusterQueue. (#2873, @mbobrovskyi)
- Fix resource consumption computation for partially admitted workloads. (#3118, @trasc)
- Fix restoring parallelism on eviction for partially admitted batch/Jobs. (#3153, @trasc)
- Fix some partial admission scenarios affected by an incorrect calculation of the resources
  used by an incoming workload that is partially admitted and preempting. (#2826, @trasc)
- Fix support for kuberay 1.2.x (#2960, @mbobrovskyi)
- Fix webhook validation for batch/Job to allow partial admission of a Job to use all available resources.
  It also fixes a scenario of partial re-admission when some of the Pods are already reclaimed. (#3152, @trasc)
- Helm: Fix a bug for "unclosed action error". (#2683, @mbobrovskyi)
- Prevent infinite preemption loop when PrioritySortingWithinCohort=false
  is used together with borrowWithinCohort. (#2807, @mimowo)
- Prevent job webhooks from dropping newer API fields when Kueue libraries are behind the latest released CRDs. (#3132, @alculquicondor)
- RayJob's implementation of Finished() now inspects JobDeploymentStatus (#3120, @andrewsykim)
- Support for helm charts in the us-central1-docker.pkg.dev/k8s-staging-images/charts repository (#2680, @IrvingMg)
- Update Flavor selection logic to prefer Flavors which allow reclamation of lent nominal quota, over Flavors which require preempting workloads within the ClusterQueue. This matches the behavior in the single Flavor case. (#2811, @gabesaba)
- Workload is requeued with all AdmissionChecks set to Pending if there was an AdmissionCheck in Retry state. (#3323, @PBundyra)
- Account for NumOfHosts when calculating PodSet assignments for RayJob and RayCluster (#3384, @andrewsykim)

### Other (Cleanup or Flake)

- Add a jobframework.BaseWebhook that can be used for custom job integrations (#3102, @alculquicondor)
@tenzen-y
Member

+1

@mimowo
Contributor Author

mimowo commented Oct 24, 2024

We have the release candidate: https://github.com/kubernetes-sigs/kueue/releases/tag/v0.9.0-rc.1

Please test, report issues and propose fixes :) we are aiming for the full release 4th Nov.

@alculquicondor
Contributor

+1

@mimowo
Contributor Author

mimowo commented Nov 5, 2024

I've updated the release notes. Compared to v0.9.0-rc.1, the new entries are:

@tenzen-y
Member

tenzen-y commented Nov 5, 2024

All has been completed. Thank you!
/close

@k8s-ci-robot
Contributor

@tenzen-y: Closing this issue.

In response to this:

All has been completed. Thank you!
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
