Release v0.6.0 #655

Closed
20 tasks done
danielvegamyhre opened this issue Aug 19, 2024 · 6 comments

Comments

@danielvegamyhre
Contributor

danielvegamyhre commented Aug 19, 2024

Release Checklist

  • All OWNERS must LGTM the release proposal
  • Verify that the changelog in this issue is up-to-date
  • For major or minor releases (v$MAJ.$MIN.0), create a new release branch.
    • An OWNER creates a vanilla release branch with
      git branch release-$MAJ.$MIN main
    • An OWNER pushes the new release branch to the main repository's remote ($REMOTE) with
      git push $REMOTE release-$MAJ.$MIN
  • Update things like README, deployment templates, docs, configuration, test/e2e flags.
    Submit a PR against the release branch:
  • An OWNER prepares a draft release
    • Write the change log into the draft release.
    • Run
      make artifacts IMAGE_REGISTRY=registry.k8s.io/jobset GIT_TAG=$VERSION
      to generate the artifacts and upload the files in the artifacts folder
      to the draft release.
  • An OWNER creates a signed tag running
    git tag -s $VERSION
    and inserts the changelog into the tag description.
    To perform this step, you need a PGP key registered on GitHub.
  • An OWNER pushes the tag to the main repository's remote ($REMOTE) with
    git push $REMOTE $VERSION
    • This triggers Prow to build and publish a staging container image
      gcr.io/k8s-staging-jobset/jobset:$VERSION
  • Submit a PR against k8s.io,
    updating k8s.gcr.io/images/k8s-staging-jobset/images.yaml to
    promote the container images
    to production:
  • Wait for the PR to be merged and verify that the image registry.k8s.io/jobset/jobset:$VERSION is available.
  • Publish the draft release prepared at the GitHub releases page.
  • Add a link to the tagged release in this issue:
  • Send an announcement email to sig-apps@kubernetes.io, sig-scheduling@kubernetes.io and wg-batch@kubernetes.io with the subject [ANNOUNCE] JobSet $VERSION is released
  • Add a link to the release announcement in this issue:
  • For a major or minor release, update README.md and docs/setup/install.md
    in the main branch:
  • For a major or minor release, create an unannotated devel tag in the
    main branch, on the first commit that gets merged after the release
    branch has been created (presumably the README update commit above), and push the tag:
    DEVEL=v$MAJ.$(($MIN+1)).0-devel; git tag $DEVEL main && git push $REMOTE $DEVEL
    Here $REMOTE is the main repository's remote. This ensures that the devel builds on the main branch will have a meaningful version number.
  • Close this issue
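
The branch and tag names used in the checklist can all be derived from $VERSION. The following is a minimal sketch of that derivation, not a script from the JobSet repository: the function names parse_version, release_branch and devel_tag are hypothetical, and it assumes versions of the form vMAJ.MIN.PATCH without leading zeros and that the devel tag should name the next minor version (for example v0.7.0-devel after v0.6.0).

```shell
# Hypothetical helpers (not part of the JobSet repository) that derive the
# names used in the release checklist from a version such as v0.6.0.

# Split "vMAJ.MIN.PATCH" into the variables MAJ, MIN and PATCH.
# Returns non-zero if the input does not have that shape.
parse_version() {
  case "$1" in
    v[0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "unexpected version format: $1" >&2; return 1 ;;
  esac
  rest="${1#v}"        # drop the leading "v"
  MAJ="${rest%%.*}"    # text before the first dot
  rest="${rest#*.}"
  MIN="${rest%%.*}"    # text between the first and second dot
  PATCH="${rest#*.}"   # everything after the second dot
  # Reject anything that is not purely numeric (e.g. v0.6.0.1 or v0.6.0-rc1).
  case "$MAJ$MIN$PATCH" in
    *[!0-9]*) echo "unexpected version format: $1" >&2; return 1 ;;
  esac
}

# Name of the release branch for a version, e.g. release-0.6.
release_branch() {
  parse_version "$1" || return 1
  echo "release-$MAJ.$MIN"
}

# Devel tag for the next minor version (assumption: next minor, patch 0).
devel_tag() {
  parse_version "$1" || return 1
  echo "v$MAJ.$((MIN + 1)).0-devel"
}
```

With these sourced, an OWNER could for instance run git branch "$(release_branch "$VERSION")" main, and later git tag "$(devel_tag "$VERSION")" main, instead of typing the names by hand.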

Changelog

Highlights

  • New JobSet Failure Policy API - allows users to configure different behavior for different types of errors, enabling them to use compute resources more efficiently and improve ML training goodput.
  • Add Coordinator field to JobSet spec, enabling users to define a global coordinator pod for distributed ML/HPC workloads. The stable network endpoint for this pod will be added as a label and annotation to every Job and Pod in the JobSet for easy use in application code. A common use case for this is TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Add global Job index label/annotation to every Job and Pod, which is needed to support TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Added new metrics
  • Improved test coverage
  • Bug fixes
  • New examples and documentation

What's Changed

@danielvegamyhre
Contributor Author

cc @ahg-g @kannon92

@ahg-g
Contributor

ahg-g commented Aug 20, 2024

/lgtm

@danielvegamyhre
Contributor Author

@kannon92
Contributor

We should maybe highlight #644 as that enables Kueue integration.

@danielvegamyhre
Contributor Author

> We should maybe highlight #644 as that enables Kueue integration.

We've had integration with Kueue since v0.2.0, this was a bug fix for Kueue integration.

@danielvegamyhre
Contributor Author

3 participants