Add features overview to README #452
Conversation
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielvegamyhre

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
✅ Deploy Preview for kubernetes-sigs-jobset canceled.
README.md
Outdated
Read the [installation guide](/docs/setup/install.md) to learn more.
- **Exclusive Placement Per Topology Domain**: JobSet includes an [annotation](https://github.com/kubernetes-sigs/jobset/blob/1ae6c0c039c21d29083de38ae70d13c2c8ec613f/examples/simple/exclusive-placement.yaml#L6) which can be set by the user, specifying that there should be a 1:1 mapping between a child job and a particular topology domain, such as a datacenter rack or zone. This means that all the pods belonging to a child job will be colocated in the same topology domain, while pods from other jobs will not be allowed to run within this domain. This gives the child job exclusive access to compute resources in this domain.
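To make the feature more concrete for readers skimming this thread, here is a minimal sketch of what a JobSet using exclusive placement might look like. This is an illustration only: the apiVersion, the annotation key, and the field values are assumptions based on the linked example and should be checked against the actual file in the repo.

```yaml
apiVersion: jobset.x-k8s.io/v1alpha2   # assumed; check the repo for the exact version in use
kind: JobSet
metadata:
  name: exclusive-placement-example
  annotations:
    # Assumed annotation key from the linked example; the value is the node label
    # that defines the topology domain (e.g. a rack, zone, or node pool label).
    alpha.jobset.sigs.k8s.io/exclusive-topology: topology.kubernetes.io/zone
spec:
  replicatedJobs:
  - name: workers
    replicas: 3              # intent: one child Job per topology domain
    template:
      spec:
        parallelism: 4
        completions: 4
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: busybox
              command: ["sleep", "60"]
```

With `replicas: 3` and a zone label as the topology key, the intent would be one child Job per zone, with each child Job's pods colocated in its zone and no pods from the other child Jobs scheduled there.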
Question for reviewers: I think this feature will make little sense to users without a concrete use case, but the only one I can think of is TPU Multislice training, and since TPUs are specific to Google I didn't include it here. If anyone has a suggestion for a concrete use case, I would appreciate it. I am happy to include TPU Multislice training as well, based on feedback.
maybe @vsoch has some ideas for a general example?
If we come up with a better concrete example we can add it in a follow-up PR. For now I think we should get the feature overview list into the README so it's clear to potential users glancing at the GitHub landing page what JobSet offers.
Sorry, missed this comment! Mapping to the level of a rack isn't particularly useful, or at least it doesn't belong at this level: when we deploy to Google Cloud we usually ask for COMPACT mode when we want some guarantee of rack closeness. For mapping topology, a more interesting example is 1 pod per node. I think that can typically be achieved with resource requests/limits that are slightly below the node's max capacity, and (maybe) a suggestion to the scheduler via affinity rules (though in practice I have found this is not enough). The topology we are really interested in is more fine-grained than that, and would probably need to be under the jurisdiction of the kubelet.
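As a rough illustration of the "one pod per node" pattern described above, the sketch below combines resource requests just under a node's allocatable capacity with a required pod anti-affinity keyed on hostname. It is a generic Kubernetes example, not JobSet-specific, and the resource numbers are hypothetical; they would need to be tuned to the actual node shape.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-per-node
  labels:
    app: one-per-node
spec:
  containers:
  - name: worker
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "7"        # hypothetical: just under an 8-vCPU node's allocatable CPU
        memory: 28Gi    # hypothetical: just under a 32Gi node's allocatable memory
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: one-per-node
        topologyKey: kubernetes.io/hostname   # at most one such pod per node
```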
Also I'm designing a new project idea that (I think) will use JobSet again, will ping you / keep you in the loop if/when it manifests. No pun intended! :P
fff98f6 to aa7c1b5
README.md
Outdated
@@ -44,6 +48,20 @@ Read the [installation guide](/docs/setup/install.md) to learn more.
- ✔️ Security: RBAC based accessibility.
- ✔️ Stable release cycle (2-3 months) for new features, bugfixes, cleanups.

## Installation

**Requires Kubernetes 1.26 or newer**.
Can we say that we follow the Kubernetes release process?
In 1-2 months I think we would want to bump this to Kubernetes 1.27.
Sure, so something like:
Maintains support for latest 3 Kubernetes minor versions. Current: 1.27, 1.28, 1.29
(I know we currently run e2e tests with 1.26 as well, but we should remove that and just focus on supporting the latest 3 minors, to align with upstream k8s.)
What are your thoughts on this?
My goal would be to avoid having to PR just to keep these versions up to date.
Maintains support for latest 3 Kubernetes minor versions.
OK, I added a line to the "production readiness" bullets about this, and here (installation instructions) I mentioned that one of the latest 3 minor versions is required.
/lgtm