diff --git a/keps/104-StartupPolicy/README.md b/keps/104-StartupPolicy/README.md new file mode 100644 index 000000000..b37bb5806 --- /dev/null +++ b/keps/104-StartupPolicy/README.md @@ -0,0 +1,420 @@ +# KEP-104: StartupPolicy + + + + + + +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [API Proposal](#api-proposal) + - [Implementation](#implementation) + - [Validation](#validation) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit Tests](#unit-tests) + - [Integration tests](#integration-tests) + - [Graduation Criteria](#graduation-criteria) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) + - [Poor Person Workflow Engine](#poor-person-workflow-engine) + - [Increase number of patches to JobSet](#increase-number-of-patches-to-jobset) +- [Alternatives](#alternatives) + - [Operate on Conditions](#operate-on-conditions) + - [Allow for both ready and succeeded](#allow-for-both-ready-and-succeeded) + + +## Summary + +This KEP adds a StartupPolicy for JobSets. A StartupPolicy allows for some flexibility on when to start ReplicatedJobs in a JobSet. + + +## Motivation + +High Performance Computing / Machine Learning users usually want some control over how jobs are created. A StartupPolicy would allow a user to specify which status a ReplicatedJob should have before other ReplicatedJobs in the JobSet are started. + +### Goals + +- Add a startup policy for JobSet + +### Non-Goals + +- Complicated workflow specifications + - This means that we will support simple dependencies rather than processing a full DAG. 
+ - Projects like argo-workflows are a better fit for this. +- Starting Jobs based on Ready will happen at the ReplicatedJob level, not at the Job level. + - A ReplicatedJob will be classified as ready if all of its Jobs are ready. + +## Proposal + +### User Stories (Optional) + +#### Story 1 + +As a user, I have one ReplicatedJob that functions as the driver and a series of worker ReplicatedJobs. +I want the driver to be ready before I start the worker ReplicatedJobs. +This is useful for HPC/AI/ML workloads as some frameworks have a driver pod that must be ready before the workers are started. + +The API below would fit my purpose. + +```yaml +apiVersion: jobset.x-k8s.io/v1alpha2 +kind: JobSet +metadata: + name: driver-ready-worker-start +spec: + startupPolicy: + replicatedJobsStartInOrder: true + replicatedJobs: + - name: driver + replicas: 1 + template: + spec: + # Set backoff limit to 0 so job will immediately fail if any pod fails. + backoffLimit: 0 + completions: 1 + parallelism: 1 + template: + spec: + containers: + - name: driver + image: bash:latest + command: + - bash + - -xc + - | + sleep 10000 + - name: workers + replicas: 1 + template: + spec: + backoffLimit: 0 + completions: 2 + parallelism: 2 + template: + spec: + containers: + - name: worker + image: bash:latest + command: + - bash + - -xc + - | + sleep 10 +``` + +#### Story 2 + +As a user, I want my ReplicatedJobs to start in order. + +One case where this is useful is a JobSet with a messagequeue ReplicatedJob, followed by a driver ReplicatedJob, followed by a worker ReplicatedJob. + +In this case, I want messagequeue to start, then the driver, and then finally the worker. + +```yaml +apiVersion: jobset.x-k8s.io/v1alpha2 +kind: JobSet +metadata: + name: messagequeue-driver-worker +spec: + startupPolicy: + replicatedJobsStartInOrder: true + replicatedJobs: + - name: messagequeue + replicas: 1 + template: + spec: + # Set backoff limit to 0 so job will immediately fail if any pod fails. 
+ backoffLimit: 0 + completions: 1 + parallelism: 1 + template: + spec: + containers: + - name: messagequeue + image: bash:latest + command: + - bash + - -xc + - | + sleep 100 + - name: driver + replicas: 2 + template: + spec: + backoffLimit: 0 + completions: 2 + parallelism: 2 + template: + spec: + containers: + - name: driver + image: bash:latest + command: + - bash + - -xc + - | + sleep 10 + - name: worker + replicas: 2 + template: + spec: + backoffLimit: 0 + completions: 2 + parallelism: 2 + template: + spec: + containers: + - name: worker + image: bash:latest + command: + - bash + - -xc + - | + sleep 10 +``` + +### Notes/Constraints/Caveats (Optional) + +In the JobSet API, ReplicatedJobs are the individual Job elements that are created. They are replicated so that the same Job can be run multiple times. This complicates the concept of Ready, because we must consider the readiness of the ReplicatedJob as a whole rather than that of the individual Jobs. + +For example, a Job with `Parallelism: 4` and a `Ready` count of 3 has 1 pod that is not ready. +The Job API has no concept of an overall readiness condition based on a percentage of pods that are ready. + +### Risks and Mitigations + + + +## Design Details + +### API Proposal + +To keep Jobs that are not yet due to start halted, we propose to suspend them. + +To track this, we will add a new field to the ReplicatedJobStatus. + +```golang +// ReplicatedJobStatus defines the observed ReplicatedJobs Readiness. +type ReplicatedJobStatus struct { + ... + Suspended int32 `json:"suspended"` +} +``` + +```golang +// Status of ReplicatedJob for StartupPolicy +type Status string + +const ( + // Ready means that the job is ready + Ready Status = "Ready" + + + // Suspended means that the job is suspended + + Suspended Status = "Suspended" +) + +type StartupPolicy struct { + // If set to true, ReplicatedJobs are started in the order in which they are listed. + // A ReplicatedJob counts as started once all of its Jobs are ready. 
+ // +optional + ReplicatedJobsStartInOrder *bool `json:"replicatedJobsStartInOrder,omitempty"` +} +``` + +### Implementation + +If StartupPolicy is set and the JobSet's suspend field is not set, we will create all Jobs in a suspended state. + +We will not set a suspended condition on the JobSet, so that we can distinguish between a suspended JobSet and a JobSet with a StartupPolicy. +To track which ReplicatedJobs are suspended, we will add a `Suspended` field to the ReplicatedJobStatus. +This field will track ReplicatedJobs that are yet to start. + +If `ReplicatedJobsStartInOrder` is set, then we will loop over all `ReplicatedJobStatuses`. +For each one, we will unsuspend the ReplicatedJob if it is suspended. +Once all Jobs of that ReplicatedJob are ready, we move on to the next ReplicatedJob in the list. + +We will check both `SuccessPolicy` and `FailurePolicy` in the reconcile function before checking the StartupPolicy. This is to catch cases where the JobSet will be considered Succeeded or Failed. + +### Validation + +- StartupPolicy is immutable. + + + +### Test Plan + + + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +#### Unit Tests + + + + + +- `controller`: `Aug 7th 2023` - `31.6%` + +Unit test coverage is low in this area because we do more of our testing at the integration level. + +We will add a new file called startup_policy that implements the StartupPolicy logic. + +#### Integration tests + +a. If Suspend is specified, the StartupPolicy does not resume any Jobs. +b. [Story 1](#story-1) +c. [Story 2](#story-2) + + +### Graduation Criteria + + + +## Implementation History + +- Drafted August 2 2023 + + +## Drawbacks + +### Poor Person Workflow Engine + +This startup policy can function as a poor person's workflow syntax, and there may be requests to make it more than what it is. 
+ +Ideally, other workflow engines would handle more complicated DAGs, but I could see JobSet being used for simple ML workflows. + +### Increase number of patches to JobSet + +Since we will start all Jobs in a suspended state, we will eventually have to patch the spec of every Job. + + +## Alternatives + +### Operate on Conditions + +We originally wanted to operate based on Conditions for Jobs, but it turns out that there is no Ready condition. This means that we would rely on status fields in some cases and on conditions in others. We decided to go forward with operating on status fields for both cases. + +### Allow for both ready and succeeded + +To avoid JobSets becoming a bad workflow engine, we only want to support a startup sequence in which, once the Jobs of one ReplicatedJob are ready, we move on to the next. + +We did not add support for succeeded at this time, as that is more in line with a pipeline or workflow. + + diff --git a/keps/104-StartupPolicy/kep.yaml b/keps/104-StartupPolicy/kep.yaml new file mode 100644 index 000000000..6a47d0a24 --- /dev/null +++ b/keps/104-StartupPolicy/kep.yaml @@ -0,0 +1,28 @@ +title: Startup Policy +kep-number: 104 +authors: + - "@kannon92" +status: provisional +creation-date: 2023-08-01 +reviewers: + - "@danielvegamyhre" + - "@vsoch" +approvers: + - "@ahg" + +see-also: + - "NA" +replaces: + - "NA" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v0.3.0" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v0.3.0"