diff --git a/keps/104-StartupPolicy/README.md b/keps/104-StartupPolicy/README.md
new file mode 100644
index 000000000..b37bb5806
--- /dev/null
+++ b/keps/104-StartupPolicy/README.md
@@ -0,0 +1,420 @@
+# KEP-104: StartupPolicy
+
+<!--
+This is the title of your KEP. Keep it short, simple, and descriptive. A good
+title can help communicate what the KEP is and should be considered as part of
+any review.
+-->
+
+<!--
+A table of contents is helpful for quickly jumping to sections of a KEP and for
+highlighting any additional information provided beyond the standard KEP
+template.
+
+Ensure the TOC is wrapped with
+  <code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
+tags, and then generate with `hack/update-toc.sh`.
+-->
+
+<!-- toc -->
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories (Optional)](#user-stories-optional)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [API Proposal](#api-proposal)
+  - [Implementation](#implementation)
+  - [Validation](#validation)
+  - [Test Plan](#test-plan)
+      - [Prerequisite testing updates](#prerequisite-testing-updates)
+    - [Unit Tests](#unit-tests)
+    - [Integration tests](#integration-tests)
+  - [Graduation Criteria](#graduation-criteria)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+  - [Poor Person Workflow Engine](#poor-person-workflow-engine)
+  - [Increase number of patches to JobSet](#increase-number-of-patches-to-jobset)
+- [Alternatives](#alternatives)
+  - [Operate on Conditions](#operate-on-conditions)
+  - [Allow for both ready and succeeded](#allow-for-both-ready-and-succeeded)
+<!-- /toc -->
+
+## Summary
+
+This KEP adds a StartupPolicy to JobSet. A StartupPolicy gives users control over when the ReplicatedJobs in a JobSet are started.
+<!--
+This section is incredibly important for producing high-quality, user-focused
+documentation such as release notes or a development roadmap. It should be
+possible to collect this information before implementation begins, in order to
+avoid requiring implementors to split their attention between writing release
+notes and implementing the feature itself. KEP editors and SIG Docs
+should help to ensure that the tone and content of the `Summary` section is
+useful for a wide audience.
+
+A good summary is probably at least a paragraph in length.
+
+Both in this section and below, follow the guidelines of the [documentation
+style guide]. In particular, wrap lines to a reasonable length, to make it
+easier for reviewers to cite specific portions, and to minimize diff churn on
+updates.
+
+[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
+-->
+
+## Motivation
+
+High Performance Computing and Machine Learning users usually want some control over how Jobs are created. A StartupPolicy allows a user to specify which status a ReplicatedJob must reach before other ReplicatedJobs in the JobSet are started.
+
+### Goals
+
+- Add a startup policy for JobSet
+
+### Non-Goals
+
+- Complicated workflow specifications
+  - This means that we will support simple dependencies rather than processing a full DAG.
+  - Projects like argo-workflows are a better use case for this.
+- Starting Jobs based on the readiness of individual Jobs. Readiness is evaluated at the ReplicatedJob level, not at the Job level.
+  - A ReplicatedJob is classified as ready when all of its Jobs are ready.
+
+## Proposal
+
+### User Stories (Optional)
+
+#### Story 1
+
+As a user, I have one ReplicatedJob that functions as the driver and a series of worker ReplicatedJobs.  
+I want the driver to be ready before I start the worker replicated jobs.  
+This is useful for HPC/AI/ML workloads as some frameworks have a driver pod that must be ready before the workers are started.  
+
+The API below would fit my purpose.
+
+```yaml
+apiVersion: jobset.x-k8s.io/v1alpha2
+kind: JobSet
+metadata:
+  name: driver-ready-worker-start
+spec:
+  startupPolicy:
+    replicatedJobsStartInOrder: true
+  replicatedJobs:
+  - name: driver
+    replicas: 1
+    template:
+      spec:
+        # Set backoff limit to 0 so job will immediately fail if any pod fails.
+        backoffLimit: 0 
+        completions: 1
+        parallelism: 1
+        template:
+          spec:
+            containers:
+            - name: driver
+              image: bash:latest
+              command:
+              - bash
+              - -xc
+              - |
+                sleep 10000
+  - name: workers
+    replicas: 1
+    template:
+      spec:
+        backoffLimit: 0 
+        completions: 2
+        parallelism: 2
+        template:
+          spec:
+            containers:
+            - name: worker
+              image: bash:latest
+              command:
+              - bash
+              - -xc
+              - |
+                sleep 10
+```
+
+#### Story 2
+
+As a user, I want my replicated jobs to start in order.
+
+This could be useful when a JobSet has a messagequeue ReplicatedJob, followed by a driver ReplicatedJob, followed by a worker ReplicatedJob.
+
+In this case, I want the messagequeue to start first, then the driver, and finally the worker.  
+
+```yaml
+apiVersion: jobset.x-k8s.io/v1alpha2
+kind: JobSet
+metadata:
+  name: messagequeue-driver-worker
+spec:
+  startupPolicy:
+    replicatedJobsStartInOrder: true
+  replicatedJobs:
+  - name: messagequeue
+    replicas: 1
+    template:
+      spec:
+        # Set backoff limit to 0 so job will immediately fail if any pod fails.
+        backoffLimit: 0 
+        completions: 1
+        parallelism: 1
+        template:
+          spec:
+            containers:
+            - name: messagequeue
+              image: bash:latest
+              command:
+              - bash
+              - -xc
+              - |
+                sleep 100
+  - name: driver
+    replicas: 2
+    template:
+      spec:
+        backoffLimit: 0 
+        completions: 2
+        parallelism: 2
+        template:
+          spec:
+            containers:
+            - name: driver
+              image: bash:latest
+              command:
+              - bash
+              - -xc
+              - |
+                sleep 10
+  - name: worker
+    replicas: 2
+    template:
+      spec:
+        backoffLimit: 0 
+        completions: 2
+        parallelism: 2
+        template:
+          spec:
+            containers:
+            - name: worker
+              image: bash:latest
+              command:
+              - bash
+              - -xc
+              - |
+                sleep 10
+```
+
+### Notes/Constraints/Caveats (Optional)
+
+In the JobSet API, a ReplicatedJob is a Job template that is replicated, so the same Job can be created multiple times. This adds some complexity to the concept of Ready, because we must consider the readiness of the ReplicatedJob as a whole rather than that of its individual Jobs.
+
+For example, a Job with `Parallelism` of 4 and `Ready` of 3 has 1 pod that is not ready.  
+The Job API has no concept of an overall readiness condition based on the percentage of pods that are ready.  
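+
+The readiness roll-up described above can be sketched as follows. This is a minimal illustration, not the controller code: the `jobCounts` type and both function names are hypothetical, and it assumes a Job counts as ready only when its ready pod count reaches its parallelism.
+
+```golang
+package main
+
+import "fmt"
+
+// jobCounts is a hypothetical, trimmed-down view of a Job: the requested
+// parallelism and the number of pods currently reporting ready.
+type jobCounts struct {
+	Parallelism int32
+	Ready       int32
+}
+
+// jobReady assumes a Job is ready only when every requested pod is ready.
+func jobReady(j jobCounts) bool {
+	return j.Ready >= j.Parallelism
+}
+
+// replicatedJobReady reports whether all Jobs of a ReplicatedJob are ready.
+func replicatedJobReady(jobs []jobCounts) bool {
+	for _, j := range jobs {
+		if !jobReady(j) {
+			return false
+		}
+	}
+	return true
+}
+
+func main() {
+	jobs := []jobCounts{
+		{Parallelism: 4, Ready: 4},
+		{Parallelism: 4, Ready: 3}, // one pod is not ready yet
+	}
+	fmt.Println(replicatedJobReady(jobs)) // prints false
+}
+```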
+
+### Risks and Mitigations
+
+<!--
+What are the risks of this proposal, and how do we mitigate? Think broadly.
+For example, consider both security and how this will impact the larger
+Kubernetes ecosystem.
+
+How will security be reviewed, and by whom?
+
+How will UX be reviewed, and by whom?
+
+Consider including folks who also work outside the SIG or subproject.
+-->
+
+## Design Details
+
+### API Proposal
+
+To keep Jobs that are not yet due to start halted, we propose to use suspend for these Jobs.
+
+To track this, we will add a new field to the ReplicatedJobStatus.
+
+```golang
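+// ReplicatedJobStatus defines the observed ReplicatedJobs Readiness.
+type ReplicatedJobStatus struct {
+	// ... (existing fields elided)
+	// Suspended tracks the Jobs of this ReplicatedJob that are yet to start.
+	Suspended int32 `json:"suspended"`
+}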
+```
+
+```golang
+// Status of ReplicatedJob for StartupPolicy
+type Status string
+
+const (
+	// Ready means that the job is ready
+	Ready Status = "Ready"
+
+	// Suspended means that the job is suspended
+	Suspended Status = "Suspended"
+)
+
+type StartupPolicy struct {
+	// If this field is set to true, we will start all replicatedJobs in order.
+	// Started means that all the Jobs of a ReplicatedJob are ready.
+	// +optional
+	ReplicatedJobsStartInOrder *bool `json:"replicatedJobsStartInOrder,omitempty"`
+}
+```
+
+### Implementation
+
+If a StartupPolicy is set and the JobSet `suspend` field is not set, we will create all Jobs in a suspended state.
+
+We will not set a suspended condition on the JobSet, so that we can distinguish between a suspended JobSet and a JobSet that is waiting on its startup policy.
+To track which ReplicatedJobs are suspended, we will add a `Suspended` field to the ReplicatedJobStatus.  
+This field will track ReplicatedJobs that are yet to start.
+
+If `ReplicatedJobsStartInOrder` is set, we will loop over all `ReplicatedJobStatuses` in order.
+If the current ReplicatedJob is suspended, we will unsuspend it.
+Once all the Jobs of that ReplicatedJob are ready, we move on to the next ReplicatedJob in the list.
+
+We will check both `SuccessPolicy` and `FailurePolicy` in the reconcile function before checking startup policy. This is to catch cases where the JobSet will be considered Succeeded or Failed.
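+
+The in-order loop described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the proposed implementation: `replicatedJobStatus` is a simplified stand-in for `ReplicatedJobStatus`, the `Replicas` field and the `nextToResume` helper are hypothetical, and a ReplicatedJob is treated as ready when its `Ready` count reaches `Replicas`.
+
+```golang
+package main
+
+import "fmt"
+
+// replicatedJobStatus is a simplified stand-in for ReplicatedJobStatus.
+// Replicas is a hypothetical field added to keep the example self-contained.
+type replicatedJobStatus struct {
+	Name      string
+	Replicas  int32
+	Ready     int32
+	Suspended int32
+}
+
+// nextToResume walks the statuses in list order and returns the name of the
+// ReplicatedJob that should be unsuspended now. It returns "" when the
+// controller should wait, or when every ReplicatedJob has already started.
+func nextToResume(statuses []replicatedJobStatus) string {
+	for _, s := range statuses {
+		if s.Suspended > 0 {
+			// First ReplicatedJob that is still suspended: start it.
+			return s.Name
+		}
+		if s.Ready < s.Replicas {
+			// Started but not ready yet: do not start anything later in the list.
+			return ""
+		}
+	}
+	return ""
+}
+
+func main() {
+	statuses := []replicatedJobStatus{
+		{Name: "messagequeue", Replicas: 1, Ready: 1},
+		{Name: "driver", Replicas: 2, Suspended: 2},
+		{Name: "worker", Replicas: 2, Suspended: 2},
+	}
+	fmt.Println(nextToResume(statuses)) // prints driver
+}
+```
+
+Returning at most one name per call keeps each reconcile small: the controller unsuspends one ReplicatedJob, then waits for a later reconcile to observe readiness before moving on.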
+  
+### Validation
+
+- StartupPolicy is immutable.
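+
+One way to express the immutability rule is a comparison between the old and new policy on update. The sketch below is only an assumption about how such a check might be written: the lowercase `startupPolicy` type mirrors the proposed API, and `validateStartupPolicyUpdate` is a hypothetical helper, not an existing JobSet function.
+
+```golang
+package main
+
+import (
+	"errors"
+	"fmt"
+	"reflect"
+)
+
+// startupPolicy mirrors the proposed StartupPolicy type for illustration.
+type startupPolicy struct {
+	ReplicatedJobsStartInOrder *bool
+}
+
+// validateStartupPolicyUpdate rejects any change to the policy on update.
+// reflect.DeepEqual follows pointers, so two policies with equal values
+// compare as equal even when they are distinct allocations.
+func validateStartupPolicyUpdate(oldPolicy, newPolicy *startupPolicy) error {
+	if !reflect.DeepEqual(oldPolicy, newPolicy) {
+		return errors.New("startupPolicy is immutable")
+	}
+	return nil
+}
+
+func main() {
+	inOrder, notInOrder := true, false
+	oldPolicy := &startupPolicy{ReplicatedJobsStartInOrder: &inOrder}
+	changed := &startupPolicy{ReplicatedJobsStartInOrder: &notInOrder}
+
+	fmt.Println(validateStartupPolicyUpdate(oldPolicy, changed))   // prints startupPolicy is immutable
+	fmt.Println(validateStartupPolicyUpdate(oldPolicy, oldPolicy)) // prints <nil>
+}
+```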
+
+<!--
+This section should contain enough information that the specifics of your
+change are understandable. This may include API specs (though not always
+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+-->
+
+### Test Plan
+
+<!--
+**Note:** *Not required until targeted at a release.*
+The goal is to ensure that we don't accept enhancements with inadequate testing.
+
+All code is expected to have adequate tests (eventually with coverage
+expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
+when drafting this test plan.
+
+[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
+-->
+
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+<!--
+Based on reviewers feedback describe what additional tests need to be added prior
+implementing this enhancement to ensure the enhancements have also solid foundations.
+-->
+
+#### Unit Tests
+
+<!--
+In principle every added code should have complete unit test coverage, so providing
+the exact set of tests will not bring additional value.
+However, if complete unit test coverage is not possible, explain the reason of it
+together with explanation why this is acceptable.
+-->
+
+<!--
+Additionally, try to enumerate the core package you will be touching
+to implement this enhancement and provide the current unit coverage for those
+in the form of:
+- <package>: <date> - <current test coverage>
+
+This can inform certain test coverage improvements that we want to do before
+extending the production code to implement this enhancement.
+-->
+
+- `controller`: `Aug 7th 2023` - `31.6%`
+
+Unit test coverage is low in this area because we do more of our testing at the integration level.
+
+We will add a new file called startup_policy that contains the startup policy logic.
+
+#### Integration tests
+
+- If Suspend is specified on the JobSet, the StartupPolicy does not resume any Jobs.
+- [Story 1](#story-1)
+- [Story 2](#story-2)
+<!--
+Describe what tests will be added to ensure proper quality of the enhancement.
+
+After the implementation PR is merged, add the names of the tests here.
+-->
+
+### Graduation Criteria
+
+<!--
+
+Clearly define what it means for the feature to be implemented and
+considered stable.
+
+If the feature you are introducing has high complexity, consider adding graduation
+milestones with these graduation criteria:
+- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
+- [Feature gate][feature gate] lifecycle
+- [Deprecation policy][deprecation-policy]
+
+[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
+[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
+[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
+-->
+
+## Implementation History
+
+- Drafted August 2 2023
+<!--
+Major milestones in the lifecycle of a KEP should be tracked in this section.
+Major milestones might include:
+- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
+- the `Proposal` section being merged, signaling agreement on a proposed design
+- the date implementation started
+- the first Kubernetes release where an initial version of the KEP was available
+- the version of Kubernetes where the KEP graduated to general availability
+- when the KEP was retired or superseded
+-->
+
+## Drawbacks
+
+### Poor Person Workflow Engine
+
+This startup policy can function as a poor person's workflow syntax, and there may be requests to make it more than what it is.  
+
+Ideally, we want users to rely on other workflow engines for more complicated DAGs, but simple ML workflows may still end up being built on JobSet.  
+
+### Increase number of patches to JobSet
+
+Since we will start all jobs in a suspended state, we will have to patch the spec of all the jobs eventually.  
+<!--
+Why should this KEP _not_ be implemented?
+-->
+
+## Alternatives
+
+### Operate on Conditions
+
+We originally wanted to operate on Job Conditions, but it turns out that Jobs have no Ready condition. This means we would have to rely on status fields in some cases and on conditions in others. We decided to operate on status fields in both cases.
+
+### Allow for both ready and succeeded
+
+To avoid JobSet becoming a bad workflow engine, we only want to support a startup sequence in which we move on to the next ReplicatedJob once the Jobs of the current one are ready.  
+
+We did not add support for succeeded at this time, as this is more in line with a pipeline or workflow.
+
+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->
diff --git a/keps/104-StartupPolicy/kep.yaml b/keps/104-StartupPolicy/kep.yaml
new file mode 100644
index 000000000..6a47d0a24
--- /dev/null
+++ b/keps/104-StartupPolicy/kep.yaml
@@ -0,0 +1,28 @@
+title: Startup Policy
+kep-number: 104
+authors:
+  - "@kannon92"
+status: provisional
+creation-date: 2023-08-01
+reviewers:
+  - "@danielvegamyhre"
+  - "@vsoch"
+approvers:
+  - "@ahg"
+
+see-also:
+  - "NA"
+replaces:
+  - "NA"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v0.3.0"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v0.3.0"