Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support minSuccess #1384

Merged
merged 6 commits into from
Mar 31, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions config/crd/bases/batch.volcano.sh_jobs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,11 @@ spec:
to the summary of tasks' replicas
format: int32
type: integer
minSuccess:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to set the replicas as the default value.

description: The minimal success pods to run for this Job
format: int32
minimum: 1
type: integer
plugins:
additionalProperties:
items:
Expand Down
5 changes: 5 additions & 0 deletions config/crd/v1beta1/batch.volcano.sh_jobs.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,11 @@ spec:
to the summary of tasks' replicas
format: int32
type: integer
minSuccess:
zen-xu marked this conversation as resolved.
Show resolved Hide resolved
description: The minimal success pods to run for this Job
format: int32
minimum: 1
type: integer
plugins:
additionalProperties:
items:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,11 @@ spec:
to the summary of tasks' replicas
format: int32
type: integer
minSuccess:
description: The minimal success pods to run for this Job
format: int32
minimum: 1
type: integer
plugins:
additionalProperties:
items:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,11 @@ spec:
to the summary of tasks' replicas
format: int32
type: integer
minSuccess:
description: The minimal success pods to run for this Job
format: int32
minimum: 1
type: integer
plugins:
additionalProperties:
items:
Expand Down
5 changes: 5 additions & 0 deletions installer/volcano-development.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -224,6 +224,11 @@ spec:
to the summary of tasks' replicas
format: int32
type: integer
minSuccess:
description: The minimal success pods to run for this Job
format: int32
minimum: 1
type: integer
plugins:
additionalProperties:
items:
Expand Down
5 changes: 5 additions & 0 deletions pkg/apis/batch/v1alpha1/job.go
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,11 @@ type JobSpec struct {
// If specified, indicates the job's priority.
// +optional
PriorityClassName string `json:"priorityClassName,omitempty" protobuf:"bytes,10,opt,name=priorityClassName"`

// The minimal success pods to run for this Job
// +kubebuilder:validation:Minimum=1
// +optional
MinSuccess *int32 `json:"minSuccess,omitempty" protobuf:"varint,11,opt,name=minSuccess"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set this field as *int , not int?

}

// VolumeSpec defines the specification of Volume, e.g. PVC.
Expand Down
11 changes: 10 additions & 1 deletion pkg/controllers/job/state/running.go
Original file line number Diff line number Diff line change
Expand Up @@ -56,8 +56,17 @@ func (ps *runningState) Execute(action v1alpha1.Action) error {
// when scale down to zero, keep the current job phase
return false
}

minSuccess := ps.job.Job.Spec.MinSuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got the reason why you set it as *int instead of int. I perfer to int and set default value to relicas. That may be better. How do you think about it?

Copy link
Contributor Author

@zen-xu zen-xu Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this field should be *int.

MinSuccess is an optional feature, If the user has not set this field, the logic should same as the original.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this field should be *int.

MinSuccess is an optional feature, If the user has not set this field, the logic should same as the original.

Yes, it's reasonable.

if minSuccess != nil && status.Succeeded >= *minSuccess {
status.State.Phase = vcbatch.Completed
return true
}

if status.Succeeded+status.Failed == jobReplicas {
if status.Succeeded >= ps.job.Job.Spec.MinAvailable {
if minSuccess != nil && status.Succeeded < *minSuccess {
status.State.Phase = vcbatch.Failed
} else if status.Succeeded >= ps.job.Job.Spec.MinAvailable {
status.State.Phase = vcbatch.Completed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use minSuccess, do we still need MinAvailable to determine job status?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use MinAvailable to determine job status when users don't want to use MinSuccess

Copy link
Member

@shinytang6 shinytang6 Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, l agree. So maybe we can add another layer of logic inside if status.Succeeded+status.Failed == jobReplicas ? If minSuccess is specified, we use minSuccess instead of minAvailable.

Copy link
Contributor Author

@zen-xu zen-xu Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If minSuccess has matched,Job will be mark Completed. So it's not necessary to handle the next compare logic.

Copy link
Member

@shinytang6 shinytang6 Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l mean if minSuccess is specified while status.Succeeded < *minSuccess all the time, then after all the pods completed, it will enter if status.Succeeded+status.Failed == jobReplicas logic, we should replace if status.Succeeded >= ps.job.Job.Spec.MinAvailable with if status.Succeeded >= minSuccess in that case. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

} else {
status.State.Phase = vcbatch.Failed
Expand Down
69 changes: 69 additions & 0 deletions test/e2e/jobp/min_success.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
/*
Copyright 2021 The Volcano Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package jobp

import (
. "github.com/onsi/ginkgo"
. "github.com/onsi/gomega"
vcbatch "volcano.sh/volcano/pkg/apis/batch/v1alpha1"

e2eutil "volcano.sh/volcano/test/e2e/util"
)

var _ = Describe("Check min success", func() {
It("Min Success", func() {
By("init test ctx")
ctx := e2eutil.InitTestContext(e2eutil.Options{})
defer e2eutil.CleanupTestContext(ctx)

jobName := "min-success-job"
By("create job")
var minSuccess int32 = 2
job := e2eutil.CreateJob(ctx, &e2eutil.JobSpec{
Name: jobName,
MinSuccess: &minSuccess,
Tasks: []e2eutil.TaskSpec{
{
Name: "short",
Img: e2eutil.DefaultBusyBoxImage,
Min: 1,
Rep: 3,
Command: "sleep 3",
},
{
Name: "log",
Img: e2eutil.DefaultBusyBoxImage,
Min: 1,
Rep: 2,
Command: "sleep 100000",
},
},
})

// job phase: pending -> running
err := e2eutil.WaitJobReady(ctx, job)
Expect(err).NotTo(HaveOccurred())

// wait short tasks completed
err = e2eutil.WaitTasksCompleted(ctx, job, minSuccess)
Expect(err).NotTo(HaveOccurred())

// wait job completed
err = e2eutil.WaitJobStates(ctx, job, []vcbatch.JobPhase{vcbatch.Completed}, e2eutil.OneMinute)
Expect(err).NotTo(HaveOccurred())
})
})
35 changes: 34 additions & 1 deletion test/e2e/util/job.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,8 @@ type JobSpec struct {
Volumes []batchv1alpha1.VolumeSpec
NodeName string
// ttl seconds after job finished
Ttl *int32
Ttl *int32
MinSuccess *int32
}

func Namespace(context *TestContext, job *JobSpec) string {
Expand Down Expand Up @@ -194,6 +195,7 @@ func CreateJobInner(ctx *TestContext, jobSpec *JobSpec) (*batchv1alpha1.Job, err
Queue: jobSpec.Queue,
Plugins: jobSpec.Plugins,
TTLSecondsAfterFinished: jobSpec.Ttl,
MinSuccess: jobSpec.MinSuccess,
},
}

Expand Down Expand Up @@ -739,3 +741,34 @@ func IsPodScheduled(pod *v1.Pod) bool {
}
return false
}

func WaitTasksCompleted(ctx *TestContext, job *batchv1alpha1.Job, successNum int32) error {
var additionalError error
err := wait.Poll(100*time.Millisecond, TwoMinute, func() (bool, error) {
pods, err := ctx.Kubeclient.CoreV1().Pods(job.Namespace).List(context.TODO(), metav1.ListOptions{})
Expect(err).NotTo(HaveOccurred())

var succeeded int32 = 0
for _, pod := range pods.Items {
if !metav1.IsControlledBy(&pod, job) {
continue
}

if pod.Status.Phase == "Succeeded" {
succeeded++
}
}

ready := succeeded >= successNum
if !ready {
additionalError = fmt.Errorf("expected job '%s' to have %d succeeded pods, actual got %d", job.Name,
successNum,
succeeded)
}
return ready, nil
})
if err != nil && strings.Contains(err.Error(), TimeOutMessage) {
return fmt.Errorf("[Wait time out]: %s", additionalError)
}
return err
}