feat: Support running #894
/hold Until 0.7 is released
Previously, we marked the trial as running only in the code path that creates the job. If that reconcile returned an error, the running status was never updated, because later reconciles do not enter that code block again. This PR fixes the problem.
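To illustrate the failure mode described here, the following is a minimal sketch and not the code in this PR. `Trial`, `JobState`, `reconcileOld`, `reconcileNew`, and `simulate` are hypothetical stand-ins: the old variant sets the running flag only inside the create branch, while the new variant derives it from the observed job on every pass.

```go
package main

import (
	"errors"
	"fmt"
)

// Trial is a hypothetical, simplified stand-in for the Trial resource.
type Trial struct {
	Running bool
}

// JobState is a hypothetical summary of what the reconciler observes.
type JobState struct {
	Exists bool
}

// reconcileOld marks the trial running only in the branch that creates the job.
// If a later step fails in that same pass, the flag is never set, and the
// branch is skipped on every retry because the job already exists.
func reconcileOld(t *Trial, job *JobState, laterErr error) error {
	if !job.Exists {
		job.Exists = true // the job is created
		if laterErr != nil {
			return laterErr // running status is lost
		}
		t.Running = true
	}
	return nil
}

// reconcileNew derives the running flag from the observed job on every pass,
// so a failed pass is repaired by the next successful one.
func reconcileNew(t *Trial, job *JobState, laterErr error) error {
	if !job.Exists {
		job.Exists = true // the job is created
	}
	if laterErr != nil {
		return laterErr
	}
	t.Running = job.Exists
	return nil
}

// simulate runs one failing pass followed by one successful retry.
func simulate(reconcile func(*Trial, *JobState, error) error) bool {
	t, job := &Trial{}, &JobState{}
	_ = reconcile(t, job, errors.New("transient failure"))
	_ = reconcile(t, job, nil)
	return t.Running
}

func main() {
	fmt.Println("old reconcile, trial running:", simulate(reconcileOld))
	fmt.Println("new reconcile, trial running:", simulate(reconcileNew))
}
```

In this sketch only the second variant ends up with the trial marked as running after the retry.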
/assign @hougangliu @johnugeorge
/hold cancel
/retest
/retest
Signed-off-by: Ce Gao <gaoce@caicloud.io>
Signed-off-by: Ce Gao <gaoce@caicloud.io>
@gaocegege Can you rebase? I will review it today.
"err", "Status is not found in job")
return &jobCondition, nil
// Job does not have the running condition in status, thus we think
// the job is running when it is created.
Can you explain this comment? It is a little confusing. When there is no status field, it means that the job has just been created.
Yeah. The Job does not have a running condition, so we cannot use the same logic as for TFJob.
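As a rough illustration of that rule, here is a stdlib-only sketch that works on a plain nested map rather than the real unstructured object. `nestedField` and `jobPhase` are hypothetical helpers, not functions from this PR. The sketch assumes that a batch Job reports only the terminal condition types `Complete` and `Failed`, so anything else (including a missing status) is treated as running.

```go
package main

import "fmt"

// nestedField is a hypothetical helper that walks nested maps by key.
// It reports false when any key along the path is missing.
func nestedField(obj map[string]interface{}, fields ...string) (interface{}, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[f]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// jobPhase returns "Complete" or "Failed" when the job carries that condition
// with status "True". Otherwise it returns "Running", which also covers a job
// whose status has not been populated yet.
func jobPhase(job map[string]interface{}) string {
	conds, _ := nestedField(job, "status", "conditions")
	list, _ := conds.([]interface{})
	for _, c := range list {
		m, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if m["status"] == "True" && (m["type"] == "Complete" || m["type"] == "Failed") {
			return m["type"].(string)
		}
	}
	return "Running"
}

func main() {
	created := map[string]interface{}{} // no status field yet
	done := map[string]interface{}{
		"status": map[string]interface{}{
			"conditions": []interface{}{
				map[string]interface{}{"type": "Complete", "status": "True"},
			},
		},
	}
	fmt.Println(jobPhase(created), jobPhase(done))
}
```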
return &jobCondition, nil
// Job does not have the running condition in status, thus we think
// the job is running when it is created.
log.Info("NestedFieldCopy", "err", "status cannot be found in job")
Also, why is this logic only added for batch job and not for other types?
I meant the kubeflow provider.
The kubeflow provider already implemented it.
/hold Found some problems after rebase.
Signed-off-by: Ce Gao <gaoce@caicloud.io>
Verified with Job:
kubectl -n kubeflow get trials -o json | jq ".items[] | {conditions: .status.conditions}"
{
"conditions": [
{
"lastTransitionTime": "2019-12-05T02:26:59Z",
"lastUpdateTime": "2019-12-05T02:26:59Z",
"message": "Trial is created",
"reason": "TrialCreated",
"status": "True",
"type": "Created"
},
{
"lastTransitionTime": "2019-12-05T02:29:27Z",
"lastUpdateTime": "2019-12-05T02:29:27Z",
"message": "Trial is running",
"reason": "TrialRunning",
"status": "False",
"type": "Running"
},
{
"lastTransitionTime": "2019-12-05T02:29:27Z",
"lastUpdateTime": "2019-12-05T02:29:27Z",
"message": "Trial has succeeded",
"reason": "TrialSucceeded",
"status": "True",
"type": "Succeeded"
}
]
}
{
"conditions": [
{
"lastTransitionTime": "2019-12-05T02:26:42Z",
"lastUpdateTime": "2019-12-05T02:26:42Z",
"message": "Trial is created",
"reason": "TrialCreated",
"status": "True",
"type": "Created"
},
{
"lastTransitionTime": "2019-12-05T02:26:43Z",
"lastUpdateTime": "2019-12-05T02:26:43Z",
"message": "Trial is running",
"reason": "TrialRunning",
"status": "True",
"type": "Running"
}
]
}
{
"conditions": [
{
"lastTransitionTime": "2019-12-05T02:26:42Z",
"lastUpdateTime": "2019-12-05T02:26:42Z",
"message": "Trial is created",
"reason": "TrialCreated",
"status": "True",
"type": "Created"
},
{
"lastTransitionTime": "2019-12-05T02:26:59Z",
"lastUpdateTime": "2019-12-05T02:26:59Z",
"message": "Trial is running",
"reason": "TrialRunning",
"status": "True",
"type": "Running"
}
]
}
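For readers checking output like the above, a small sketch of how the effective state could be read from a conditions list: take the last condition whose status is "True". This is only one way to interpret the output and is not taken from the controller code; `Condition`, `trialConditions`, and `currentCondition` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Condition keeps only the two fields needed for this sketch.
type Condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

// trialConditions matches one object printed by the jq filter.
type trialConditions struct {
	Conditions []Condition `json:"conditions"`
}

// currentCondition returns the type of the last condition with status "True",
// or an empty string when no condition is true.
func currentCondition(raw string) (string, error) {
	var tc trialConditions
	if err := json.Unmarshal([]byte(raw), &tc); err != nil {
		return "", err
	}
	current := ""
	for _, c := range tc.Conditions {
		if c.Status == "True" {
			current = c.Type
		}
	}
	return current, nil
}

func main() {
	finished := `{"conditions":[
		{"type":"Created","status":"True"},
		{"type":"Running","status":"False"},
		{"type":"Succeeded","status":"True"}]}`
	state, err := currentCondition(finished)
	fmt.Println(state, err)
}
```

Under this reading, the first trial in the output is Succeeded (its Running condition has flipped to "False"), and the other two are still Running.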
Signed-off-by: Ce Gao <gaoce@caicloud.io>
/lgtm
/retest
/retest
/hold cancel
/hold Forgot to test on TFJob/PyTorchJob
/hold cancel
Signed-off-by: Ce Gao <gaoce@caicloud.io>
@@ -118,6 +130,9 @@ func (r *ReconcileTrial) updateFinalizers(instance *trialsv1alpha3.Trial, finali
}

func isTrialObservationAvailable(instance *trialsv1alpha3.Trial) bool {
	if instance == nil {
		return false
	}
Just wondering: when can instance be nil?
Never, but I prefer to keep the check to avoid potential crashes.
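A short sketch of why the guard is cheap insurance: dereferencing a nil pointer to reach a nested field panics, while the early return makes the helper safe for any caller. The types below are hypothetical simplifications, and the rest of the function body is an assumption, since only the nil check is visible in the diff.

```go
package main

import "fmt"

// Observation, TrialStatus and Trial are hypothetical, trimmed-down types.
type Observation struct {
	Metrics []string
}

type TrialStatus struct {
	Observation *Observation
}

type Trial struct {
	Status TrialStatus
}

// isObservationAvailable returns false for a nil trial instead of panicking
// on the field access below. The non-nil branch is an assumed body.
func isObservationAvailable(t *Trial) bool {
	if t == nil {
		return false
	}
	return t.Status.Observation != nil && len(t.Status.Observation.Metrics) > 0
}

func main() {
	var missing *Trial
	ready := &Trial{Status: TrialStatus{Observation: &Observation{Metrics: []string{"accuracy"}}}}
	fmt.Println(isObservationAvailable(missing), isObservationAvailable(ready))
}
```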
/retest
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johnugeorge
Signed-off-by: Ce Gao <gaoce@caicloud.io>