Request for mxjob status bug fix #1650

Closed
alison-lim opened this issue Aug 11, 2022 · 7 comments

Comments

@alison-lim

Hi.
I would like to request a fix for the MXJob status logic in the training operator.
When I create a training job with a simple command, two job conditions (Succeeded and Running) are set at the same time.
The job has actually completed, but it shows as Running when I check it with 'kubectl get mxjob'.
My guess is that this happens because the Running condition is listed last.
Could you fix this issue as soon as possible?
Please reply.

API Version: kubeflow.org/v1
Kubeflow Version: 1.5

Status:
  Conditions:
    Last Transition Time:  2022-07-22T06:46:55Z
    Last Update Time:      2022-07-22T06:46:55Z
    Message:               MXJob mxtrain1 is created.
    Reason:                MXJobCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2022-07-22T06:53:06Z
    Last Update Time:      2022-07-22T06:53:06Z
    Message:               MXJob mxtrain1 is successfully completed.
    Reason:                MXJobSucceeded
    Status:                True
    Type:                  Succeeded
    Last Transition Time:  2022-07-22T06:53:06Z
    Last Update Time:      2022-07-22T06:53:06Z
    Message:               MXJob mxtrain1 is running.
    Reason:                MXJobRunning
    Status:                True
    Type:                  Running

$ kubectl get mxjobs -n admin-0001
NAME       STATE     AGE
mxtrain1   Running   18d
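To illustrate why 'kubectl get' reports Running here, below is a minimal Go sketch. It assumes (this is not confirmed anywhere in this thread) that the MXJob CRD's STATE printer column reads the type of the last entry in the conditions list, i.e. a JSONPath like `.status.conditions[-1:].type`. The `JobCondition` type and `lastConditionType` helper are hypothetical simplifications, not the training operator's code.

```go
package main

import "fmt"

// JobCondition is a hypothetical, trimmed-down stand-in for a job status
// condition; only the fields needed for this illustration are kept.
type JobCondition struct {
	Type   string
	Status string
	Reason string
}

// lastConditionType mimics a printer column that shows the type of the
// last condition (assumed JSONPath: .status.conditions[-1:].type).
// It ignores Status entirely, which is exactly why ordering matters.
func lastConditionType(conds []JobCondition) string {
	if len(conds) == 0 {
		return ""
	}
	return conds[len(conds)-1].Type
}

func main() {
	// Conditions in the same order as the describe output in this report.
	conds := []JobCondition{
		{Type: "Created", Status: "True", Reason: "MXJobCreated"},
		{Type: "Succeeded", Status: "True", Reason: "MXJobSucceeded"},
		{Type: "Running", Status: "True", Reason: "MXJobRunning"},
	}
	fmt.Println(lastConditionType(conds)) // prints "Running"
}
```

Under that assumption, any job whose conditions end with a Running entry is displayed as Running, even when a Succeeded condition with the same timestamp precedes it.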

@alison-lim
Author

Is there anyone who can reply?

@johnugeorge
Member

Is this always reproducible?

@alison-lim
Author

alison-lim commented Oct 21, 2022

Yes, we solved this problem by fixing the ClusterRole file.
My ClusterRoles "kubeflow-admin" / "kubeflow-edit" did not include mxjobs in the kubeflow.org resources.

@johnugeorge
Member

This was fixed in the latest release: #1565

@alison-lim
Author

Sorry, that was a different issue. It is still happening in my cluster.

Status:
  Conditions:
    Last Transition Time:  2022-11-21T03:21:20Z
    Last Update Time:      2022-11-21T03:21:20Z
    Message:               MXJob mxnet-job is created.
    Reason:                MXJobCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2022-11-21T03:27:06Z
    Last Update Time:      2022-11-21T03:27:06Z
    Message:               MXJob mxnet-job is successfully completed.
    Reason:                MXJobSucceeded
    Status:                True
    Type:                  Succeeded
    Last Transition Time:  2022-11-21T03:27:06Z
    Last Update Time:      2022-11-21T03:27:06Z
    Message:               MXJob mxnet-job is running.
    Reason:                MXJobRunning
    Status:                True
    Type:                  Running
  Replica Statuses:
    Scheduler:
      Succeeded:  1
    Server:
      Succeeded:  1
    Worker:
      Succeeded:  1
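One possible direction for a fix is sketched below in Go, under stated assumptions: the helper names (`setCondition`, `isTerminal`, `currentState`) and the `JobCondition` type are hypothetical and are not the training operator's actual code. The idea is that adding a terminal condition (Succeeded or Failed) flips any existing Running condition to False, and a Running=True update that arrives after the job has finished is ignored, so the list can no longer end with `Running: True` once all replicas have succeeded.

```go
package main

import "fmt"

// JobCondition is a hypothetical, trimmed-down job condition (illustration only).
type JobCondition struct {
	Type   string // e.g. "Created", "Running", "Succeeded", "Failed"
	Status string // "True" or "False"
}

// isTerminal reports whether a condition type marks the end of a job.
func isTerminal(t string) bool {
	return t == "Succeeded" || t == "Failed"
}

// setCondition (hypothetical helper) records c as the newest condition.
func setCondition(conds []JobCondition, c JobCondition) []JobCondition {
	// A late Running=True must not reopen a job that has already finished.
	if c.Type == "Running" && c.Status == "True" {
		for _, old := range conds {
			if isTerminal(old.Type) && old.Status == "True" {
				return conds
			}
		}
	}
	out := make([]JobCondition, 0, len(conds)+1)
	for _, old := range conds {
		if old.Type == c.Type {
			continue // drop the stale copy; the new one is appended below
		}
		if isTerminal(c.Type) && old.Type == "Running" {
			old.Status = "False" // a finished job is no longer running
		}
		out = append(out, old)
	}
	return append(out, c)
}

// currentState returns the type of the newest condition whose Status is True.
func currentState(conds []JobCondition) string {
	for i := len(conds) - 1; i >= 0; i-- {
		if conds[i].Status == "True" {
			return conds[i].Type
		}
	}
	return ""
}

func main() {
	var conds []JobCondition
	conds = setCondition(conds, JobCondition{Type: "Created", Status: "True"})
	conds = setCondition(conds, JobCondition{Type: "Running", Status: "True"})
	conds = setCondition(conds, JobCondition{Type: "Succeeded", Status: "True"})
	// Simulate a Running update arriving after success, as the output above suggests.
	conds = setCondition(conds, JobCondition{Type: "Running", Status: "True"})
	for _, c := range conds {
		fmt.Printf("%s=%s\n", c.Type, c.Status)
	}
	fmt.Println("state:", currentState(conds))
}
```

Running this prints `Created=True`, `Running=False`, `Succeeded=True` and then `state: Succeeded`. Because the last entry is also Succeeded, a printer column that reads the last condition would show the correct state too.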

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
