Include first failed job name in event emitted when JobSet fails, as well as the JobSet failure condition #477
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: danielvegamyhre
@@ -916,3 +914,42 @@ func findReplicatedStatus(replicatedJobStatus []jobset.ReplicatedJobStatus, repl
}
return jobset.ReplicatedJobStatus{}
}

// messageWithFirstFailedJob appends the first failed job to the original event message in human readable way.
Could we move these out of this file?
They don't require the controller so it may be useful to move them to a separate file.
Yeah jobset_controller.go is overdue for refactoring. To start I think we can refactor some of the many helper functions into separate files based on the feature (similar to what you did with startup policy).
I did some refactoring in this PR (e.g. moving some functions into success_policy.go, adding a constants pkg, etc.)
However, for these particular functions, I'm not sure of the best place to put them yet. They are about finding the first failed job for a Jobset and generating an event message for it, which doesn't fit into any existing (or new) logical grouping.
I think for now we should leave these 3 functions here, and maybe in a separate PR we can refactor some more. I don't want to go overboard splitting things up.
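For readers following the thread, here is a minimal sketch of the kind of helpers being discussed: finding the first failed job and appending its name to an event message. It is not the code from this PR. The `job` type, its `failedAt` field, `exampleJobs`, and the exact message format are assumptions made for illustration; the real controller works with Kubernetes Job objects and their conditions.

```go
package main

import (
	"fmt"
	"time"
)

// job is a hypothetical, simplified stand-in for a Kubernetes Job.
// failedAt is nil when the job has not failed.
type job struct {
	name     string
	failedAt *time.Time
}

// findFirstFailedJob returns the failed job with the earliest failure
// time, or nil if no job in the slice has failed.
func findFirstFailedJob(jobs []job) *job {
	var first *job
	for i := range jobs {
		j := &jobs[i]
		if j.failedAt == nil {
			continue
		}
		if first == nil || j.failedAt.Before(*first.failedAt) {
			first = j
		}
	}
	return first
}

// messageWithFirstFailedJob appends the first failed job's name to the
// original event message. The format string is an assumption.
func messageWithFirstFailedJob(msg, firstFailedJobName string) string {
	return fmt.Sprintf("%s (first failed job: %s)", msg, firstFailedJobName)
}

// exampleJobs builds illustrative data: two failed jobs and one healthy job.
func exampleJobs() []job {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	t1 := t0.Add(time.Minute)
	return []job{
		{name: "workers-1", failedAt: &t1},
		{name: "workers-0", failedAt: &t0},
		{name: "driver-0"},
	}
}

func main() {
	msg := "jobset failed"
	if first := findFirstFailedJob(exampleJobs()); first != nil {
		msg = messageWithFirstFailedJob(msg, first.name)
	}
	fmt.Println(msg) // prints "jobset failed (first failed job: workers-0)"
}
```

Helpers shaped like this take only plain data and no controller state, which is why they could live in a separate file and be unit tested in isolation.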
0381ab5 to 4d1bc06
"sigs.k8s.io/jobset/pkg/util/collections"
)

// TODO: add unit tests for the functions in this file.
+1
Do you want this done as part of this PR? or a TODO for others?
I'll create a ticket to do it in a follow-up PR once this is merged, so the file exists in main and the issue can link to it.
/hold I have a minor nit around constants naming but otherwise this looks good to me.
/retest
All tests are passing locally but the build fails in e2e CI? Odd...
Not sure why we did this but it bit us again... https://github.com/kubernetes-sigs/jobset/blob/main/Dockerfile The Dockerfile copies the elements of pkg/ one by one.
Ahhhh not again... this Dockerfile will be the death of me. Fixing. Let's just change it to copy all of pkg/; the only unnecessary part of pkg/ is pkg/testing, and it's just one file.
/hold cancel
@kannon92 I fixed the Dockerfile, but the lgtm was removed because of this. Can you take another look please?
/lgtm
Fixes #466