Emit Job creation failed event #448

Merged
merged 2 commits into from
Mar 14, 2024

Conversation

danielvegamyhre
Contributor

Fixes #447

I opted to only emit 1 event if any Job creation fails, rather than emit a separate event for each Job failure (since there could be many).

While unconventional, I included the error message in the event message since the intent is to allow the user to quickly see why Jobs aren't being created without digging through logs.
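
A minimal sketch of the single-event approach described above, assuming a simplified recorder. The `eventRecorder` interface, `memoryRecorder`, `reportCreationFailures`, and `sampleErrors` are hypothetical names introduced for illustration (they are not the controller's actual types), and the error strings are made up. It only shows that joining the per-Job errors yields one Warning event no matter how many Jobs failed.

```go
package main

import (
	"errors"
	"fmt"
)

// eventRecorder is a hypothetical, simplified stand-in for the recorder used
// by the controller; it is NOT the client-go interface.
type eventRecorder interface {
	Eventf(eventType, reason, messageFmt string, args ...any)
}

// memoryRecorder stores formatted events in memory so the sketch can run
// without a cluster.
type memoryRecorder struct {
	events []string
}

func (m *memoryRecorder) Eventf(eventType, reason, messageFmt string, args ...any) {
	prefix := fmt.Sprintf("%s %s: ", eventType, reason)
	m.events = append(m.events, prefix+fmt.Sprintf(messageFmt, args...))
}

// reportCreationFailures joins all per-Job errors and emits at most one
// Warning event, regardless of how many Jobs failed. errors.Join skips nil
// entries and returns nil when every entry is nil.
func reportCreationFailures(rec eventRecorder, errs []error) error {
	allErrs := errors.Join(errs...)
	if allErrs != nil {
		rec.Eventf("Warning", "JobCreationFailed", "Job creation(s) failed with error: %s", allErrs)
	}
	return allErrs
}

// sampleErrors returns made-up failures used only for this illustration.
func sampleErrors() []error {
	return []error{
		errors.New("job-a: invalid template"),
		nil, // a successful creation contributes no error
		errors.New("job-b: quota exceeded"),
	}
}

func main() {
	rec := &memoryRecorder{}
	_ = reportCreationFailures(rec, sampleErrors())
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

Because `errors.Join` separates the individual messages with newlines, the single event message still lists every failure, which is the trade-off the description calls out: one event, but a longer message.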

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielvegamyhre

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 13, 2024
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and kannon92 March 13, 2024 17:53
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 13, 2024

netlify bot commented Mar 13, 2024

Deploy Preview for kubernetes-sigs-jobset canceled.

Name Link
🔨 Latest commit 54f3809
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/65f330616f911800084824bd

@@ -475,6 +475,9 @@ func (r *JobSetReconciler) createJobs(ctx context.Context, js *jobset.JobSet, ow
 	}
 	allErrs := errors.Join(finalErrs...)
 	if allErrs != nil {
+		// Emit event to propagate the Job creation failures up to be more visible to the user.
+		// TODO(#422): Investigate ways to validate Job templates at JobSet validation time.
+		r.Record.Eventf(js, corev1.EventTypeWarning, "JobCreationFailed", "Job creation(s) failed with error: %s", allErrs)
Contributor

Could you add a bit more information to this event? If we had multiple Jobs, which Job creation failed? I guess the validation errors would be in the logs or kubectl output somewhere, so users could dig into this.

Contributor Author

I think the allErrs variable should have these details

Contributor

Are we sure that events actually show the errors in their entirety?

Contributor Author

@danielvegamyhre danielvegamyhre Mar 14, 2024

To be safe, I updated the PR to wrap the error returned by r.Create(...) in a custom error that includes the Job name at the beginning of the error message. Let me know what you think.
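
A minimal sketch of the wrapping idea, under stated assumptions: the custom error type used in the PR is not shown in this conversation, so this example swaps in `fmt.Errorf` with `%w` instead. `wrapJobError`, `errAdmissionDenied`, and the Job names are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errAdmissionDenied is a made-up sentinel standing in for whatever error the
// API client might return; it is an assumption, not a real Kubernetes error.
var errAdmissionDenied = errors.New("admission webhook denied the request")

// wrapJobError puts the Job name at the beginning of the message and keeps
// the original cause reachable through errors.Is / errors.As via %w.
func wrapJobError(jobName string, err error) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("job %q creation failed: %w", jobName, err)
}

func main() {
	// Join the wrapped errors the same way the reconciler joins finalErrs.
	joined := errors.Join(
		wrapJobError("workers-0", errAdmissionDenied),
		wrapJobError("workers-1", nil), // succeeded, contributes nothing
		wrapJobError("workers-2", errAdmissionDenied),
	)
	fmt.Println(joined)
	// The underlying cause is still discoverable after wrapping and joining.
	fmt.Println(errors.Is(joined, errAdmissionDenied))
}
```

With each message prefixed by its Job name, the joined error passed to the event identifies which Jobs failed even when several share the same underlying cause.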

Contributor

@kannon92 kannon92 left a comment

/lgtm
/hold

maybe @ahg-g wants to weigh in.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2024
@ahg-g
Contributor

ahg-g commented Mar 14, 2024

/lgtm

@danielvegamyhre
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 14, 2024
@k8s-ci-robot k8s-ci-robot merged commit e17f679 into kubernetes-sigs:main Mar 14, 2024
13 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emit event when Job creation fails
4 participants