Surface errors when a PodTemplate or a ProvReq is invalid #3025
Comments
In jobset, we did look at https://github.com/kubernetes-sigs/kubectl-validate to help with validation of these fields. Not sure if it would be helpful here, as you could just try creating these objects and, if they fail, bubble up the error as a condition or event.
The error could come from a webhook, for which kubectl-validate wouldn't help.
+1 for the feature. The starting point could be to record any ProvReq creation errors here, or, a level up, to report the errors in the check's message.
/assign
I see the event PR, but I am wondering if this is enough. In particular, events are temporary objects, so it is not clear that an admin would notice them. OTOH, the ProvReq creation is most likely re-attempted, so the event will be generated continuously and should be easy to notice. We could also explore the option of exposing the error in the status; see kueue/apis/kueue/v1beta1/workload_types.go, line 249 in e971646.
Any opinions on that?
Yes, a status message is more important. An event is nice-to-have.
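To make the status-plus-event idea concrete, here is a minimal, stdlib-only sketch in Go. It is not Kueue's actual code: `CheckState`, `EventRecorder`, `memRecorder`, `surfaceCreateError`, the `FailedCreate` reason, and the sample error text are all hypothetical stand-ins chosen for illustration. It assumes the controller has a per-check status entry with a message field and some way to emit warning events.

```go
package main

import (
	"errors"
	"fmt"
)

// CheckState is a hypothetical, simplified stand-in for a per-check
// status entry on a Workload. It is not the real Kueue API type.
type CheckState struct {
	Name    string
	State   string
	Message string
}

// EventRecorder is a hypothetical minimal interface for emitting
// warning events; a real controller would use its own recorder.
type EventRecorder interface {
	Warningf(reason, format string, args ...any)
}

// memRecorder keeps events in memory so the sketch runs without a cluster.
type memRecorder struct{ events []string }

func (r *memRecorder) Warningf(reason, format string, args ...any) {
	r.events = append(r.events, reason+": "+fmt.Sprintf(format, args...))
}

// errDenied is an illustrative sample error, not real webhook output.
var errDenied = errors.New(`admission webhook denied the request: invalid PodTemplate`)

// surfaceCreateError emits a warning event on every failed attempt and
// copies the error into the check's message. It returns true only when
// the message changed, so the caller can skip redundant status updates
// while creation is being retried.
func surfaceCreateError(cs *CheckState, rec EventRecorder, kind string, err error) bool {
	if err == nil {
		return false
	}
	msg := fmt.Sprintf("Error creating %s: %v", kind, err)
	rec.Warningf("FailedCreate", "%s", msg)
	if cs.Message == msg {
		return false
	}
	cs.Message = msg
	return true
}

func main() {
	cs := &CheckState{Name: "prov-check", State: "Pending"}
	rec := &memRecorder{}
	changed := surfaceCreateError(cs, rec, "ProvisioningRequest", errDenied)
	fmt.Println("status changed:", changed)
	fmt.Println("message:", cs.Message)
	fmt.Println("events recorded:", len(rec.events))
}
```

The design choice in this sketch follows the discussion above: the event fires on each retry, so it stays visible even though events expire, while the status message is the durable signal and is only rewritten when its text changes.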
/reopen Reopening because the workload AdmissionChecks status is still not updated on an AdmissionRequest creation error.
@mbobrovskyi: Reopened this issue.
What would you like to be added:
When a ProvReq or PodTemplate is rejected by a webhook, Kueue just logs the error and continues retrying. These errors are not visible to end users, who might simply interpret the situation as "kueue is stuck". We should communicate these errors in the Workload object, and maybe even produce an event.
Why is this needed:
A cloud provider could have a webhook to validate PodTemplates created for ProvisioningRequests.
Errors from such a webhook need to be surfaced to users so they can fix the underlying problem.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.