Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Webhook Validation #1993

Closed
6 tasks done
Tracked by #1994
tenzen-y opened this issue Jan 19, 2024 · 7 comments · Fixed by #2058
Closed
6 tasks done
Tracked by #1994

Introduce Webhook Validation #1993

tenzen-y opened this issue Jan 19, 2024 · 7 comments · Fixed by #2058
Assignees

Comments

@tenzen-y
Copy link
Member

tenzen-y commented Jan 19, 2024

As described in #1708 (comment), we can not introduce CEL validations for fields under the .spec.replicaSpec due to the CEL cost budget violations. The root cause is that our replicaSpec is defined as a typed map, and then Kubernetes can not estimate how many maximum replicas will be held in the .spec.replicas.

Also, current validations aren't better for UX since the users can find validation errors only in the training-operator logs.

So, I would suggest introducing the webhook validations with internal certs mechanisms like Katib since validations help to find validation errors for users and the internal cert mechanisms could reduce installation costs.

/kind feature

@tenzen-y
Copy link
Member Author

tenzen-y commented Mar 2, 2024

/assign

@tenzen-y
Copy link
Member Author

I'm wondering if we should not implement the webhook validations for the MXJob because we will remove the MXJob.

@kubeflow/wg-training-leads WDYT?

@tenzen-y
Copy link
Member Author

I'm wondering if we should not implement the webhook validations for the MXJob because we will remove the MXJob.

@kubeflow/wg-training-leads WDYT?

Based on the last community call, we decided not to implement webhook validations for MXJob, but we will implement webhook warnings to notify deprecations.

@andreyvelich
Copy link
Member

Yes, that's correct @tenzen-y!
Let's just add the warning for users via webhook.

@tenzen-y
Copy link
Member Author

@andreyvelich @johnugeorge For MPIJob v1, I'm wondering if we can skip to implement webhooks.
WDYT?

@andreyvelich
Copy link
Member

I am fine to not add them since we will be focus on V2 for the long-term

@tenzen-y
Copy link
Member Author

Thank you for checking this.
So, I will skip the webhook implementations for the MPIJob v1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants