Support Kserve #1603
Comments
I think I talked about this with @astefanutti
How do you envision that working? Can you list a couple of CUJs?
Online inference services are latency sensitive and need to scale, so reclaiming/preempting the KServe-managed services doesn't look right. I guess KServe is not that good at offline inference, which in my mind is where this might be helpful. cc @terrytangyuan
I would like to see support for this, as I am looking for a unified way of managing resources for both model training and serving, and Kueue looks like it has this capability. In our case, both training and serving run in the same cluster. And how can it integrate with the recent
I imagine taking a similar approach to RayCluster, so I would like to add a Suspend field to the InferenceService resource.
@kerthcet I believe that the lending limit would allow us to guarantee capacity for latency-sensitive workloads.
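For context, a minimal sketch (editorial, not from the thread) of how a lending limit could keep capacity reserved for a latency-sensitive serving queue while still lending unused resources to a training queue in the same cohort. The names (`serving-cq`, `shared-pool`, `default-flavor`) and quota numbers are placeholders, and `lendingLimit` assumes a Kueue version where the LendingLimit feature is available.

```yaml
# Illustrative only: 6 of 8 GPUs stay reserved for the serving ClusterQueue;
# at most 2 can be lent to other queues in the same cohort.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: serving-cq            # hypothetical name
spec:
  cohort: shared-pool         # shared with a training ClusterQueue
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
        lendingLimit: 2       # never lend more than 2 GPUs out of this queue
```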
Yes, that's right. Actually, I also deploy Jobs and inference servers into a single cluster.
Let me try to design this integration. /assign
Thanks! Great to see this. Looking forward to your proposal. @tenzen-y
I will create a dedicated issue later on the KServe side as well.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
I do not have enough bandwidth. /unassign
FYI: we have similar requirements for serving projects, but mostly for resource fungibility rather than quota management (although related as a whole).
Yeah, I fully agree with you. Actually, we wanted to provide resource fungibility in mixed-workload clusters. But I found that we can fulfill this motivation by using the Kueue Pod integration with a little bit of Knative tuning. I will try to add documentation on how to run a KServe ISVC with Kueue.
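As a rough illustration of that approach (an editorial sketch, not the promised documentation), the snippet below enables Kueue's Pod integration for labeled namespaces and labels an InferenceService so its pods are queued. The queue name, namespace label, and model URI are placeholders; it assumes the `kueue.x-k8s.io/queue-name` label propagates from the InferenceService to the serving pods, which may instead need to be set on the component pod template depending on the KServe version. Pinning `minReplicas`/`maxReplicas` is the "Knative tuning" part: it avoids scale-to-zero so the admitted quota matches the running pods.

```yaml
# Kueue controller configuration (excerpt): manage plain Pods in opted-in namespaces.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "pod"
  podOptions:
    namespaceSelector:
      matchLabels:
        kueue-managed: "true"   # hypothetical opt-in label for serving namespaces
---
# InferenceService whose pods should be admitted through a LocalQueue.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  labels:
    kueue.x-k8s.io/queue-name: serving-queue   # hypothetical LocalQueue name
spec:
  predictor:
    minReplicas: 1        # avoid scale-to-zero so capacity usage is stable
    maxReplicas: 1
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```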
@tenzen-y It would be great if you could share some docs and examples on that!
Sure, I will share it in the Kueue documentation.
We are looking to add a section in the docs to show how to use the Pod integration with external frameworks; there is already an example of using the Pod integration with Tekton Pipelines. @varshaprasad96 has done some exploring of how to use Kubeflow Notebooks with the StatefulSet integration.
I'm doing a few tests with the latest version of Kueue for the notebook integration. Will open an upstream PR to add docs soon.
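For orientation, a short sketch of how both integrations might be enabled side by side in the Kueue configuration; this assumes a Kueue release that ships the StatefulSet integration, and the exact framework names should be checked against the version in use.

```yaml
# Sketch of the integrations section of the Kueue configuration.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
integrations:
  frameworks:
  - "batch/job"
  - "pod"          # for frameworks that only expose plain Pods (e.g. Tekton, KServe)
  - "statefulset"  # for notebook-style workloads backed by StatefulSets
```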
What would you like to be added:
I would like to support the serverless ML inference tool, KServe.
Why is this needed:
In a hybrid-workload cluster (running training jobs, inference servers, and so on), users often want to manage all cluster capacity with Kueue's flavor quotas. So, as a first step toward supporting inference servers, supporting KServe in Kueue would be nice to have.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
We will probably implement `suspend` semantics on the KServe side. Additionally, we need to move #77 forward together to support the inference server's autoscaling semantics.
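To make the intent concrete, here is a purely hypothetical sketch of what `suspend` semantics on an InferenceService could look like. The `suspend` field does not exist in the current KServe API; it only illustrates the proposal in this issue, mirroring the Job/RayCluster pattern where the object is created suspended and Kueue unsuspends it once quota is admitted.

```yaml
# Hypothetical: spec.suspend is NOT part of today's KServe API; it only
# illustrates the proposed behavior (create suspended, Kueue unsuspends on admission).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  labels:
    kueue.x-k8s.io/queue-name: serving-queue   # hypothetical LocalQueue
spec:
  suspend: true          # proposed field: no predictor pods until Kueue sets this to false
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```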