- LLM Serving Runtimes
- LLM Autoscaling
- LLM RAG/Agent Pipeline Orchestration
  - Support declarative RAG/Agent workflow using KServe Inference Graph [#3829].
- Open Inference Protocol extension to GenAI Task APIs
  - Community-maintained Open Inference Protocol repo for OpenAI schema [https://docs.google.com/document/d/1odTMdIFdm01CbRQ6CpLzUIGVppHSoUvJV_zwcX6GuaU].
  - Support vertical GenAI Task APIs such as embedding, Text-to-Image, Text-To-Code, Doc-To-Text [#3572].
- LLM Gateway
  - Support multiple LLM providers.
  - Support token based rate limiting.
  - Support LLM router with traffic shaping, fallback, load balancing.
  - LLM Gateway observability for metrics and cost reporting.
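As a rough illustration of two of the LLM Gateway items above (token based rate limiting and router fallback), here is a minimal sketch. It is not KServe code and does not reflect any planned design: `TokenRateLimiter`, `route_with_fallback`, and the provider callables are hypothetical names introduced only for this example. The limiter is a plain token bucket whose budget is counted in LLM tokens rather than in requests.

```python
import time
from typing import Callable, Optional, Sequence


class TokenRateLimiter:
    """Token bucket whose unit is LLM tokens, not requests (hypothetical helper)."""

    def __init__(
        self,
        tokens_per_second: float,
        burst: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.tokens_per_second = tokens_per_second  # refill rate, in LLM tokens
        self.burst = burst                          # bucket capacity, in LLM tokens
        self._clock = clock                         # injectable for testing
        self._available = burst
        self._last = clock()

    def allow(self, token_cost: int) -> bool:
        """Charge `token_cost` LLM tokens if the budget allows it."""
        now = self._clock()
        refill = (now - self._last) * self.tokens_per_second
        self._last = now
        self._available = min(self.burst, self._available + refill)
        if token_cost <= self._available:
            self._available -= token_cost
            return True
        return False


def route_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for call_provider in providers:
        try:
            return call_provider(prompt)
        except Exception as exc:  # a real router would catch narrower errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

A gateway built along these lines would typically call `allow()` with the prompt's token count (plus an estimate for the completion) before forwarding the request, and would order `providers` by preference so that later entries act as fallbacks.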
- Promote `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRD to v1
  - Improve `InferenceService` CRD for REST/gRPC protocol interface
  - Improve model storage interface
  - Deprecate `TrainedModel` CRD and add multiple model support for co-hosting, draft model, LoRA adapters to InferenceService.
  - Improve YAML UX for predictor and transformer container collocation.
  - Close the feature gap between `RawDeployment` and `Serverless` mode.
- Improve
- Open Inference Protocol
  - Support batching for v2 inference protocol
  - Transformer and Explainer v2 inference protocol interoperability
  - Improve codec for v2 inference protocol
Reference: Control plane issues, Data plane issues, Serving Runtime issues.
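To make the batching item above more concrete, the sketch below shows how several same-shaped instances can be packed into a single v2 (Open Inference Protocol) request tensor with a leading batch dimension, and how a batched output tensor can be split back per instance. It is a client-side illustration only, not KServe's batcher: the function names and the `input-0` tensor name are assumptions made for this example, and the datatype is fixed to `FP32` for simplicity.

```python
from typing import Dict, List


def batch_v2_request(rows: List[List[float]], input_name: str = "input-0") -> Dict:
    """Pack equally sized rows into one v2 request body of shape [batch, width]."""
    if not rows:
        raise ValueError("at least one instance is required")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all instances must have the same length")
    # v2 tensors may carry their data flattened in row-major order.
    flat = [value for row in rows for value in row]
    return {
        "inputs": [
            {
                "name": input_name,        # assumed tensor name
                "shape": [len(rows), width],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }


def split_v2_output(output: Dict) -> List[List[float]]:
    """Split a 2-D, row-major v2 output tensor back into one list per instance."""
    batch, width = output["shape"]
    data = output["data"]
    return [data[i * width:(i + 1) * width] for i in range(batch)]
```

The resulting dictionary would be serialized as JSON and sent to the model's v2 `infer` endpoint; the response's `outputs` entries can then be passed to `split_v2_output` one at a time.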
- Create standardized model packaging API
- Improve KServe model server observability with metrics and distributed tracing
- Support batch inference
Reference: Python SDK issues, Storage issues
- Improve `InferenceGraph` spec for replica and concurrency control
- Support distributed tracing
- Support gRPC for `InferenceGraph`
- Standalone `Transformer` support for `InferenceGraph`
- Support traffic mirroring node
- Improve `RawDeployment` mode for `InferenceGraph`
Reference: InferenceGraph issues
- Document KServe ServiceMesh setup with mTLS
- Support programmatic authentication token
- Implement per service level auth
- Add support for SPIFFE/SPIRE identity integration with `InferenceService`
Reference: Auth related issues
- Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
- Unify the data plane v1 and v2 page formats
- Improve v2 data plane docs to explain what changed and why
- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for documentation
- Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive