Update KServe 2024-2025 Roadmap (kserve#3810)
* Update ROADMAP.md

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Add llm gateway

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Update ROADMAP.md

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Update ROADMAP.md

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
yuzisun authored Aug 21, 2024
1 parent 87cf2cd commit e82beb6
Showing 1 changed file with 37 additions and 26 deletions.
ROADMAP.md: 63 changes (37 additions & 26 deletions)
```diff
@@ -1,16 +1,38 @@
-# KServe 2023 Roadmap
+# KServe 2024-2025 Roadmap
+## Objective: "Support GenAI inference"
+- LLM Serving Runtimes
+* Support Speculative Decoding with vLLM runtime [https://github.com/kserve/kserve/issues/3800].
+* Support LoRA adapters [https://github.com/kserve/kserve/issues/3750].
+* Support LLM Serving runtimes for TensorRT-LLM, TGI and provide benchmarking comparisons [https://github.com/kserve/kserve/issues/3868].
+* Support multi-host, multi-GPU inference runtime [https://github.com/kserve/kserve/issues/2145].
+
+- LLM Autoscaling
+* Support Model Caching with automatic PV/PVC provisioning [https://github.com/kserve/kserve/issues/3869].
+* Support Autoscaling settings for serving runtimes.
+* Support Autoscaling based on custom metrics [https://github.com/kserve/kserve/issues/3561].
+
+- LLM RAG/Agent Pipeline Orchestration
+* Support declarative RAG/Agent workflow using KServe Inference Graph [https://github.com/kserve/kserve/issues/3829].
+
+- Open Inference Protocol extension to GenAI Task APIs
+* Community-maintained Open Inference Protocol repo for OpenAI schema [https://docs.google.com/document/d/1odTMdIFdm01CbRQ6CpLzUIGVppHSoUvJV_zwcX6GuaU].
+* Support vertical GenAI Task APIs such as embedding, Text-to-Image, Text-To-Code, Doc-To-Text [https://github.com/kserve/kserve/issues/3572].
+
+- LLM Gateway
+* Support multiple LLM providers.
+* Support token based rate limiting.
+* Support LLM router with traffic shaping, fallback, load balancing.
+* LLM Gateway observability for metrics and cost reporting
+
 ## Objective: "Graduate core inference capability to stable/GA"
-- Promote `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRD from v1beta1 to v1
+- Promote `InferenceService` and `ClusterServingRuntime`/`ServingRuntime` CRD to v1
 * Improve `InferenceService` CRD for REST/gRPC protocol interface
-* Unify model storage spec and implementation between KServe and ModelMesh
-* Add Status to `ServingRuntime` for both ModelMesh and KServe, surface `ServingRuntime` validation errors and deployment status
-* Deprecate `TrainedModel` CRD and use `InferenceService` annotation to allow dynamic model updates as alternative option to storage initializer
-* Collocate transformer and predictor in the pod to reduce sidecar resources and networking latency
-* Stablize `RawDeployment` mode with comprehensive testing for supported features
-
-- All model formats to support v2 inference protocol including custom serving runtime
-* TorchServe to support v2 gRPC inference protocol
+* Improve model storage interface
+* Deprecate `TrainedModel` CRD and add multiple model support for co-hosting, draft model, LoRA adapters to InferenceService.
+* Improve YAML UX for predictor and transformer container collocation.
+* Close the feature gap between `RawDeployment` and `Serverless` mode.
 
+- Open Inference Protocol
 * Support batching for v2 inference protocol
 * Transformer and Explainer v2 inference protocol interoperability
 * Improve codec for v2 inference protocol
```
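
The LLM Gateway item "Support token based rate limiting" budgets traffic by model tokens rather than by request count, since one long completion can cost as much as many short ones. The roadmap does not say how KServe will implement it; the following is a minimal Python sketch of one common approach, a token bucket in which each request reserves its prompt tokens plus its maximum completion tokens up front and later refunds what it did not generate. `TokenBucket`, `try_consume` and `refund` are hypothetical names, not part of any KServe API.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter whose budget is counted in LLM tokens."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum burst, in LLM tokens (assumed setting)
        self.refill_rate = refill_rate  # LLM tokens restored per second (assumed setting)
        self._clock = clock             # injectable clock, e.g. for tests
        self.tokens = capacity          # the bucket starts full
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.refill_rate)
        self._last = now

    def try_consume(self, prompt_tokens: int, max_completion_tokens: int) -> bool:
        """Admit a request only if its worst-case token cost fits the current budget."""
        self._refill()
        cost = prompt_tokens + max_completion_tokens
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True

    def refund(self, unused_tokens: int) -> None:
        """Give back completion tokens that were reserved but never generated."""
        self.tokens = min(self.capacity, self.tokens + unused_tokens)
```

In a gateway, one such bucket would typically be kept per API key or tenant. Reserving the worst case up front keeps concurrent streaming requests from overrunning the budget, at the cost of temporarily holding back tokens that are refunded once the completion finishes.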
```diff
@@ -19,30 +41,19 @@ Reference: [Control plane issues](https://github.com/kserve/kserve/issues?q=is%3
 
 ## Objective: "Graduate KServe Python SDK to 1.0“
 
 - Improve KServe Python SDK dependency management with Poetry
-- Create standarized model packaging API
-- Improve KServe model server observability with metrics and distruted tracing
+- Create standardized model packaging API
+- Improve KServe model server observability with metrics and distributed tracing
 - Support batch inference
 
 Reference:[Python SDK issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akserve%2Fsdk), [Storage issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akfserving%2Fstorage)
 
-## Objective: "Graduate ModelMesh to beta"
-- Support TorchServe ServingRuntime
-- Add PVC support and unify storage implementation with KServe
-- Add optional ingress for ModelMesh deployments
-- Etcd secret security for multi-namespace mode
-- Add estimated model size field
-
-Reference: [ModelMesh issues](https://github.com/kserve/modelmesh-serving/issues?page=1&q=is%3Aissue+is%3Aopen)
-
-## Objective: "Graduate InferenceGraph to beta"
+## Objective: "Graduate InferenceGraph"
 - Improve `InferenceGraph` spec for replica and concurrency control
 - Allow setting resource limits per `InferenceGraph`
 - Support distributed tracing
 - Support gRPC for `InferenceGraph`
 - Standalone `Transformer` support for `InferenceGraph`
 - Support traffic mirroring node
-- Support `RawDeployment` mode for `InferenceGraph`
+- Improve `RawDeployment` mode for `InferenceGraph`
 
 Reference: [InferenceGraph issues](https://github.com/kserve/kserve/issues?q=is%3Aissue+is%3Aopen+label%3Akserve%2Finference_graph)
```
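
The `InferenceGraph` item "Support traffic mirroring node" is listed without a specification. Traffic mirroring (shadowing) usually means copying each request to a second model whose answer is recorded but never returned to the client. The following is a minimal Python sketch under that assumption; `MirrorNode`, the `Step` type and the `on_mirror_result` callback are hypothetical and do not describe KServe's InferenceGraph router.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

# Hypothetical stand-in for a graph step: request payload in, response payload out.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


class MirrorNode:
    """Send each request to a primary step and, in the background, to a mirror step.

    The caller only sees the primary response. The mirror's response, or the
    exception it raised, is passed to `on_mirror_result` (e.g. for offline comparison).
    """

    def __init__(self, primary: Step, mirror: Step,
                 on_mirror_result: Callable[[Any], None]) -> None:
        self._primary = primary
        self._mirror = mirror
        self._on_mirror_result = on_mirror_result
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _run_mirror(self, request: Dict[str, Any]) -> None:
        try:
            result: Any = self._mirror(request)
        except Exception as exc:  # a failing mirror must never affect the caller
            result = exc
        self._on_mirror_result(result)

    def __call__(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Shallow copy: top-level key changes made by one step are not seen by the other.
        self._pool.submit(self._run_mirror, dict(request))
        return self._primary(request)

    def close(self) -> None:
        """Wait for in-flight mirror calls, then release the worker threads."""
        self._pool.shutdown(wait=True)
```

Running the mirror on a worker pool keeps its latency off the caller's path. Note that `ThreadPoolExecutor` queues work without a bound, so a real node would likely also cap pending mirror calls so that a slow shadow model cannot accumulate unbounded work.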

```diff
@@ -58,5 +69,5 @@ Reference: [Auth related issues](https://github.com/kserve/kserve/issues?q=is%3A
 - Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
 - Unify the data plane v1 and v2 page formats
 - Improve v2 data plane docs to tell the story why and what changed
-- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for example documentation
+- Clean up the examples in kserve repo and unify them with the website's by creating one source of truth for documentation
 - Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive
```
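
The documentation items on data plane v1 and v2 concern the move from the V1 protocol to the V2 (Open Inference Protocol) data plane. The most visible change is the request body: V1 sends `{"instances": [...]}` to `/v1/models/<name>:predict`, while V2 sends named, typed tensors with an explicit shape in an `inputs` list to `/v2/models/<name>/infer`. The following is a minimal Python sketch of that mapping, assuming a model with a single 2-D FP32 input; `v1_to_v2` and the tensor name `input-0` are hypothetical.

```python
from typing import Any, Dict, List


def v1_to_v2(v1_request: Dict[str, Any], input_name: str = "input-0",
             datatype: str = "FP32") -> Dict[str, Any]:
    """Convert a V1 body of equal-length numeric rows into a V2 request with one tensor.

    Assumes `instances` is a 2-D list; `input_name` is a hypothetical tensor name,
    since V1 payloads carry no tensor names.
    """
    rows: List[List[float]] = v1_request["instances"]
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("instances must be a non-empty list of equal-length rows")
    return {
        "inputs": [{
            "name": input_name,
            "shape": [len(rows), len(rows[0])],        # V2 states the shape explicitly
            "datatype": datatype,                      # V2 states the element type explicitly
            "data": [x for row in rows for x in row],  # row-major flattening
        }]
    }
```

The explicit `shape` and `datatype` fields are what let a V2 server validate and decode the tensor without inferring its structure from JSON nesting.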
