Hi Dynamo developers!
We want to provide visibility into the near-term roadmap for the Dynamo v0.5.1 and Dynamo v0.6.0 releases. Please refer to the long-term H2 roadmap here.
We continue to make progress on five major focus areas:
- Performance
- Fault tolerance
- K8s deployment
- KV cache management and transfer
- Scheduling with smart router and planner
📅 Timeline
The target dates for the releases are below:
| v0.5.1 | v0.6.0 |
|---|---|
| 10/8 | 10/23 |
Please note that the v0.6.0 target date has changed from the previously planned 10/22.
Dynamo v0.5.1 Features
1. Performance
- Develop a reproducible K8s benchmark script for DS-R1 with disaggregated serving and wide EP, using the TRT-LLM and SGLang backends.
- Allow in-cluster performance benchmarks to be launched with a single one-line kubectl command.
2. Fault Tolerance & Observability
Fault Tolerance
- Cancel requests at the earliest possible point to avoid extra computation.
Observability
- Add num_request_waiting and finished/success/failure request metrics to the frontend.
- Add max request count for TRT-LLM and vLLM.
3. K8s Deployment
- Create standardized K8s single-node and multi-node examples across SGLang, TRT-LLM, and vLLM.
4. KV Cache Management & Transfer
KV Block Manager
Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage
- Performant G1-to-G2 offloading for TRT-LLM and vLLM, tested with the multi qa benchmark comparing TTFT vs. QPS
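For readers new to the G1 to G4 naming, the tier note above can be written down as a small Python sketch. The `KvTier` enum and `next_offload_tier` helper are hypothetical names for illustration only, not part of the KV Block Manager API, and the one-tier-at-a-time offload order is an assumption based on the G1 to G2 and G2 to G3 items in this roadmap.

```python
from enum import IntEnum
from typing import Optional


class KvTier(IntEnum):
    """Storage tiers as named in the note above (hypothetical enum)."""
    G1_HBM = 1
    G2_HOST_MEMORY = 2
    G3_LOCAL_DISK = 3
    G4_REMOTE_STORAGE = 4


def next_offload_tier(tier: KvTier) -> Optional[KvTier]:
    """Return the next colder tier, or None if already at the coldest.

    Assumption: blocks are offloaded one tier at a time (G1 -> G2 -> G3 -> G4).
    """
    if tier is KvTier.G4_REMOTE_STORAGE:
        return None
    return KvTier(tier + 1)
```

In this sketch, the v0.5.1 item corresponds to the `G1_HBM` to `G2_HOST_MEMORY` step.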
5. Planning & Routing
Router
- Allow custom routing logic in Python
- Routing benchmark script
- Router robustness:
  - Router should not update internal states on probe (Envoy AI Gateway integration). PR: "feat: don't modify kv scheduler states on query + more python binding" #2798
  - Router should force expire requests that are too stale (replica robustness). PR: "fix: router slot manager needs force expire requests" #2840
  - Do not purge JetStream too carelessly (state persistence). PR: "fix: do not delete KV events jetstream" #2800
Planner
- SLA planner MoE support for SGLang
- SLA Planner integration support for TRT-LLM (Dense models)
- Virtual connector for the Dynamo planner that writes profiling decisions to an intermediate JSON file
- AIConfigurator and Planner integration to search offline for the best configs to use as starting values for the Planner
Dynamo v0.6.0 Features
1. Performance
- Develop a reproducible benchmark script for the Qwen3 family, highlighting the benefits of disaggregating the Qwen3 32B model.
2. Fault Tolerance & Observability
Fault Tolerance
- Engine health checks
- Request cancellation for TRT-LLM and SGLang
- K8s fault injection tests
Observability
- OpenTelemetry support
- Frontend metric for KV cache usage
- Engine metrics for SGLang, TRT-LLM, vLLM
- Nsight integration example
- TRT-LLM Logging
- Node HW metrics for CPU
3. K8s Deployment
- etcd and NATS dependency removal
- Dynamo pre-deployment checks
4. KV Cache Management & Transfer
KV Block Manager
- G2 to G3 offloading performance optimization
- Disaggregated serving with KV offloading
- Support for CUDA Graphs
5. Planning & Routing
Router
- Router awareness for prefill workers and their loads (SGLang and TRT-LLM)
Planner
- SLA planner MoE support for TRT-LLM and vLLM.
6. Other
- Create container for frontend
If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.