[Roadmap]: 0.5.1 and 0.6.0 roadmaps and key dates #3323

@harryskim

Hi Dynamo developers!

We want to provide visibility into the near-term roadmap for the Dynamo v0.5.1 and v0.6.0 releases. Please refer to the long-term H2 roadmap here.

We are continuing to make progress on five major focus areas:

  1. Performance
  2. Fault tolerance
  3. K8s deployment
  4. KV cache management and transfer
  5. Scheduling with smart router and planner

📅 Timeline

The target dates for the releases are below:

Release    Target date
v0.5.1     10/8
v0.6.0     10/23

Please note that the target date for 0.6.0 has changed from the previously planned date of 10/22.

Dynamo v0.5.1 Features

1. Performance

  • Develop a reproducible K8s benchmark script for DS-R1 with disaggregated serving and wide EP using the TRT-LLM and SGLang backends.
  • Allow an in-cluster performance benchmark to be launched with a single one-line kubectl command.

2. Fault Tolerance & Observability

Fault Tolerance

  • Cancel requests at the earliest possible time to avoid extra computation.
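
To illustrate the idea behind early cancellation, here is a minimal sketch using plain Python asyncio. It is not Dynamo's implementation: `generate_tokens` is a hypothetical stand-in for an engine's decode loop, and the timings are arbitrary. The point is only that cancelling as soon as the client goes away stops the loop long before it would have finished on its own.

```python
import asyncio


async def generate_tokens(n_tokens, produced):
    """Hypothetical stand-in for an engine decode loop (not a Dynamo API)."""
    for i in range(n_tokens):
        # Simulated per-token compute; each await is also a cancellation point.
        await asyncio.sleep(0.01)
        produced.append(i)


async def main():
    produced = []
    task = asyncio.create_task(generate_tokens(1000, produced))
    await asyncio.sleep(0.05)  # pretend the client disconnects here
    task.cancel()              # propagate the cancellation right away
    try:
        await task
    except asyncio.CancelledError:
        pass                   # expected: the loop stopped early
    return produced, task.cancelled()


produced, was_cancelled = asyncio.run(main())
print(f"tokens produced before cancellation: {len(produced)} of 1000")
```

In a real serving stack the cancellation would also need to reach the backend engine so that it frees the request's resources.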

Observability

  • Add num_request_waiting, finished/succ/failure metrics to frontend
  • Add max request count for TRT-LLM and vLLM.

3. K8s Deployment

  • Create standardized K8s single-node and multi-node examples across SGLang, TRT-LLM, and vLLM.

4. KV Cache Management & Transfer

KV Block Manager

Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage

  • Performant TRT-LLM and vLLM G1 to G2 offloading, tested with a multi qa benchmark comparing TTFT vs. QPS
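
For readers new to the G1 to G4 naming, the tiers in the note above can be pictured as an ordered enum. This is an illustrative sketch only: `Tier`, `offload_one_level`, and the `placement` dict are hypothetical names and are not part of the KV Block Manager API.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Memory tiers as named in the note above (illustrative only)."""
    G1 = 1  # HBM
    G2 = 2  # host memory
    G3 = 3  # local disk
    G4 = 4  # remote storage


def offload_one_level(tier):
    """Return the next, larger and slower tier; G4 is the last one."""
    return tier if tier == Tier.G4 else Tier(tier + 1)


# Hypothetical bookkeeping: which tier currently holds each KV block.
placement = {"block-0": Tier.G1}
placement["block-0"] = offload_one_level(placement["block-0"])  # G1 to G2
print(placement["block-0"].name)
```

Under this picture, the v0.5.1 item covers the G1 to G2 step and the v0.6.0 item covers the G2 to G3 step.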

5. Planning & Routing

Router

Planner

  • SLA planner MoE support for SGLang
  • SLA planner integration support for TRT-LLM (dense models)
  • Virtual connector for the Dynamo planner that writes profiling decisions to an intermediate JSON file
  • AIConfigurator and Planner integration to search for the best offline configs to use as starting values for the Planner

Dynamo v0.6.0 Features

1. Performance

  • Develop a reproducible benchmark script for the Qwen3 family that highlights the benefits of disaggregating the Qwen3 32B model.

2. Fault Tolerance & Observability

Fault Tolerance

  • Engine health checks
  • Request cancellation for TRT-LLM and SGLang
  • K8s fault injection tests

Observability

  • OpenTelemetry support
  • Frontend metric for KV cache usage
  • Engine metrics for SGLang, TRT-LLM, vLLM
  • NSight integration example
  • TRT-LLM Logging
  • Node HW metrics for CPU

3. K8s Deployment

  • ETCD and NATS dependency removal
  • Dynamo pre-deployment checks

4. KV Cache Management & Transfer

KV Block Manager

  • G2 to G3 offloading performance optimization
  • Disaggregated serving with KV offloading
  • Support for CUDA Graphs

5. Planning & Routing

Router

  • Router awareness for prefill workers and their loads (SGLang and TRT-LLM)

Planner

  • SLA planner MoE support for TRT-LLM and vLLM.

6. Other

  • Create container for frontend

If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.
