[Roadmap]: 0.5.1 and 0.6.0 roadmaps and key dates

Hi Dynamo developers! 

We wanted to provide visibility into the near-term roadmap for the Dynamo v0.5.1 and Dynamo v0.6.0 releases. Please refer to the long-term H2 roadmap [here](https://github.com/ai-dynamo/dynamo/issues/2486).

We are continuing to make progress on the five major focus areas:
1. Performance
2. Fault tolerance
3. K8s deployment
4. KV cache management and transfer
5. Scheduling with smart router and planner 


## 📅 Timeline
The target dates for the releases are below: 

| v0.5.1 | v0.6.0 |
| :-------: | :------: | 
| 10/8     | 10/23     | 

Please note that the target date for 0.6.0 has changed from the previously planned 10/22.

## Dynamo v0.5.1 Features

### 1. Performance 
* Develop a reproducible K8s benchmark script for DS-R1 with disaggregated serving and wide EP using the TRT-LLM and SGLang backends.
* Allow an in-cluster performance benchmark to be launched with a single one-line kubectl command.

### 2. Fault Tolerance & Observability
#### Fault Tolerance
* Cancel requests at the earliest possible time to avoid extra computation.

#### Observability 
* Add `num_request_waiting` and finished/success/failure metrics to the frontend.
* Add max request count for TRT-LLM and vLLM.

### 3. K8s Deployment 
* Create standardized K8s single-node and multi-node examples across SGLang, TRT-LLM, and vLLM.
 
### 4. KV Cache Management & Transfer

#### KV Block Manager

Note: G1 = HBM, G2 = Host memory, G3 = Local disk, G4 = Remote storage

* Performant TRT-LLM and vLLM G1 to G2 offloading, tested with the multi qa benchmark comparing TTFT vs. QPS
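
To make the G1 to G4 shorthand concrete, the tiers can be modeled as an ordered enum. This is only an illustrative sketch: `KvTier` and `next_offload_tier` are hypothetical names, not part of the Dynamo KV Block Manager API, and the assumption that blocks are always offloaded to the adjacent colder tier is ours.

```python
from enum import IntEnum
from typing import Optional


class KvTier(IntEnum):
    """Hypothetical labels for the KV cache tiers named in the note above."""

    G1 = 1  # HBM
    G2 = 2  # Host memory
    G3 = 3  # Local disk
    G4 = 4  # Remote storage


def next_offload_tier(tier: KvTier) -> Optional[KvTier]:
    """Return the next colder tier, or None if already at the coldest tier.

    Assumption: offloading only moves one tier at a time (G1 -> G2 -> G3 -> G4).
    """
    if tier is KvTier.G4:
        return None
    return KvTier(tier + 1)
```

Under this model, the v0.5.1 item above covers the `KvTier.G1` to `KvTier.G2` step.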

### 5. Planning & Routing

#### Router
* Allow custom routing logic in Python
* Routing benchmark script
* Router robustness: 
    * Router should not update internal states on probe (Envoy AI Gateway integration) #2798
    * Router should force expire requests that are too stale (replica robustness) #2840
    * Avoid purging JetStream too aggressively (state persistence) #2800
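
As a rough illustration of the "force expire stale requests" item, the sketch below tracks in-flight requests and drops those older than a cutoff. It is not the Dynamo router implementation: `InFlightTracker`, `max_age_s`, and the method names are all hypothetical, and the real fix is tracked in #2840.

```python
import time
from typing import Dict, List, Optional


class InFlightTracker:
    """Hypothetical tracker of in-flight requests for one router replica."""

    def __init__(self, max_age_s: float) -> None:
        self.max_age_s = max_age_s
        self._started: Dict[str, float] = {}  # request id -> start time (seconds)

    def add(self, request_id: str, now: Optional[float] = None) -> None:
        """Record when a request started; `now` can be injected for testing."""
        self._started[request_id] = time.monotonic() if now is None else now

    def expire_stale(self, now: Optional[float] = None) -> List[str]:
        """Remove and return the ids of requests older than max_age_s."""
        now = time.monotonic() if now is None else now
        stale = [rid for rid, t in self._started.items() if now - t > self.max_age_s]
        for rid in stale:
            del self._started[rid]
        return stale

    def active(self) -> List[str]:
        """Return the ids of requests still considered in flight."""
        return list(self._started)
```

Calling `expire_stale()` periodically keeps a replica's view of worker load from drifting when completion events for some requests are never received.
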
#### Planner 
* SLA planner MoE support for SGLang
* SLA planner integration support for TRT-LLM (dense models)
* Virtual connector for the Dynamo planner that writes profiling decisions to an intermediate JSON file
* AIConfigurator and planner integration to search for the best offline configs as starting values for the planner
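
The "intermediate JSON file" idea from the virtual connector item could look roughly like the sketch below, where one process writes a decision and another reads it. The function names and the decision fields are hypothetical placeholders, not the planner's actual schema.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_decision(path: str, decision: Dict[str, Any]) -> None:
    """Write the decision atomically so a reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(decision, f)
        os.replace(tmp_path, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise


def read_decision(path: str) -> Dict[str, Any]:
    """Load the most recently written decision."""
    with open(path) as f:
        return json.load(f)
```

A consumer outside the cluster could poll `read_decision(path)` and act on fields such as a hypothetical `num_decode_workers` count.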

## Dynamo v0.6.0 Features

### 1. Performance 
* Develop a reproducible benchmark script for the Qwen3 family that highlights the benefits of disaggregating the Qwen3 32B model.

### 2. Fault Tolerance & Observability

#### Fault Tolerance
* Engine health checks
* Request cancellation for TRT-LLM and SGLang
* K8s fault injection tests

#### Observability 
* OpenTelemetry support
* Frontend metric for KV cache usage
* Engine metrics for SGLang, TRT-LLM, vLLM
* Nsight integration example
* TRT-LLM logging
* Node HW metrics for CPU

### 3. K8s Deployment 
* ETCD and NATS dependency removal 
* Dynamo pre-deployment checks

### 4. KV Cache Management & Transfer

#### KV Block Manager
* G2 to G3 offloading performance optimization
* Disaggregated serving with KV offloading
* Support for CUDA Graphs 

### 5. Planning & Routing

#### Router
* Router awareness for prefill workers and their loads (SGLang and TRT-LLM)

#### Planner
* SLA planner MoE support for TRT-LLM and vLLM

### 6. Other
* Create container for frontend

If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.
