🚀 The feature, motivation and pitch
We now have basic support for batch-invariant inference, based on https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
But there is still work to be done, so this issue tracks it.
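For context, the nondeterminism the blog post describes comes from floating-point reductions whose split/tiling strategy changes with batch size: float addition is not associative, so different reduction orders give bitwise-different results. A minimal, self-contained PyTorch demo of the effect (an illustration, not vLLM code):

```python
import torch

torch.manual_seed(0)
x = torch.randn(16384, dtype=torch.float32)

# One reduction order: a single flat sum.
flat = x.sum()

# A different order: reduce 128-element tiles first, then combine the
# partial sums -- mimicking a kernel that splits work differently
# depending on batch size.
tiled = x.view(128, 128).sum(dim=1).sum()

# The two orders typically disagree in the low bits; batch-invariant
# kernels avoid this by pinning the reduction order regardless of
# batch size.
print(f"flat={flat.item():.10f} tiled={tiled.item():.10f} "
      f"diff={(flat - tiled).abs().item():.3e}")
```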
TODOs:
- Basic framework: Kernel-override Determinism [1/n] (#25603) @bwasti
- FlashInfer support: [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373) @bwasti
- DeepSeek-V3 Batch Invariant on 8xH100 (#26609) @bwasti
- DeepGEMM on Blackwell: [Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127) @yewentao256
- Batch Invariant for R1 TP 8 on Blackwell (#27229) @yewentao256
- torch.compile & CUDA Graph support: [Feature] Batch invariant torch.compile (#27660) @PaulZhang12
- Usability & documentation: Batch invariance doc (#27839) @bwasti (see the usage sketch after this list)
- Accelerate batch-invariant Triton kernels @bwasti
- An RL example: https://github.com/bwasti/spirl @bwasti
- Add batch invariant tests to CI: [CI] Add batch invariant test to ci (#27842) @yewentao256
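For the usability item above, here is roughly what checking batch invariance looks like from the Python API. This is a hand-written sketch, not the official docs or the CI test: the `VLLM_BATCH_INVARIANT` environment variable and the model choice are assumptions, so check the documentation PR (#27839) for the actual switch.

```python
import os

# Assumption: this env var enables batch-invariant kernels; see #27839.
os.environ["VLLM_BATCH_INVARIANT"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # assumed example model
params = SamplingParams(temperature=0.0, max_tokens=32)  # greedy decoding

probe = "Explain batch invariance in one sentence."
fillers = ["Tell me a joke.", "What is 2 + 2?", "Name three colors."]

# Run the probe prompt alone (batch size 1) ...
solo = llm.generate([probe], params)[0].outputs[0]

# ... and again packed into a larger batch.
batched = llm.generate([probe] + fillers, params)[0].outputs[0]

# With batch invariance, the probe's tokens must be identical no matter
# what else shares the batch.
assert list(solo.token_ids) == list(batched.token_ids), \
    "outputs diverged across batch sizes"
```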
Nice to have:
- NVFP4 support
- Cutlass support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM Support for Generic Model Definitions: [RFC] #28326 @bwasti
Currently, the performance of batch-invariant mode is still not great; let's optimize it together if you have spare cycles!