🚀 The feature, motivation and pitch
We now have basic support for batch-invariant inference, based on https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
But there is still work to be done, so this issue tracks it.
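For context, the nondeterminism the blog post describes comes from floating-point reductions whose split/tiling strategy changes with batch size: float addition is not associative, so different reduction orders give bitwise-different results. A minimal, self-contained PyTorch demo of the effect (an illustration, not vLLM code):

```python
import torch

torch.manual_seed(0)
x = torch.randn(16384, dtype=torch.float32)

# One reduction order: a single flat sum.
flat = x.sum()

# A different order: reduce 128-element tiles first, then combine the
# partial sums -- mimicking a kernel that splits work differently
# depending on batch size.
tiled = x.view(128, 128).sum(dim=1).sum()

# The two orders typically disagree in the low bits; batch-invariant
# kernels avoid this by pinning the reduction order regardless of
# batch size.
print(f"flat={flat.item():.10f} tiled={tiled.item():.10f} "
      f"diff={(flat - tiled).abs().item():.3e}")
```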
TODOs:
- Basic framework: Kernel-override Determinism [1/n] (#25603) @bwasti
- FlashInfer support: [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373) @bwasti
- DeepSeek-V3 Batch Invariant on 8xH100 (#26609) @bwasti
- DeepGEMM on Blackwell: [Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127) @yewentao256
- Batch Invariant for R1 TP 8 on Blackwell (#27229) @yewentao256
- torch.compile & CUDA Graph support: [Feature] Batch invariant torch.compile (#27660) @PaulZhang12
- Usability & documentation: Batch invariance doc (#27839) @bwasti (see the usage sketch after this list)
- Accelerate batch-invariant Triton kernels @bwasti
- An RL example: https://github.com/bwasti/spirl @bwasti
- Add batch invariant tests to CI: [CI] Add batch invariant test to ci (#27842) @yewentao256
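For the usability item above, here is roughly what checking batch invariance looks like from the Python API. This is a hand-written sketch, not the official docs or the CI test: the `VLLM_BATCH_INVARIANT` environment variable and the model choice are assumptions, so check the documentation PR (#27839) for the actual switch.

```python
import os

# Assumption: this env var enables batch-invariant kernels; see #27839.
os.environ["VLLM_BATCH_INVARIANT"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # assumed example model
params = SamplingParams(temperature=0.0, max_tokens=32)  # greedy decoding

probe = "Explain batch invariance in one sentence."
fillers = ["Tell me a joke.", "What is 2 + 2?", "Name three colors."]

# Run the probe prompt alone (batch size 1) ...
solo = llm.generate([probe], params)[0].outputs[0]

# ... and again packed into a larger batch.
batched = llm.generate([probe] + fillers, params)[0].outputs[0]

# With batch invariance, the probe's tokens must be identical no matter
# what else shares the batch.
assert list(solo.token_ids) == list(batched.token_ids), \
    "outputs diverged across batch sizes"
```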
Nice to have:
- NVFP4 support
- Cutlass support
- AMD testing/support
- Speculative decoding support (this might be hard)
- vLLM Support for Generic Model Definitions: [RFC] #28326 @bwasti
Currently, the performance of batch-invariant mode is still not great; let's optimize it together if you have spare cycles!