🚀 The feature, motivation and pitch
This is an issue that tracks PRs related to AITER https://github.com/ROCm/aiter .
AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It provides a single, unified place for operator-level requests from different customers: developers focus on the operators, and customers integrate this operator collection into their own private or public frameworks.
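In vLLM these integrations are exposed mainly through ROCm environment flags rather than new Python APIs. Below is a minimal sketch of enabling them, assuming `VLLM_ROCM_USE_AITER` as the umbrella switch; the per-op flag shown is an assumption and may differ between vLLM versions (check vllm/envs.py for the current set):

```python
# Minimal sketch: enable AITER code paths on ROCm before vLLM is imported,
# since the flags are read at import time from vllm/envs.py.
import os

os.environ["VLLM_ROCM_USE_AITER"] = "1"       # umbrella switch (off by default)
os.environ["VLLM_ROCM_USE_AITER_MOE"] = "1"   # assumed per-op flag for fused MoE

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any ROCm-supported model
out = llm.generate(["Hello from an AITER-enabled vLLM"],
                   SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```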
Note: this tracker description is organized from the newest AITER commit to the oldest.
Based on AITER commit (20 Aug 2025): 5ee37dced6f1bde0229b2c77ce079433549aa25f
Based on AITER commit (12 July 2025): 916bf3c
- [V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs vllm-project/vllm#20880
- [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel vllm-project/vllm#21242
- [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. vllm-project/vllm#22521
- [ROCm][Aiter] Add triton fp8 bmm kernel for mla vllm-project/vllm#22759
Based on AITER commit: 636a9f0d56c202040e93b9560c296441b7f77233
- Add weight-preshuffled PTPC FP8 GEMM ([ROCm][FEAT] Integrate AITER gemm w8a8 ptpc vllm-project/vllm#19417)
Based on AITER commit: 648764942e552a8bb5fe16026703716a81f05374
- AITER MHA V1 ([Hardware][AMD] integrate aiter chunked prefill into vllm vllm-project/vllm#18596) ([Hardware][AMD] integrate aiter into vllm vllm-project/vllm#17710)
- Patch for new AITER commit ([ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 vllm-project/vllm#18990)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) vllm-project/vllm#19904
- [ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) vllm-project/vllm#20254
- [V1] [ROCm] Enable EP with AITER Fused MoE vllm-project/vllm#20270
Enhancement
- Bugfix to enable PP with AITER MLA ([Bugfix] Enable PP with AITER+V1 vllm-project/vllm#19822)
- Add padding to the weights so block-scaled fused MoE can be used on Qwen3-235B TP4 ([Bugfix] Add padding for block-scale fused-moe weights for AITER lib vllm-project/vllm#19234); a sketch of the padding idea follows this list
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) vllm-project/vllm#19904
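The padding fix referenced above (vllm-project/vllm#19234) works around weight shards whose dimensions are not multiples of the block-scale tile. A hedged, self-contained sketch of the idea, assuming a 128x128 block size and a hypothetical shard shape (this is not the actual vLLM code path):

```python
# Hedged sketch of block-scale weight padding (illustrative, not vLLM's code).
# Block-scaled (e.g. FP8 w8a8) kernels expect each weight dim to be a multiple
# of the scale block size; otherwise the weight is zero-padded up to it.
import torch

BLOCK = 128  # assumed block-scale tile size

def pad_to_block(w: torch.Tensor, block: int = BLOCK) -> torch.Tensor:
    """Zero-pad the last two dims of `w` up to multiples of `block`."""
    n, k = w.shape[-2], w.shape[-1]
    pad_n = (-n) % block
    pad_k = (-k) % block
    if pad_n == 0 and pad_k == 0:
        return w
    # F.pad pads from the last dim backwards: (k_left, k_right, n_left, n_right)
    return torch.nn.functional.pad(w, (0, pad_k, 0, pad_n))

w = torch.randn(768, 288)      # hypothetical TP shard; 288 is not a multiple of 128
print(pad_to_block(w).shape)   # torch.Size([768, 384])
```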
Based on AITER commit: c1debd87ce0391aa27438d9e07e76e4fea7c4b70
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) vllm-project/vllm#17864)
- It was reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" vllm-project/vllm#17910) because it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 vllm-project/vllm#17912
- Use AITER fused moe external API ([FEAT] [ROCm] Upgrade AITER Fused MoE kernels. vllm-project/vllm#18271)
- [FEAT][ROCm] Upgrade AITER MLA v1 backend vllm-project/vllm#18338
- [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 vllm-project/vllm#18825
- Enable full context length of DeepSeekV3 ([ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. vllm-project/vllm#18938)
Based on AITER commit: 5a77249
The kernels from vllm-project#14007 have been broken down into the following PRs for ease of review (a sketch of the common per-op dispatch pattern follows this list):
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear vllm-project/vllm#14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature vllm-project/vllm#14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER vllm-project/vllm#14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature vllm-project/vllm#14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER vllm-project/vllm#15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel vllm-project/vllm#15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA vllm-project/vllm#15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel vllm-project/vllm#16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support vllm-project/vllm#16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- AITER 2-Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support vllm-project/vllm#17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine vllm-project/vllm#17523)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 vllm-project/vllm#17955)
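The per-kernel PRs above share a common integration pattern: each layer keeps its default implementation and dispatches to the AITER kernel only when the corresponding ROCm flag is enabled. Below is a hedged sketch of that pattern using RMS norm as the example; the function names, flag names, and the `aiter.rms_norm` signature are assumptions, not the real vLLM symbols:

```python
# Hedged sketch of the per-op dispatch pattern behind the integrations above.
# All names are illustrative; the real code lives in vLLM's custom-op and envs
# machinery and differs in detail.
import os
import torch

def use_aiter_rmsnorm() -> bool:
    # Assumed gating: an umbrella flag plus a per-op flag.
    return (os.environ.get("VLLM_ROCM_USE_AITER", "0") == "1"
            and os.environ.get("VLLM_ROCM_USE_AITER_RMSNORM", "1") == "1")

def rms_norm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Plain PyTorch fallback, always available.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    if use_aiter_rmsnorm():
        import aiter                           # the AITER op collection
        return aiter.rms_norm(x, weight, eps)  # assumed signature, not verified
    return rms_norm_reference(x, weight, eps)

print(rms_norm(torch.randn(2, 8), torch.ones(8)).shape)  # falls back off-ROCm
```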
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. vllm-project/vllm#16435)
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm vllm-project/vllm#17857
Bugfix
Archived on 2025-05-14
The kernels from vllm-project#14007 have been broken down into the following PRs for ease of review:
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear vllm-project/vllm#14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature vllm-project/vllm#14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER vllm-project/vllm#14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature vllm-project/vllm#14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER vllm-project/vllm#15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel vllm-project/vllm#15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA vllm-project/vllm#15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel vllm-project/vllm#16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support vllm-project/vllm#16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- AITER 2-Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support vllm-project/vllm#17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine vllm-project/vllm#17523)
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) vllm-project/vllm#17864)
- It was reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" vllm-project/vllm#17910) because it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 vllm-project/vllm#17912
- AITER MHA V1 ([Hardware][AMD] integrate aiter into vllm vllm-project/vllm#17710)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 vllm-project/vllm#17955)
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. vllm-project/vllm#16435)
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm vllm-project/vllm#17857
Bugfix
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.