🚀 The feature, motivation and pitch
This is an issue that tracks PRs related to AITER https://github.com/ROCm/aiter .
AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It provides a single, unified place for operator-level requests from different customers: developers focus on the operators, and customers integrate this operator collection into their own private or public frameworks.
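In vLLM these integrations are exposed mainly through ROCm environment flags rather than new Python APIs. Below is a minimal sketch of enabling them, assuming `VLLM_ROCM_USE_AITER` as the umbrella switch; the per-op flag shown is an assumption and may differ between vLLM versions (check vllm/envs.py for the current set):

```python
# Minimal sketch: enable AITER code paths on ROCm before vLLM is imported,
# since the flags are read at import time from vllm/envs.py.
import os

os.environ["VLLM_ROCM_USE_AITER"] = "1"       # umbrella switch (off by default)
os.environ["VLLM_ROCM_USE_AITER_MOE"] = "1"   # assumed per-op flag for fused MoE

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any ROCm-supported model
out = llm.generate(["Hello from an AITER-enabled vLLM"],
                   SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```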
Note: this tracker description is organized from the newest AITER commit to the oldest.
Based on AITER commit (20 Aug 2025): 5ee37dced6f1bde0229b2c77ce079433549aa25f
Based on AITER commit (12 July 2025): 916bf3c
- [V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs vllm-project/vllm#20880
- [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel vllm-project/vllm#21242
- [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. vllm-project/vllm#22521
- [ROCm][Aiter] Add triton fp8 bmm kernel for mla vllm-project/vllm#22759
Based on AITER commit: 636a9f0d56c202040e93b9560c296441b7f77233
- Add weight-preshuffled PTPC FP8 GEMM ([ROCm][FEAT] Integrate AITER gemm w8a8 ptpc vllm-project/vllm#19417)
Based on AITER commit: 648764942e552a8bb5fe16026703716a81f05374
- AITER MHA V1 ([Hardware][AMD] integrate aiter chunked prefill into vllm vllm-project/vllm#18596) ([Hardware][AMD] integrate aiter into vllm vllm-project/vllm#17710)
- Patch for new AITER commit ([ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 vllm-project/vllm#18990)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) vllm-project/vllm#19904
- [ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) vllm-project/vllm#20254
- [V1] [ROCm] Enable EP with AITER Fused MoE vllm-project/vllm#20270
Enhancement
- Bugfix to enable PP with AITER MLA ([Bugfix] Enable PP with AITER+V1 vllm-project/vllm#19822)
- Add padding to the weights so block-scaled fused MoE can be used on Qwen3-235B TP4 ([Bugfix] Add padding for block-scale fused-moe weights for AITER lib vllm-project/vllm#19234); a sketch of the padding idea follows this list
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) vllm-project/vllm#19904
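The padding fix referenced above (vllm-project/vllm#19234) works around weight shards whose dimensions are not multiples of the block-scale tile. A hedged, self-contained sketch of the idea, assuming a 128x128 block size and a hypothetical shard shape (this is not the actual vLLM code path):

```python
# Hedged sketch of block-scale weight padding (illustrative, not vLLM's code).
# Block-scaled (e.g. FP8 w8a8) kernels expect each weight dim to be a multiple
# of the scale block size; otherwise the weight is zero-padded up to it.
import torch

BLOCK = 128  # assumed block-scale tile size

def pad_to_block(w: torch.Tensor, block: int = BLOCK) -> torch.Tensor:
    """Zero-pad the last two dims of `w` up to multiples of `block`."""
    n, k = w.shape[-2], w.shape[-1]
    pad_n = (-n) % block
    pad_k = (-k) % block
    if pad_n == 0 and pad_k == 0:
        return w
    # F.pad pads from the last dim backwards: (k_left, k_right, n_left, n_right)
    return torch.nn.functional.pad(w, (0, pad_k, 0, pad_n))

w = torch.randn(768, 288)      # hypothetical TP shard; 288 is not a multiple of 128
print(pad_to_block(w).shape)   # torch.Size([768, 384])
```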
Based on AITER commit: c1debd87ce0391aa27438d9e07e76e4fea7c4b70
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) vllm-project/vllm#17864)
- It was reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" vllm-project/vllm#17910) because it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 vllm-project/vllm#17912
- Use AITER fused moe external API ([FEAT] [ROCm] Upgrade AITER Fused MoE kernels. vllm-project/vllm#18271)
- [FEAT][ROCm] Upgrade AITER MLA v1 backend vllm-project/vllm#18338
- [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 vllm-project/vllm#18825
- Enable full context length of DeepSeekV3 ([ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. vllm-project/vllm#18938)
Based on AITER commit: 5a77249
The kernels from vllm-project#14007 have been broken down into the following PRs for ease of review (a sketch of the common per-op dispatch pattern follows this list):
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear vllm-project/vllm#14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature vllm-project/vllm#14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER vllm-project/vllm#14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature vllm-project/vllm#14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER vllm-project/vllm#15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel vllm-project/vllm#15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA vllm-project/vllm#15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel vllm-project/vllm#16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support vllm-project/vllm#16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- AITER 2-Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support vllm-project/vllm#17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine vllm-project/vllm#17523)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 vllm-project/vllm#17955)
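The per-kernel PRs above share a common integration pattern: each layer keeps its default implementation and dispatches to the AITER kernel only when the corresponding ROCm flag is enabled. Below is a hedged sketch of that pattern using RMS norm as the example; the function names, flag names, and the `aiter.rms_norm` signature are assumptions, not the real vLLM symbols:

```python
# Hedged sketch of the per-op dispatch pattern behind the integrations above.
# All names are illustrative; the real code lives in vLLM's custom-op and envs
# machinery and differs in detail.
import os
import torch

def use_aiter_rmsnorm() -> bool:
    # Assumed gating: an umbrella flag plus a per-op flag.
    return (os.environ.get("VLLM_ROCM_USE_AITER", "0") == "1"
            and os.environ.get("VLLM_ROCM_USE_AITER_RMSNORM", "1") == "1")

def rms_norm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # Plain PyTorch fallback, always available.
    variance = x.pow(2).mean(-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    if use_aiter_rmsnorm():
        import aiter                           # the AITER op collection
        return aiter.rms_norm(x, weight, eps)  # assumed signature, not verified
    return rms_norm_reference(x, weight, eps)

print(rms_norm(torch.randn(2, 8), torch.ones(8)).shape)  # falls back off-ROCm
```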
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. vllm-project/vllm#16435)
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm vllm-project/vllm#17857
Bugfix
Archived on 2025-05-14
The kernels from vllm-project#14007 have been broken down into the following PRs for ease of review:
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear vllm-project/vllm#14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature vllm-project/vllm#14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER vllm-project/vllm#14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature vllm-project/vllm#14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER vllm-project/vllm#15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel vllm-project/vllm#15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA vllm-project/vllm#15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel vllm-project/vllm#16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support vllm-project/vllm#16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 vllm-project/vllm#16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints vllm-project/vllm#16674)
- AITER 2-Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support vllm-project/vllm#17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine vllm-project/vllm#17523)
- Fix MLA Backend v0 due to AITER API change in newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) vllm-project/vllm#17864)
- It was reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" vllm-project/vllm#17910) because it introduced new properties that caused pre-commit to fail. The follow-up bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 vllm-project/vllm#17912
- AITER MHA V1 ([Hardware][AMD] integrate aiter into vllm vllm-project/vllm#17710)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 vllm-project/vllm#17955)
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. vllm-project/vllm#16435)
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm vllm-project/vllm#17857
Bugfix
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.