Merged
122 commits
dd9f15d
fix some compile issues with -Werror (#1657)
yuguo68 Dec 17, 2025
26d6f1c
Make setup.py PyPI-compatible by separating git dependencies (#1653)
xiaohuguo2023 Dec 17, 2025
690e590
remove_iris_from_setup (#1644)
amd-ruitang3 Dec 17, 2025
5581246
fix sink error for asm fmha (#1652)
LJ-underdog Dec 17, 2025
27d34df
add guard in case pynccl init failed (#1671)
valarLip Dec 17, 2025
042c4d5
One shot pa (#1670)
fsx950223 Dec 18, 2025
268c664
fix(pa_ps): fix pa_ps_asm .co for gfx950 (#1669)
dbyoung18 Dec 18, 2025
6c483be
modify test_bf16gemm_test (#1678)
amd-ruitang3 Dec 18, 2025
8cb913d
Fix Ruff command in pre-checks (#1675)
Boss2002n Dec 18, 2025
14005b3
fix mha bwd golden perf issue (#1666)
JaxChen29 Dec 18, 2025
6cf7955
topk uplift v1 (#1662)
steamedMantou Dec 19, 2025
f82ad46
fix missing return in mha_bwd (#1688)
yuguo68 Dec 19, 2025
e661bbb
Remove the input parameter "out" in gemm_a4w4 (#1679)
junhaha666 Dec 19, 2025
a6ee2e6
fwd v3 hd192 optimize inst alignment for causal mode (#1663)
shay-li77 Dec 19, 2025
87e1855
fix swa case mismatch (#1694)
JaxChen29 Dec 19, 2025
84bb348
fixing the fp4 gemm tune script Exception caused by tile_m name incon…
hongxiayang Dec 19, 2025
04f800b
CI: Migrate Triton tests to aiter-1gpu-runner (#1690)
gyohuangxin Dec 19, 2025
9fecc00
add ntile 128 for a8 blkQ moe 1 stage (#1695)
zufayu Dec 19, 2025
f48673d
Optimize RoPE in the cases that hdim is small. (#1698)
ruanjm Dec 19, 2025
84840b0
rm garbage from whl (#1696)
amd-ruitang3 Dec 19, 2025
9c1107e
enhance prebuild logic (#1672)
zufayu Dec 19, 2025
0a303e8
LLfp4 qr cap for atom (#1673)
amirumoAMD Dec 19, 2025
9eaed8c
[MLA] MLA conditions rewrite (#1665)
Zzz9990 Dec 20, 2025
4beb056
fix dp causal (#1677)
Zzz9990 Dec 20, 2025
a7d40b2
add two fp4 tune shapes and tuned config (#1687)
hongxiayang Dec 20, 2025
ecf9e1b
Dev/a8w4 and a8w8splitk (#1667)
yadaish Dec 20, 2025
1dcbd71
bf16_gemm_clean_in_kl (#1700)
amd-ruitang3 Dec 20, 2025
43cc8ce
fix tuner (#1701)
valarLip Dec 21, 2025
0d829a5
add gen_fake for 4 gemm operators (#1456)
mqhc2020 Dec 21, 2025
8af3f24
fix llvm issue (#1703)
valarLip Dec 21, 2025
5d055cd
feat: Adaptive topk algorithm selection based on input characteristic…
ClementLinCF Dec 22, 2025
00ed049
fix mha bwd build error (#1705)
JaxChen29 Dec 22, 2025
e277235
fix moe bug when pipever=v1 and nblk=64 (#1707)
lalala-sh Dec 22, 2025
6843777
fix (#1710)
valarLip Dec 22, 2025
1c53fae
[PA] Optimize PA Decode Gluon Performance for BF16/FP16 with KV_BLOCK…
yanguahe Dec 23, 2025
194e374
Fix argument parsing logic when AITER_JIT_DIR is set (#1715)
omoisis-dn Dec 24, 2025
501ea43
fix topk deocde bug in logit value is same (#1716)
steamedMantou Dec 24, 2025
eabf3d0
add fp32 input (#1706)
zufayu Dec 24, 2025
5ddc191
add sampling aot (#1711)
fsx950223 Dec 24, 2025
79506b1
A16w16 tuner fix (#1708)
amd-ruitang3 Dec 24, 2025
e8dda37
Support mqa_logits blocksize is multiple of ChunkK cases (#1674)
sjfeng1999 Dec 24, 2025
12f6582
update (#1723)
Zzz9990 Dec 24, 2025
1e44320
fix ck_moe_stage1 input (#1726)
Zzz9990 Dec 24, 2025
fd0aae8
Update CK to fix compilation error on ROCm 6.4 (#1725)
xudonlyu Dec 24, 2025
eef9dba
Moe fix (#1727)
Zzz9990 Dec 24, 2025
3b89b1a
update prefill fmoe tunner config (#1720)
lalala-sh Dec 24, 2025
1a2173d
use asm_scale in pa test (#1724)
ZhangLirong-amd Dec 25, 2025
43aa5f8
add a8w8 ptpc tuning for tencent deepspeed v3 tp2/tp4 (#1722)
anghostcici Dec 25, 2025
d7e022f
CI: Always upload tuned CSVs in auto tunning pipeline (#1732)
gyohuangxin Dec 25, 2025
ca3f7bd
add gpt-oss gemm config default use triton (#1728)
junhaha666 Dec 25, 2025
f1d4050
fix: use dynamic FP8 type signature to fix MI350 compatibility (#1717)
yanguahe Dec 26, 2025
4a9d8c1
ptpc decode moe opt (#1713)
lalala-sh Dec 26, 2025
d663042
add qk_norm_rope_cache_quant fusion (#1650)
ganyi1996ppo Dec 26, 2025
fde3bb4
add a8w8 blockscale tuning for deepseek v3 tp2/tp4 (#1735)
anghostcici Dec 28, 2025
1ba8365
add a8w8 v3 (#1733)
liyjiang Dec 29, 2025
a839793
[MLA] nhead64 and nhead32 (#1730)
Zzz9990 Dec 29, 2025
0fa0c6d
rm fmoe tuned config where token >= 16384 (#1748)
lalala-sh Dec 29, 2025
36d9786
[Triton] Shaoclee/triton fp4 gemm cat preshuffle (#1656)
k50112113 Dec 29, 2025
fe06368
add groupnorm cuda kernel (#1742)
LiuYinfeng01 Dec 30, 2025
6bdd064
Add FP8 support for batch_prefill with per-tensor quantization (#1649)
poyenc Dec 30, 2025
8f66d84
moe tune config fix (#1752)
lalala-sh Dec 30, 2025
f3e1074
[refractor] mha fwd api refractor (#1719)
minmengdie Dec 30, 2025
eced27a
add dealing with memory access fault in Mp tuner (#1680)
yzhou103 Dec 30, 2025
40307d8
fix torch compile inductor mode bug, add fake op (#1756)
LiuYinfeng01 Dec 30, 2025
f314467
[Triton] add gemm tune check utility function (#1331)
k50112113 Dec 31, 2025
8e782d8
[fea]: reduce_scatter (#1457)
TennyWang1223 Dec 31, 2025
ce6e608
act_mul_quant fp8 fusion and fixed a bug in rmsnorm quantization fp8 …
scxiao Dec 31, 2025
f26aebc
fix kernel repeat loading error (#1759)
minmengdie Dec 31, 2025
2a752d6
[Triton] add triton fused_gemm_a8w8_blockscale_mul_add (#1586)
k50112113 Dec 31, 2025
ebb5723
CI: Optimize and collect op_tests summaries (#1731)
gyohuangxin Jan 1, 2026
2d38b4e
[Triton] fix get_config (#1761)
k50112113 Jan 2, 2026
9b035d6
[Triton] skip_reduce for gemm_afp4wfp4_preshuffle (#1745)
k50112113 Jan 2, 2026
c4dd3a3
topk prefill uplift v1.0 (#1755)
steamedMantou Jan 4, 2026
1ccbc79
moe tunner fix (#1743)
lalala-sh Jan 5, 2026
c45310c
[Fix] Add mutates_args to flash_attn_backward to fix AOTAutograd DDP …
tomjen12 Jan 5, 2026
00d96df
CI: Always output the tests summary (#1768)
gyohuangxin Jan 5, 2026
1924b19
fix dynamic_per_group_scaled_quant_perf_drop dure to amd_buffer_coher…
yzhou103 Jan 5, 2026
3e6b0ef
fix tune error when N is large (#1767)
yzhou103 Jan 6, 2026
6853745
update mla (#1628)
amd-ruitang3 Jan 6, 2026
c779b9d
use a16w4 to replace a4w4 as default policy (#1737)
Zzz9990 Jan 6, 2026
e006dee
mdf_bf16_semaphore_check (#1773)
amd-ruitang3 Jan 6, 2026
c983ad0
New tuning configs (#1770)
omuhamma Jan 6, 2026
e48ee0b
Ps pa upstream (#1772)
fsx950223 Jan 7, 2026
5300681
[Triton] Triton FP8 bloscale GEMM preshuffle (#1764)
k50112113 Jan 7, 2026
41ee059
minor refine (#1780)
valarLip Jan 7, 2026
a44540f
support run hsaco directly (#1449)
fsx950223 Jan 7, 2026
98ce6df
mdf_tune_bf16gemm (#1781)
amd-ruitang3 Jan 7, 2026
51ea57c
[TRITON] Move triton files into respective folders (#1638)
Boss2002n Jan 7, 2026
975e9ed
[TRITON] gluon gemm_a8w8 and gemm_a8w8_preshuffled (#1684)
ahmed-bsod Jan 7, 2026
d6fd159
Update triton gemm tuning config file for deepseekR1-mxfp4 bs64 (#1784)
yichiche Jan 8, 2026
fd1038e
a8w4 moe fix (#1788)
lalala-sh Jan 8, 2026
683e57f
fix kv scale load issue (#1799)
fsx950223 Jan 8, 2026
4463e2e
test minor fix for gfx950 (#1801)
valarLip Jan 9, 2026
7ae89b3
change aiter log level (#1795)
yzhou103 Jan 9, 2026
3880282
maybe fix build (#1786)
tenpercent Jan 9, 2026
b17a9b3
add fake for MLA RoPE operator (#1714)
mqhc2020 Jan 9, 2026
916d667
[Triton] Triton A16WFP4 GEMM prequant (#1777)
k50112113 Jan 9, 2026
cbbd92d
[Triton] Add Fused GEMM A8W8 + Split + Concat Triton Kernel (#1553)
farlukas Jan 9, 2026
8b40048
[Triton] Triton a16w8 gemm preshuffle (#1778)
k50112113 Jan 9, 2026
06d1b6f
[Triton] fix get_config return
k50112113 Jan 9, 2026
3e40d41
fix log rccl version (#1806)
valarLip Jan 11, 2026
e6d8cc6
Move moe tune to csrc (#1790)
yzhou103 Jan 12, 2026
9f1fcce
add a8w8 fp8 ck gemm tune support (#1782)
solinzby1 Jan 12, 2026
f1dea59
Fix fused qk concat cache mla (#1783)
yzhou103 Jan 12, 2026
4698bbe
Optimize the performance of quick_allreduce (#1816)
yanboshao Jan 12, 2026
fecdebb
fix moe fp4 dual prebuild issues (#1775)
zufayu Jan 12, 2026
efe6bd5
Support profiler pa and reduce kernel and remove p99 to use test_comm…
ZhangLirong-amd Jan 12, 2026
6fec70b
[TRITON] Add MoE GEMM a4w4 kernel (#1358)
nsusanto Jan 12, 2026
115cc58
[TRITON] Add a8w8 blockscale MoE (#1483)
nsusanto Jan 12, 2026
084a5a5
update moe tile config (#1810)
lalala-sh Jan 13, 2026
867cd7b
[Fix] fix the mha fwd_v3 segment fault in torch.compile(mode="reduce-…
minmengdie Jan 13, 2026
8177662
update AITER_ASM_DIR (#1812)
amd-ruitang3 Jan 13, 2026
bdcfb87
Add device guard for hipbsolgemm kernel launch (#1824)
sammysun0711 Jan 13, 2026
ce95543
support atom pa ps reduce shape (#1820)
ZhangLirong-amd Jan 13, 2026
18b2652
CI: Add fmoe in auto tuning pipeline (#1827)
gyohuangxin Jan 13, 2026
1f8dcc9
[FMHA] Support Vectorized KV Cache Layout and vLLM/SGLang block table…
Jeff-Huang Jan 13, 2026
0cd5de9
[TRITON] Add script to select Triton tests based on diff content (#1682)
brunomazzottiamd Jan 13, 2026
25a5ea0
CI: Fix Triton tests on main branch (#1828)
gyohuangxin Jan 13, 2026
053ef0d
Fix INT4 QR TP8 boundary condition (#1834)
azaidy Jan 14, 2026
53aaad3
sync 3rdparty/composable_kernel to main branch c9f112b
zhuyuhua-v Jan 14, 2026
fd3a9dc
minor fix for moe_ck2stages
zhuyuhua-v Jan 14, 2026
bd3c163
fix: export AiterDistEnv and fix undefined args.dtype
zhyajie Jan 15, 2026
4 changes: 4 additions & 0 deletions .github/scripts/build_aiter_triton.sh
@@ -12,6 +12,7 @@ pip install --upgrade pandas zmq einops numpy==1.26.2
 pip uninstall -y aiter || true
 pip install --upgrade "pybind11>=3.0.1"
 pip install --upgrade "ninja>=1.11.1"
+pip install tabulate
 python3 setup.py develop

 # Read BUILD_TRITON env var, default to 1. If 1, install Triton; if 0, skip installation.
@@ -25,6 +26,9 @@ if [[ "$BUILD_TRITON" == "1" ]]; then
     cd triton
     pip install -r python/requirements.txt
     pip install filecheck
+    # NetworkX is a dependency of Triton test selection script
+    # `.github/scripts/select_triton_tests.py`.
+    pip install networkx
     MAX_JOBS=64 pip --retries=10 --default-timeout=60 install .
     cd ..
 else
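The `BUILD_TRITON` gate in the hunk above is a standard default-to-enabled environment-variable check. A minimal sketch of the same logic in Python (the print strings are illustrative only):

```python
import os

# Mirrors the shell gate in build_aiter_triton.sh: BUILD_TRITON defaults
# to "1", and Triton is installed only when the value is exactly "1".
build_triton = os.environ.get("BUILD_TRITON", "1")

if build_triton == "1":
    print("would install Triton")
else:
    print("skipping Triton installation")
```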
69 changes: 69 additions & 0 deletions .github/scripts/collect_logs.py
@@ -0,0 +1,69 @@
#!/usr/bin/env python3
import re
import sys
from pathlib import Path


def extract_markdown_blocks(path: Path):
    """
    Extract markdown blocks from a log file.
    The blocks are defined as:
        [aiter] <operator> summary (markdown):
        | ... |
        | ... |
        ...
    """

    start_pattern = re.compile(r"^\[aiter\]\s+.*summary\s*\(markdown\):")
    table_line_pattern = re.compile(r"^\|")
    blocks = []

    with path.open("r", encoding="utf-8", errors="ignore") as f:
        in_block = False
        current_block = []

        for line in f:
            stripped = line.rstrip("\n")

            if not in_block:
                if start_pattern.match(stripped):
                    in_block = True
                    current_block = [stripped]
                continue
            else:
                if table_line_pattern.match(stripped):
                    current_block.append(stripped)
                    continue
                else:
                    blocks.append(current_block)
                    in_block = False
                    current_block = []

        if in_block and current_block:
            blocks.append(current_block)

    return blocks


def main():
    if len(sys.argv) < 2:
        print("Usage: collect_logs.py <log_file>", file=sys.stderr)
        sys.exit(1)

    log_path = Path(sys.argv[1])

    if not log_path.exists():
        print(f"File not found: {log_path}", file=sys.stderr)
        sys.exit(1)

    blocks = extract_markdown_blocks(log_path)

    for i, block in enumerate(blocks):
        for line in block:
            print(line)
        if i != len(blocks) - 1:
            print()


if __name__ == "__main__":
    main()
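The block-extraction logic in `collect_logs.py` can be exercised on a small in-memory sample. A minimal sketch using the same two regexes — the sample log text below is invented for illustration:

```python
import re

# Same patterns as collect_logs.py: a block starts at an
# "[aiter] ... summary (markdown):" line and continues while
# lines look like markdown table rows starting with "|".
start_pattern = re.compile(r"^\[aiter\]\s+.*summary\s*\(markdown\):")
table_line_pattern = re.compile(r"^\|")

sample_log = """\
some unrelated runner output
[aiter] gemm_a8w8 summary (markdown):
| shape | latency |
| 64x64 | 1.2us   |
trailing noise
[aiter] fmoe summary (markdown):
| tokens | tflops |
| 128    | 3.4    |
"""

blocks, current, in_block = [], [], False
for line in sample_log.splitlines():
    if not in_block:
        if start_pattern.match(line):
            in_block, current = True, [line]
    elif table_line_pattern.match(line):
        current.append(line)
    else:
        # A non-table line closes the current block (mirroring the script).
        blocks.append(current)
        in_block, current = False, []
if in_block and current:
    blocks.append(current)

print(len(blocks))   # number of summary blocks found in the sample
print(blocks[0][0])  # header line of the first block
```

Note that, like the script, this sketch does not re-check whether the line that closes one block itself starts a new one; blocks separated only by another header line would lose that header.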
1 change: 1 addition & 0 deletions .github/scripts/op_tune.sh
@@ -22,6 +22,7 @@ declare -a tune_jobs=(
     "ck_gemm_a8w8_blockscale:csrc/ck_gemm_a8w8_blockscale:op_tests/test_gemm_a8w8_blockscale.py:python3 csrc/ck_gemm_a8w8_blockscale/gemm_a8w8_blockscale_tune.py -i aiter/configs/a8w8_blockscale_untuned_gemm.csv -o aiter/configs/a8w8_blockscale_tuned_gemm.csv"
     "ck_gemm_a8w8_blockscale_bpreshuffle:csrc/ck_gemm_a8w8_blockscale_bpreshuffle:op_tests/test_gemm_a8w8_blockscale.py:python3 csrc/ck_gemm_a8w8_blockscale_bpreshuffle/gemm_a8w8_blockscale_bpreshuffle_tune.py -i aiter/configs/a8w8_blockscale_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_blockscale_bpreshuffle_tuned_gemm.csv"
     "ck_gemm_a8w8_bpreshuffle:csrc/ck_gemm_a8w8_bpreshuffle:op_tests/test_gemm_a8w8.py:python3 csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py -i aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv"
+    "ck_gemm_moe_2stages_codegen:csrc/ck_gemm_moe_2stages_codegen:op_tests/test_moe.py:python3 csrc/ck_gemm_moe_2stages_codegen/gemm_moe_tune.py -i aiter/configs/untuned_fmoe.csv -o aiter/configs/tuned_fmoe.csv"
     #"ck_gemm_a4w4_blockscale:csrc/ck_gemm_a4w4_blockscale:op_tests/test_gemm_a4w4_blockscale.py:python3 csrc/ck_gemm_a4w4_blockscale/gemm_a4w4_blockscale_tune.py -i aiter/configs/a4w4_blockscale_untuned_gemm.csv -o aiter/configs/a4w4_blockscale_tuned_gemm.csv"
 )
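Each `tune_jobs` entry above packs four colon-separated fields: job name, source directory, test script, and tune command. The loop that consumes these entries is not part of this hunk; a hypothetical parse of the newly added entry, with the field layout inferred from the entries themselves:

```python
# Hypothetical parser for a tune_jobs entry; the real consumer lives
# elsewhere in op_tune.sh. Inferred layout: name:src_dir:test_script:command.
entry = (
    "ck_gemm_moe_2stages_codegen"
    ":csrc/ck_gemm_moe_2stages_codegen"
    ":op_tests/test_moe.py"
    ":python3 csrc/ck_gemm_moe_2stages_codegen/gemm_moe_tune.py"
    " -i aiter/configs/untuned_fmoe.csv -o aiter/configs/tuned_fmoe.csv"
)

# maxsplit=3 keeps any further colons inside the command field intact.
name, src_dir, test_script, command = entry.split(":", 3)

print(name)   # first field: the job name
```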