AMD port of TurboDiffusion - Working on gfx1151 on Windows by jammm · Pull Request #66 · thu-ml/TurboDiffusion

jammm · 2025-12-30T13:23:20Z

Use rocWMMA for GEMM kernels, and use triton-windows and SpargeAttn modified to support AMD on Windows.
See README_AMD_WINDOWS.md for setup steps.

Generated video using Wan2.1 1.4b 480p default command as per README.md:

generated_video.mp4

Limitations:

Currently supports only RDNA3/3.5, though it could possibly work on RDNA4 with minor modifications.
Multi-CTA/distributed execution has not been tested.
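
As a rough illustration of the first limitation, the sketch below checks whether an architecture string belongs to the gfx11 family (RDNA3/3.5, which includes gfx1151). The helper name `is_supported_arch` and the prefix rule are assumptions made for illustration; they are not taken from this PR.

```python
def is_supported_arch(arch_name: str) -> bool:
    """Return True for gfx11xx targets (RDNA3 / RDNA3.5), e.g. gfx1100 or gfx1151.

    Hypothetical helper: the PR does not necessarily gate on the name this way.
    """
    # Architecture names can carry feature suffixes such as "gfx1151:xnack-",
    # so keep only the part before the first colon.
    base = arch_name.split(":", 1)[0].strip().lower()
    return base.startswith("gfx11")


if __name__ == "__main__":
    for name in ("gfx1151", "gfx1100", "gfx1201", "gfx90a"):
        print(name, is_supported_arch(name))
```

A check like this would reject RDNA4 names (gfx12xx) until the port is extended to them.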

bat3a · 2026-01-05T17:32:01Z

3 Install Dependencies

pip install -r requirements.txt
in which repo?

jammm · 2026-01-05T17:34:51Z

3 Install Dependencies

pip install -r requirements.txt in which repo?

In the TurboDiffusion repo (on this PR branch).

bat3a · 2026-01-05T19:41:55Z

Couldn't git clone this branch, so I modified the files by hand:
git clone --branch jam/windows_amd https://github.com/woct0rdho/triton-windows.git triton-windows-patch
Cloning into 'triton-windows-patch'...
fatal: Remote branch jam/windows_amd not found in upstream origin

jammm · 2026-01-05T19:44:44Z

Ah, for triton-windows you just need to run pip install triton-windows. I'll update the README to mention this. My bad.
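
A quick way to confirm the install worked is to check that the `triton` module can be found. This is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of the PR or its README.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The triton-windows package provides the importable module "triton".
    if has_module("triton"):
        print("triton is importable")
    else:
        print("triton not found; try: pip install triton-windows")
```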

githust66 · 2026-01-06T05:43:48Z

The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.

0xDELUXA · 2026-01-06T11:13:08Z

...though can possibly work on RDNA4 with minor modifications.

Sounds promising 👀

jammm · 2026-01-06T11:34:36Z

The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.

Fixed, thanks!

...though can possibly work on RDNA4 with minor modifications.

Sounds promising 👀

Yes, it's just a matter of refactoring the rocWMMA code to not assume that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to Claude, actually.
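
To make the half-wave point concrete, here is a simplified model of which elements of a 16x16 A tile each lane of a 32-wide wave would hold under the two assumptions. It is a sketch only: the function names are hypothetical, and the "split" mapping is an illustrative assumption rather than the exact hardware register layout, which is defined by the ISA documentation.

```python
WAVE_SIZE = 32
HALF = WAVE_SIZE // 2  # lanes 0-15 and lanes 16-31 form the two half-waves
K = 16                 # depth of a 16x16x16 tile


def a_fragment_replicated(lane: int) -> list[tuple[int, int]]:
    """(row, k) elements held by a lane when both half-waves hold a full copy."""
    row = lane % HALF
    return [(row, k) for k in range(K)]


def a_fragment_split(lane: int) -> list[tuple[int, int]]:
    """(row, k) elements when each half-wave holds half of K (illustrative only)."""
    row = lane % HALF
    half_k = K // 2
    start = (lane // HALF) * half_k
    return [(row, k) for k in range(start, start + half_k)]


if __name__ == "__main__":
    # Lanes 3 and 19 map to the same row in both models.
    print(a_fragment_replicated(3) == a_fragment_replicated(19))
    print(a_fragment_split(3), a_fragment_split(19))
```

Code that reads a fragment assuming lane 19 holds the same data as lane 3 is only valid under the replicated model, which is why the kernels would need refactoring for a layout without that replication.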

- Add HIP kernels for GEMM, LayerNorm, RMSNorm, and quantization ops
- Integrate rocWMMA for matrix operations on AMD GPUs
- Update setup.py for Windows ROCm builds with clang-cl
- Add platform detection (CUDA/HIP) with common abstractions
- Optimize SLA kernel config for ROCm (BLKK=16)
- Update .gitignore to exclude build artifacts and IDE files
- Fix distributed utils and network files for ROCm compatibility
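
The platform-detection item in the list above could look roughly like the following on the Python side. This is a sketch under assumptions: the function names are hypothetical, and the PR may well do its detection at compile time in the C++/HIP sources instead. It relies only on the `torch.version.hip` and `torch.version.cuda` attributes, which PyTorch sets to a version string or `None` depending on the build.

```python
from typing import Optional


def classify_backend(hip_version: Optional[str], cuda_version: Optional[str]) -> str:
    """Map PyTorch build version strings to a backend label."""
    if hip_version:
        return "hip"
    if cuda_version:
        return "cuda"
    return "cpu"


def detect_backend() -> str:
    """Return "hip", "cuda", or "cpu" for the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to report.
        return "cpu"
    return classify_backend(
        getattr(torch.version, "hip", None),
        getattr(torch.version, "cuda", None),
    )


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the classification separate from the `torch` import makes the rule easy to test on a machine without a GPU.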

0xDELUXA · 2026-01-07T13:02:43Z

Yes, it's just a matter of refactoring the rocWMMA code to not assume that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to Claude, actually.

I'll be the 1st tester though, when the time comes ^^

This was referenced Dec 30, 2025

AMD Port of SpargeAttn - Working on windows for gfx1151 thu-ml/SpargeAttn#108 (Draft)

Add Windows/clang-cl support for AMD HIP backend woct0rdho/triton-windows#179 (Merged)

jammm force-pushed the jam/amd_windows branch 2 times, most recently from d87745b to 9a7e801 Compare January 5, 2026 11:40

jammm force-pushed the jam/amd_windows branch from 0ae67c3 to d947d6d Compare January 5, 2026 21:23

jammm force-pushed the jam/amd_windows branch from d947d6d to 55a7ae6 Compare January 6, 2026 11:20

jammm and others added 4 commits January 6, 2026 17:05

Add AMD Windows setup guide

ffce8b9

Add numeric_conversion_hip header

4b7674c

Update README_AMD_WINDOWS.md

6199e67

jammm force-pushed the jam/amd_windows branch from 55a7ae6 to 6199e67 Compare January 6, 2026 11:42

jammm wants to merge 4 commits into thu-ml:main from jammm:jam/amd_windows

4 participants
