AMD port of TurboDiffusion - Working on gfx1151 on Windows #66
jammm wants to merge 4 commits into thu-ml:main
Force-pushed from d87745b to 9a7e801.
3. Install Dependencies: pip install -r requirements.txt
In the TurboDiffusion repo (on this PR branch)
Couldn't git clone this branch, so I modified the files by hand.
ah, for
The step “pip install -r requirements.txt” should not be necessary, as the TurboDiffusion project does not contain a requirements.txt file; only the SpargeAttn project includes such a file.
Sounds promising 👀
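One way to avoid this kind of confusion in a setup script is to build the install command only when a requirements.txt is actually present. The following is a minimal sketch, not part of this PR; the helper name pip_install_command is hypothetical, and the directory passed in is whatever checkout you are working from.

```python
import sys
from pathlib import Path


def pip_install_command(project_dir):
    """Return a pip command for project_dir, or None if it has no requirements.txt.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    req = Path(project_dir) / "requirements.txt"
    if not req.is_file():
        # Nothing to install from this directory, so the step can be skipped.
        return None
    # Use the current interpreter so the packages land in the active environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(req)]
```

The returned list can be passed to subprocess.run when it is not None; a directory without the file simply yields None, so the step is skipped instead of failing.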
Fixed, thanks!
Yes, it's just a matter of refactoring the rocWMMA code to not assume that the per-thread matrix fragments are replicated across the half-waves. It's just another prompt to Claude actually.
- Add HIP kernels for GEMM, LayerNorm, RMSNorm, and quantization ops
- Integrate rocWMMA for matrix operations on AMD GPUs
- Update setup.py for Windows ROCm builds with clang-cl
- Add platform detection (CUDA/HIP) with common abstractions
- Optimize SLA kernel config for ROCm (BLKK=16)
- Update .gitignore to exclude build artifacts and IDE files
- Fix distributed utils and network files for ROCm compatibility
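The platform detection item above can be illustrated on the Python side. This is a minimal sketch, not the code from this PR: the function names detect_backend and detect_backend_from_torch are hypothetical, and it relies only on the torch.version.hip and torch.version.cuda attributes, which PyTorch sets to a version string or None depending on how it was built.

```python
def detect_backend(hip_version, cuda_version):
    """Map build-version strings to a backend name (hypothetical helper).

    ROCm builds of PyTorch report a HIP version; CUDA builds report a
    CUDA version; anything else is treated as CPU-only.
    """
    if hip_version:
        return "hip"
    if cuda_version:
        return "cuda"
    return "cpu"


def detect_backend_from_torch():
    """Query the installed PyTorch build, falling back to "cpu" if it is absent."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return detect_backend(
        getattr(torch.version, "hip", None),
        getattr(torch.version, "cuda", None),
    )
```

Checking HIP first matters because ROCm builds of PyTorch still expose the torch.cuda namespace, so a test such as torch.cuda.is_available() alone cannot tell the two platforms apart.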
I'll be the 1st tester though, when the time comes ^^
Use rocWMMA for GEMM kernels, and use triton-windows and SpargeAttn modified to support AMD on Windows.
See README_AMD_WINDOWS.md for setup steps.
Generated video using Wan2.1 1.4b 480p default command as per README.md:
generated_video.mp4
Limitations: