Support computation pipelining after SWP refactoring #5185

manman-ren · 2024-11-18T21:47:00Z

With the recent SWP refactoring, it is much easier to support arbitrary stage assignments where computations can be separated into different stages. Computation pipelining is basically splitting computations to different stages. Take flash attention as an example:
Currently the two loads are in stage 0 (S0), all other ops are in the last stage (stage 2). The loop body will look like
MMA0(i)
Softmax(i)
MUL(i)
MMA1(i)
LoadV(i+2)
LoadK(i+2)

This patch defines two different pipeline schedule for attention-like kernels:
1> putting first dot in S2, other computations in S3, loadK in stage 0, loadV in stage 1
MMA0(i+1)
Softmax(i)
MUL(i)
MMA1(i)
loadK(i+3)
loadV(i+2)
2> putting second dot in S3, other computations in S2, loadK in stage 0, loadV in stage 1
MMA0(i+1)
MMA1(i)
Softmax(i+1)
MUL(i+1)
loadK(i+3)
loadV(i+2)

Preliminary performance number on H100 for flash attention:
(Batch, Heads, SeqLen, Dhead) triton_tutorial_flash_v2_opt-tflops triton_tutorial_flash_v2_tma-tflops triton_tutorial_flash_v2-tflops

         (8, 16, 8192, 128)                                517.528                                504.565                            481.402

The implementation and the frontend is preliminary for discussion.

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

manman-ren · 2024-11-18T21:55:13Z

@pawelszczerbuk The frontend is an annotation on loop, and inside the LoopSchedule pass, we are using the annotation to see if the ttgir matches with the specific schedule, if it does, we perform the corresponding <stage, cluster> assignment.

I understand that you are working on further refactoring and maybe frontend design for specifying a loop schedule. This PR is mostly to share the performance numbers and the preliminary implementation. Happy to work together on enabling this!

manman-ren added 2 commits November 11, 2024 19:43

[CompPipe] 3 patches

c463ff5

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

fix compPipe

05925bd

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

manman-ren requested a review from ptillet as a code owner November 18, 2024 21:47

manman-ren marked this pull request as draft November 18, 2024 21:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support computation pipelining after SWP refactoring #5185

Support computation pipelining after SWP refactoring #5185

manman-ren commented Nov 18, 2024

manman-ren commented Nov 18, 2024

Support computation pipelining after SWP refactoring #5185

Are you sure you want to change the base?

Support computation pipelining after SWP refactoring #5185

Conversation

manman-ren commented Nov 18, 2024

manman-ren commented Nov 18, 2024