[PIPELINER] Cleanup of LoopScheduling.cpp, introduction of AssignLatencies #5176
Conversation
Thanks for cleaning up!
It opens up an opportunity to expose per-op "latencies" to the frontend, enabling the creation of user-defined schedules right at the language level.
I wonder how you plan to expose the per-op "latencies" to the frontend.
I plan to send out a follow-up PR for computation pipelining soon; sorry for the delay on my side because of other priorities. The part about how to let a user define a pipeline schedule needs more discussion. If you already have a plan, that would be great!
}
}

// Other IfOps should be pushed to the end.
schedulePrologueAndEpilogue is simplified here. It used to have more complicated logic around root users to put IfOps in epilogueCluster.
Yes, at some point we realized with @ThomasRaoux that it can be simplified without losing functionality.
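A rough sketch of the simplified behavior as I read it (a hypothetical helper name and a plain DenseMap stand in for the pass's CoarseSchedule/cluster machinery; this is not the actual implementation):

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "llvm/ADT/DenseMap.h"

using namespace mlir;

// Hypothetical helper: any IfOp that has not been scheduled yet is simply
// pushed to the last stage (the epilogue), with no root-user analysis.
static void pushUnscheduledIfOpsToEpilogue(
    scf::ForOp forOp, int numStages,
    llvm::DenseMap<Operation *, int> &opToStage) {
  for (Operation &op : forOp.getBody()->without_terminator()) {
    if (isa<scf::IfOp>(op) && !opToStage.count(&op))
      opToStage[&op] = numStages - 1; // schedule at the very end
  }
}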
namespace {

// Return true if the preconditions for pipelining the loop are met.
bool preCondition(scf::ForOp forOp) {
SoftwarePipeliner.cpp has the same helper function; I wonder if it makes sense to put this in a common place. But the content of this helper is short, and the preCondition may diverge.
That's my thinking too; at some point there may be different preConditions for different pieces, so I kept them separate and just refactored the internals to make them more modular.
    iter = loadOpToIndLevel.erase(iter);
  else
    ++iter;
}
This section was updated to:
auto it = llvm::remove_if(loadOpToIndLevelAndUse, [=](auto op) {
  return std::get<1>(op) >= numStages - 1;
});
loadOpToIndLevelAndUse.erase(it, loadOpToIndLevelAndUse.end());
Wondering why we are going back.
Thanks Manman, I might have copied it over from before your latest update
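For context, the two forms do the same filtering; here is a generic side-by-side illustration with a made-up container and element type (not the PR's actual data structures):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include <utility>

// Manual iterator loop, as in the current version of the PR.
void filterByStage(llvm::SmallVector<std::pair<int, int>> &items, int numStages) {
  for (auto iter = items.begin(); iter != items.end();) {
    if (iter->second >= numStages - 1)
      iter = items.erase(iter);
    else
      ++iter;
  }
}

// llvm::remove_if + erase, as in the snippet quoted above.
void filterByStageRemoveIf(llvm::SmallVector<std::pair<int, int>> &items,
                           int numStages) {
  auto it = llvm::remove_if(
      items, [=](const auto &item) { return item.second >= numStages - 1; });
  items.erase(it, items.end());
}

Both leave only the entries whose second element is below numStages - 1.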
// We only schedule ops that are downstream of a latency op
// (had a non-negative distance due to a latency op).
if (dist >= 0)
  opToStage[op] = maxDistance - dist;
I wonder how we can make sure "maxDistance - dist" is less than or equal to numStages - 1.
It is up to assignLatencies to ensure this is the case. The algorithm in assignLatencies calculates the longest (latency-wise) path through the ops and distributes stages along it, so this should always hold.
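A toy, self-contained illustration of that invariant (made-up op names and a straight-line chain; not the pass's code): if assignLatencies caps the summed latency of the critical path at numStages - 1, then every distance lies in [0, numStages - 1], and so does maxDistance - dist.

#include <cassert>
#include <map>
#include <string>
#include <vector>

int main() {
  const int numStages = 3;
  // Hypothetical chain load -> convert -> dot feeding the yield; the latencies
  // on the critical path sum to numStages - 1.
  std::vector<std::string> chain = {"load", "convert", "dot"};
  std::map<std::string, int> latency = {{"load", 2}, {"convert", 0}, {"dot", 0}};

  // dist[op]: longest latency-weighted path from op down to the yield.
  std::map<std::string, int> dist;
  int below = 0;
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    dist[*it] = latency[*it] + below;
    below = dist[*it];
  }
  const int maxDistance = below; // 2 == numStages - 1

  for (const auto &op : chain) {
    int stage = maxDistance - dist[op];
    // 0 <= dist <= maxDistance <= numStages - 1, so the stage is always valid.
    assert(stage >= 0 && stage <= numStages - 1);
  }
  return 0;
}

Here the load lands in stage 0 and both of its (transitive) users land in stage 2.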
  return CoarseSchedule(0);

// Compute the longest path to the yield for each operation reachable
// from any latency operation.
When defining the longest path, do ops have latency 0 if they are not in opLatency?
Yes, that's the idea
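A minimal sketch of that convention (hypothetical graph and type names, not the pass's implementation): ops missing from opLatency contribute 0, so only latency ops lengthen a path.

#include <algorithm>
#include <map>
#include <string>
#include <vector>

// op -> users; an op with no entry has no users (i.e., it feeds the yield).
using UserGraph = std::map<std::string, std::vector<std::string>>;

// Longest latency-weighted path from `op` to the yield; ops not present in
// opLatency default to latency 0.
int longestPathToYield(const std::string &op, const UserGraph &users,
                       const std::map<std::string, int> &opLatency,
                       std::map<std::string, int> &memo) {
  if (auto it = memo.find(op); it != memo.end())
    return it->second;
  int lat = opLatency.count(op) ? opLatency.at(op) : 0; // default latency 0
  int best = 0;
  if (auto it = users.find(op); it != users.end())
    for (const auto &user : it->second)
      best = std::max(best, longestPathToYield(user, users, opLatency, memo));
  return memo[op] = lat + best;
}

For example, with users = {load -> {cvt}, cvt -> {dot}} and opLatency = {load: 2}, the distances come out as load: 2, cvt: 0, dot: 0, since neither cvt nor dot is a latency op.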
This change breaks down LoopScheduling into two sub-passes: latency assignment and actual scheduling.
Latency assignment is a transformation that analyzes the loop and, based on the requested number of stages, assigns "latencies" to the ops that are going to be converted to async ops by the pipeliner. Latencies are expressed in terms of the number of loop iterations and can be thought of as per-operation num_stages.
The scheduling transformation takes these latencies and builds a pipeliner schedule from them. The process of building a schedule was slightly rewritten to simplify the code and clean up logic that was no longer needed after recent refactoring.
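As a toy illustration of the split (hypothetical names and a trivial straight-line loop body; not the actual pass interfaces), the first phase only decides per-op latencies from num_stages, and the second phase only turns those latencies into stages:

#include <map>
#include <string>
#include <vector>

using LatencyMap = std::map<std::string, int>; // op -> latency, in iterations
using StageMap = std::map<std::string, int>;   // op -> pipeline stage

// Phase 1 (latency assignment): decide how many loop iterations each
// async-convertible op may span. Here a single load gets the whole
// requested pipelining depth.
LatencyMap assignLatenciesToy(int numStages) {
  return {{"load", numStages - 1}};
}

// Phase 2 (scheduling): walk a straight-line chain of op names and place each
// consumer `latency` stages after its producer.
StageMap scheduleToy(const std::vector<std::string> &chain,
                     const LatencyMap &latencies) {
  StageMap stages;
  int stage = 0;
  for (const auto &op : chain) {
    stages[op] = stage;
    auto it = latencies.find(op);
    stage += (it != latencies.end()) ? it->second : 0;
  }
  return stages;
}

With numStages = 3 and the chain {"load", "convert", "dot"}, this yields load at stage 0 and both convert and dot at stage 2, roughly the shape of schedule the real pass aims for when the load is given latency 2.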
Breaking the scheduling down into latency assignment and proper scheduling serves several purposes; among them, it opens up the opportunity to expose per-op "latencies" to the frontend, enabling user-defined schedules right at the language level.
The next step in the cleanup process is to clearly separate the lowering and pipelining phases.