[PIPELINER] Cleanup of LoopScheduling.cpp, introduction of AssignLatencies #5176
Conversation
Thanks for cleaning up!
It opens up an opportunity to expose per-op "latencies" to the frontend, enabling the creation of user-defined schedules right at the language level.
I wonder how you plan to expose the per-op "latencies" to the frontend.
I plan to send out a follow-up PR for computation pipelining soon; sorry for the delay on my side because of other priorities. The part about how to let a user define a pipeline schedule needs more discussion. If you already have a plan, that would be great!
}
}

// Other IfOps should be pushed to the end.
schedulePrologueAndEpilogue is simplified here. It used to have more complicated logic around root users to put IfOps in epilogueCluster.
Yes, at some point we realized with @ThomasRaoux that it can be simplified without losing functionality.
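A rough sketch of the simplified behavior as I read it (a hypothetical helper name and a plain DenseMap stand in for the pass's CoarseSchedule/cluster machinery; this is not the actual implementation):

#include "mlir/Dialect/SCF/IR/SCF.h"
#include "llvm/ADT/DenseMap.h"

using namespace mlir;

// Hypothetical helper: any IfOp that has not been scheduled yet is simply
// pushed to the last stage (the epilogue), with no root-user analysis.
static void pushUnscheduledIfOpsToEpilogue(
    scf::ForOp forOp, int numStages,
    llvm::DenseMap<Operation *, int> &opToStage) {
  for (Operation &op : forOp.getBody()->without_terminator()) {
    if (isa<scf::IfOp>(op) && !opToStage.count(&op))
      opToStage[&op] = numStages - 1; // schedule at the very end
  }
}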
namespace {

// Return true if the preconditions for pipelining the loop are met.
bool preCondition(scf::ForOp forOp) {
SoftwarePipeliner.cpp has the same helper function; I wonder if it makes sense to put this in a common place. But the content of this helper is short, and the preCondition may diverge.
That's my thinking too; at some point there may be different preConditions for different pieces, so I kept them separate and just refactored the internals to make them more modular.
    iter = loadOpToIndLevel.erase(iter);
  else
    ++iter;
}
This section was updated to:
auto it = llvm::remove_if(loadOpToIndLevelAndUse, [=](auto op) {
  return std::get<1>(op) >= numStages - 1;
});
loadOpToIndLevelAndUse.erase(it, loadOpToIndLevelAndUse.end());
Wondering why we are going back.
Thanks Manman, I might have copied it over from before your latest update
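For context, the two forms do the same filtering; here is a generic side-by-side illustration with a made-up container and element type (not the PR's actual data structures):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include <utility>

// Manual iterator loop, as in the current version of the PR.
void filterByStage(llvm::SmallVector<std::pair<int, int>> &items, int numStages) {
  for (auto iter = items.begin(); iter != items.end();) {
    if (iter->second >= numStages - 1)
      iter = items.erase(iter);
    else
      ++iter;
  }
}

// llvm::remove_if + erase, as in the snippet quoted above.
void filterByStageRemoveIf(llvm::SmallVector<std::pair<int, int>> &items,
                           int numStages) {
  auto it = llvm::remove_if(
      items, [=](const auto &item) { return item.second >= numStages - 1; });
  items.erase(it, items.end());
}

Both leave only the entries whose second element is below numStages - 1.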
// We only schedule ops that are downstream of a latency op
// (had a non-negative distance due to a latency op).
if (dist >= 0)
  opToStage[op] = maxDistance - dist;
I wonder how we can make sure "maxDistance - dist" is less than or equal to numStages - 1.
It is up to assignLatencies to ensure this is the case. The algorithm in assignLatencies calculates the longest (latency-wise) path through the ops and distributes stages along it, so this should always hold.
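A toy, self-contained illustration of that invariant (made-up op names and a straight-line chain; not the pass's code): if assignLatencies caps the summed latency of the critical path at numStages - 1, then every distance lies in [0, numStages - 1], and so does maxDistance - dist.

#include <cassert>
#include <map>
#include <string>
#include <vector>

int main() {
  const int numStages = 3;
  // Hypothetical chain load -> convert -> dot feeding the yield; the latencies
  // on the critical path sum to numStages - 1.
  std::vector<std::string> chain = {"load", "convert", "dot"};
  std::map<std::string, int> latency = {{"load", 2}, {"convert", 0}, {"dot", 0}};

  // dist[op]: longest latency-weighted path from op down to the yield.
  std::map<std::string, int> dist;
  int below = 0;
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    dist[*it] = latency[*it] + below;
    below = dist[*it];
  }
  const int maxDistance = below; // 2 == numStages - 1

  for (const auto &op : chain) {
    int stage = maxDistance - dist[op];
    // 0 <= dist <= maxDistance <= numStages - 1, so the stage is always valid.
    assert(stage >= 0 && stage <= numStages - 1);
  }
  return 0;
}

Here the load lands in stage 0 and both of its (transitive) users land in stage 2.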
  return CoarseSchedule(0);

// Compute the longest path to the yield for each operation reachable
// from any latency operation.
When defining the longest path, do ops have latency 0 if they are not in opLatency?
Yes, that's the idea
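A minimal sketch of that convention (hypothetical graph and type names, not the pass's implementation): ops missing from opLatency contribute 0, so only latency ops lengthen a path.

#include <algorithm>
#include <map>
#include <string>
#include <vector>

// op -> users; an op with no entry has no users (i.e., it feeds the yield).
using UserGraph = std::map<std::string, std::vector<std::string>>;

// Longest latency-weighted path from `op` to the yield; ops not present in
// opLatency default to latency 0.
int longestPathToYield(const std::string &op, const UserGraph &users,
                       const std::map<std::string, int> &opLatency,
                       std::map<std::string, int> &memo) {
  if (auto it = memo.find(op); it != memo.end())
    return it->second;
  int lat = opLatency.count(op) ? opLatency.at(op) : 0; // default latency 0
  int best = 0;
  if (auto it = users.find(op); it != users.end())
    for (const auto &user : it->second)
      best = std::max(best, longestPathToYield(user, users, opLatency, memo));
  return memo[op] = lat + best;
}

For example, with users = {load -> {cvt}, cvt -> {dot}} and opLatency = {load: 2}, the distances come out as load: 2, cvt: 0, dot: 0, since neither cvt nor dot is a latency op.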
This change breaks down LoopScheduling into two sub-passes: latency assignment and actual scheduling.
Latency assignment is a transformation that analyzes the loop and, based on the requested number of stages, assigns "latencies" to the ops that are going to be converted to async ops by the pipeliner. Latencies are expressed in terms of the number of loop iterations and can be thought of as per-operation num_stages.
The scheduling transformation takes these latencies and builds a pipeliner schedule from them. The process of building a schedule was slightly rewritten to simplify the code and clean up logic that was no longer needed after recent refactoring.
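As a toy illustration of the split (hypothetical names and a trivial straight-line loop body; not the actual pass interfaces), the first phase only decides per-op latencies from num_stages, and the second phase only turns those latencies into stages:

#include <map>
#include <string>
#include <vector>

using LatencyMap = std::map<std::string, int>; // op -> latency, in iterations
using StageMap = std::map<std::string, int>;   // op -> pipeline stage

// Phase 1 (latency assignment): decide how many loop iterations each
// async-convertible op may span. Here a single load gets the whole
// requested pipelining depth.
LatencyMap assignLatenciesToy(int numStages) {
  return {{"load", numStages - 1}};
}

// Phase 2 (scheduling): walk a straight-line chain of op names and place each
// consumer `latency` stages after its producer.
StageMap scheduleToy(const std::vector<std::string> &chain,
                     const LatencyMap &latencies) {
  StageMap stages;
  int stage = 0;
  for (const auto &op : chain) {
    stages[op] = stage;
    auto it = latencies.find(op);
    stage += (it != latencies.end()) ? it->second : 0;
  }
  return stages;
}

With numStages = 3 and the chain {"load", "convert", "dot"}, this yields load at stage 0 and both convert and dot at stage 2, roughly the shape of schedule the real pass aims for when the load is given latency 2.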
Breaking the scheduling down into latency assignment and proper scheduling serves several purposes; among them, it opens up the opportunity to expose per-op "latencies" to the frontend, enabling user-defined schedules right at the language level.
The next step in the cleanup process is to clearly separate the lowering and pipelining phases.