
[PIPELINER] Cleanup of LoopScheduling.cpp, introduction of AssignLatencies #5176

Open · wants to merge 16 commits into main

Conversation

pawelszczerbuk
Contributor

This change breaks down LoopScheduling into two sub-passes: latency assignment and actual scheduling.
Latency assignment is a transformation that analyzes the loop and, based on the requested number of stages, assigns "latencies" to the ops that are going to be converted to async ops by the pipeliner. Latencies are expressed in terms of the number of loop iterations and can be thought of as per-operation num_stages.
The scheduling transformation takes these latencies and builds a pipeliner schedule from them. The process of building a schedule was slightly rewritten to simplify the code and clean up logic that was no longer needed after recent refactoring.
Breaking the scheduling down into latency assignment and proper scheduling serves a number of purposes:

  1. The code became more modular, with cleaner interfaces that help with maintenance.
  2. Both parts can be tested in isolation; I have added lit tests for both pieces. We can finally test our pipeliner infrastructure in manageable chunks.
  3. It opens up the opportunity to expose per-op "latencies" to the frontend, enabling user-defined schedules right from the language level.

The next step in the cleanup process is to clearly separate the lowering and pipelining phases.
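
To illustrate the split, here is a rough standalone sketch (not the actual MLIR pass; Op is a simplified stand-in for mlir::Operation, and assignLatencies below mirrors only the shape of the new interface):

  #include <map>
  #include <vector>

  // Simplified stand-in for mlir::Operation.
  struct Op {
    bool isLoad = false;
  };

  // Phase 1: assign per-op "latencies", expressed in loop iterations,
  // i.e. a per-operation num_stages. This toy version hands every load
  // the whole budget; the real pass analyzes the loop to decide which
  // ops become async and what latency each one gets.
  std::map<Op *, int> assignLatencies(const std::vector<Op *> &loopBody,
                                      int numStages) {
    std::map<Op *, int> opLatency;
    for (Op *op : loopBody)
      if (op->isLoad)
        opLatency[op] = numStages - 1;
    return opLatency;
  }

The scheduler then consumes only this map, which is what makes the two phases testable in isolation and, eventually, exposable to the frontend.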

@pawelszczerbuk
Contributor Author

  • I have yet to run full internal tests for this; there may be some follow-up changes if any major issues are found.

@manman-ren
Collaborator


Thanks for cleaning up!

It opens up the opportunity to expose per-op "latencies" to the frontend, enabling user-defined schedules right from the language level

I wonder how you plan to expose the per-op "latencies" to the frontend.

I plan to send out a follow-up PR for computation pipelining soon; sorry for the delay on my side due to other priorities. The part about how to let a user define a pipeline schedule needs more discussion. If you already have a plan, that would be great!

}
}

// Other IfOps should be pushed to the end.
Collaborator


schedulePrologueAndEpilogue is simplified here. It used to have more complicated logic around root users to put IfOps in epilogueCluster.

Contributor Author


Yes, at some point we realized with @ThomasRaoux that it can be simplified without losing functionality.

namespace {

// Return true if the preconditions for pipelining the loop are met.
bool preCondition(scf::ForOp forOp) {
Collaborator


SoftwarePipeliner.cpp has the same helper function; I wonder if it makes sense to put this in a common place. But the content of this helper is short, and the preConditions may diverge.

Contributor Author


That's my thinking too. At some point there may be different preConditions for different pieces, so I kept them separate and just refactored the internals so they're more modular.
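
For illustration, the modular internals might look something like this (hypothetical helper names, not the actual checks; ForOpLike stands in for scf::ForOp):

  // Hypothetical decomposition: each pass keeps its own preCondition
  // but composes small, named checks, so the two preConditions can
  // diverge later without rewriting everything.
  struct ForOpLike { /* ... */ };

  static bool hasSupportedBody(const ForOpLike &forOp) { return true; }
  static bool hasPipelineableLoads(const ForOpLike &forOp) { return true; }

  bool preCondition(const ForOpLike &forOp) {
    return hasSupportedBody(forOp) && hasPipelineableLoads(forOp);
  }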

iter = loadOpToIndLevel.erase(iter);
else
++iter;
}
Collaborator


This section was updated to

  auto it = llvm::remove_if(loadOpToIndLevelAndUse, [=](auto op) {
    return std::get<1>(op) >= numStages - 1;
  });
  loadOpToIndLevelAndUse.erase(it, loadOpToIndLevelAndUse.end());

Wondering why we are going back.

Contributor Author


Thanks Manman, I might have copied it over from before your latest update.

// We only schedule ops that are downstream of a latency op
// (had a non-negative distance due to a latency op).
if (dist >= 0)
opToStage[op] = maxDistance - dist;
Collaborator


I wonder how we can make sure "maxDistance - dist" is less than or equal to numStages - 1.

Contributor Author


It is up to assignLatencies to ensure this is the case. The algorithm in assignLatencies calculates the longest (latency-wise) path through the ops and distributes stages among them, so this should always be the case.
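
To make that concrete, here is a small standalone model (assumed semantics, simplified from the actual pass): walking a topologically ordered loop body backwards, each op's distance is the longest latency-weighted path to the yield, so maxDistance equals the total latency along the longest chain and maxDistance - dist never exceeds the budget handed out by assignLatencies:

  #include <algorithm>
  #include <iostream>
  #include <map>
  #include <vector>

  struct Op {
    const char *name;
    std::vector<Op *> users;
  };

  int main() {
    // Toy loop body in topological order: load -> compute -> use.
    Op use{"use", {}};
    Op compute{"compute", {&use}};
    Op load{"load", {&compute}};
    std::vector<Op *> body{&load, &compute, &use};

    // Latencies as assigned by the first phase; ops absent from the
    // map default to latency 0.
    std::map<Op *, int> opLatency{{&load, 2}}; // e.g. num_stages = 3

    // Longest latency-weighted path from each op to the yield.
    std::map<Op *, int> dist;
    for (auto it = body.rbegin(); it != body.rend(); ++it) {
      Op *op = *it;
      int d = 0;
      for (Op *user : op->users)
        d = std::max(d, dist[user]);
      int lat = opLatency.count(op) ? opLatency[op] : 0;
      dist[op] = d + lat;
    }

    int maxDistance = 0;
    for (auto &[op, d] : dist)
      maxDistance = std::max(maxDistance, d);

    // stage = maxDistance - dist: load lands in stage 0, compute and
    // use in stage 2; no stage exceeds the total assigned latency.
    for (Op *op : body)
      std::cout << op->name << " -> stage " << maxDistance - dist[op] << "\n";
  }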

return CoarseSchedule(0);

// Compute the longest path to the yield for each operation reachable
// from any latency operation.
Collaborator


When defining the longest path, do ops have latency 0 if they are not in opLatency?

Contributor Author


Yes, that's the idea.
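
For reference, if opLatency is an llvm::DenseMap (an assumption about the map type here), the zero default falls out of lookup(), which returns a value-initialized mapped type for absent keys:

  #include "llvm/ADT/DenseMap.h"
  #include "mlir/IR/Operation.h"

  // DenseMap::lookup returns a value-initialized mapped type (0 for int)
  // when the key is absent, so ops never assigned a latency read as 0.
  int latencyOf(mlir::Operation *op,
                const llvm::DenseMap<mlir::Operation *, int> &opLatency) {
    return opLatency.lookup(op);
  }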
