[AIEX] Improve iterative scheduling convergence strategy #212
We now expect the schedulers to have improved enough to handle most regressions on their own. (Some of these improvements come in the next commits.)
Ultimately we will have: createTopDownScoreboard, checkResourceConflictsTopDown, createBottomUpScoreboard, and checkResourceConflictsBottomUp. Note that createTopDownScoreboard is now extra careful and blocks extra cycles for tiny loops. But this is still an NFC change because that scoreboard is only used to determine the number of resources that "stick out".
This allows biasing the depth of some SUnits in the hope of moving them up or down in the scoreboard, avoiding resource conflicts without necessarily increasing the latency of the whole region.
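To illustrate the idea of depth biasing, here is a minimal, self-contained sketch. It is a toy model, not the code from this PR: ToyUnit, issueCycle, regionLength, findConflict and converge are all hypothetical names, and resources are reduced to a bitmask used at a single cycle. It only shows how nudging one unit's depth can move its resource use away from a slot the successor needs, while the region length stays the same.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy types; these are NOT the LLVM/AIE classes touched by this PR.
struct ToyUnit {
  int Depth;        // earliest issue cycle allowed by dependencies
  int UseOffset;    // the resource is occupied at (issue cycle + UseOffset)
  uint32_t ResMask; // bitmask of the resources occupied at that cycle
  int Bias;         // extra depth requested by the convergence strategy
};

int issueCycle(const ToyUnit &U) { return U.Depth + U.Bias; }

// Region length in cycles: last issue cycle + 1.
int regionLength(const std::vector<ToyUnit> &Units) {
  int Len = 0;
  for (const ToyUnit &U : Units)
    Len = std::max(Len, issueCycle(U) + 1);
  return Len;
}

// SuccBusy[K] holds the resources the successor needs K cycles after the end
// of this region. Returns the index of the first unit whose resource use
// "sticks out" onto a busy successor slot, or -1 if there is none.
int findConflict(const std::vector<ToyUnit> &Units,
                 const std::vector<uint32_t> &SuccBusy) {
  const int Len = regionLength(Units);
  for (std::size_t I = 0; I < Units.size(); ++I) {
    const int K = issueCycle(Units[I]) + Units[I].UseOffset - Len;
    if (K >= 0 && K < static_cast<int>(SuccBusy.size()) &&
        (SuccBusy[K] & Units[I].ResMask) != 0)
      return static_cast<int>(I);
  }
  return -1;
}

// Bias the first conflicting unit by one cycle per step, for at most
// MaxSteps steps. Returns true if no conflict remains.
bool converge(std::vector<ToyUnit> &Units,
              const std::vector<uint32_t> &SuccBusy, int MaxSteps) {
  for (int Step = 0; Step < MaxSteps; ++Step) {
    const int I = findConflict(Units, SuccBusy);
    if (I < 0)
      return true;
    ++Units[I].Bias;
  }
  return findConflict(Units, SuccBusy) < 0;
}
```

In this toy setup, a load-like unit at depth 0 that touches a bank four cycles after issue collides with a successor that needs the same bank in its first cycle. Biasing that unit by one cycle moves the access to the successor's second cycle, and the region length is still set by the other unit.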
MachineInstr *ConflictMI = nullptr;
for (const MachineBundle &B : SuccBundles) {
  for (MachineInstr *MI : B.getInstrs()) {
    if (HR.getHazardType(Scoreboard, MI->getDesc(), HR.getMemoryBanks(MI),
Instead of MI->getDesc(), please use *HR.getSelectedAltDescs().getDesc(&MI)
Oh, I didn't realise that part was already merged. Will do.
And would that take away the precondition of no MSPs?
for (int C = FirstBlockedCycle; C <= LastBlockedCycle; ++C) {
  Scoreboard[C].blockResources();
}
}
Is this NFC?
I mentioned it in the commit message: in the context of how this is used, this is NFC.
This prepares for future work where Multi-slot pseudos won't necessarily be materialized.
LGTM.
This brings a new convergence strategy that can bias the depth of some SUnits. Doing so can shift part of the schedule up or down in the scoreboard and avoid resource conflicts without necessarily increasing the latency of the whole region.
Looking at Add2D, this helps us place the VLD instructions better and avoid bank conflicts with the next iteration of the loop. We can then disable the heuristic in WAWStickyRegistersEdges and more aggressively remove WAW edges on status registers. This is what the first commit does.
QoR is fine, there are still major regressions coming from Neg_aie2_1 and HardSigmoidTemplated_int8_0 (same as already mentioned in #183 (comment)), but other regressions are avoided. And those two benchmarks are expected to be tackled in the new post-pipeliner soon.
Better review commit by commit :)