[AIEX] Improve iterative scheduling convergence strategy #212

gbossu · 2024-10-15T08:48:51Z

This brings a new convergence strategy that can bias the depth of some SUnits. Doing so can shift part of the schedule up or down in the scoreboard and avoid resource conflicts without necessarily increasing the latency of the whole region.
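To illustrate the idea, here is a minimal toy sketch of depth biasing. It is not the code from this PR. `ToyUnit`, `scheduleTopDown`, `findConflict` and `convergeWithDepthBias` are hypothetical names, and the single-cycle resource model is an assumption made for brevity. When a unit conflicts with resources reserved from outside the region (for example by the next loop iteration), only that unit's depth is bumped and the region is rescheduled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: none of these names come from the AIE backend.
struct ToyUnit {
  int Depth;         // earliest issue cycle allowed by dependencies (>= 0)
  std::uint32_t Res; // bitmask of resources used in the issue cycle
};

// Greedy top-down placement: each unit goes to the first cycle at or after
// Depth + Bias whose resources are still free. Returns one cycle per unit.
std::vector<int> scheduleTopDown(const std::vector<ToyUnit> &Units,
                                 const std::vector<int> &Bias) {
  assert(Units.size() == Bias.size());
  std::vector<std::uint32_t> Used;
  std::vector<int> Cycle(Units.size());
  for (std::size_t I = 0; I < Units.size(); ++I) {
    std::size_t C = static_cast<std::size_t>(Units[I].Depth + Bias[I]);
    while (true) {
      if (C >= Used.size())
        Used.resize(C + 1, 0);
      if ((Used[C] & Units[I].Res) == 0)
        break;
      ++C;
    }
    Used[C] |= Units[I].Res;
    Cycle[I] = static_cast<int>(C);
  }
  return Cycle;
}

// Index of the first unit whose issue cycle clashes with externally blocked
// resources (Blocked[C] is the mask reserved at cycle C), or -1 if none.
int findConflict(const std::vector<ToyUnit> &Units,
                 const std::vector<int> &Cycle,
                 const std::vector<std::uint32_t> &Blocked) {
  for (std::size_t I = 0; I < Units.size(); ++I) {
    std::size_t C = static_cast<std::size_t>(Cycle[I]);
    if (C < Blocked.size() && (Blocked[C] & Units[I].Res) != 0)
      return static_cast<int>(I);
  }
  return -1;
}

// Reschedule until no conflict remains, biasing only the conflicting unit.
// Returns an empty vector if no conflict-free schedule is found in MaxIters.
std::vector<int> convergeWithDepthBias(const std::vector<ToyUnit> &Units,
                                       const std::vector<std::uint32_t> &Blocked,
                                       int MaxIters) {
  std::vector<int> Bias(Units.size(), 0);
  for (int It = 0; It < MaxIters; ++It) {
    std::vector<int> Cycle = scheduleTopDown(Units, Bias);
    int Conflict = findConflict(Units, Cycle, Blocked);
    if (Conflict < 0)
      return Cycle;
    ++Bias[static_cast<std::size_t>(Conflict)];
  }
  return {};
}
```

With units `{0, 1u}`, `{0, 2u}`, `{2, 2u}` and `Blocked = {1u}`, the first unit moves from cycle 0 to cycle 1 while the last unit stays in cycle 2, so the toy region keeps its length.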

Looking at Add2D, this helps us place the VLD instructions better and avoid bank conflicts with the next iteration of the loop. We can then disable the heuristic in WAWStickyRegistersEdges and more aggressively remove WAW edges on status registers. This is what the first commit does.

QoR is fine overall. There are still major regressions in Neg_aie2_1 and HardSigmoidTemplated_int8_0 (the same ones already mentioned in #183 (comment)), but the other regressions are avoided.
Those two benchmarks are expected to be tackled in the new post-pipeliner soon.

| Core_Compute_Cycle_Count   | Neg_aie2_1    | HardSigmoidTemplated_int8_0 |     | Add2D_0      | Add2D_Standalone_0 | Add2D_Standalone_1 |     | Shrink_aie2_0 | Tanh_int8_0  | TanhTemplated_aie2_int8 | Tanh_int8_1  | Add2D_1      | HardSigmoidTemplated_bf16_0 | Average diff |
| -------------------------- | ------------- | --------------------------- | --- | ------------ | ------------------ | ------------------ | --- | ------------- | ------------ | ----------------------- | ------------ | ------------ | --------------------------- | ------------ |
| Baseline                   | 343(+0.00%)   | 257(+0.00%)                 | ... | 217(+0.00%)  | 322(+0.00%)        | 482(+0.00%)        | ... | 672(+0.00%)   | 349(+0.00%)  | 310(+0.00%)             | 421(+0.00%)  | 466(+0.00%)  | 617(+0.00%)                 | +0.00%       |
| Disabling heuristics       | 463(+34.99%)  | 284(+10.51%)                | ... | 230(+5.99%)  | 335(+4.04%)        | 511(+6.02%)        | ... | 672(+0.00%)   | 339(-2.87%)  | 300(-3.23%)             | 407(-3.33%)  | 466(+0.00%)  | 556(-9.89%)                 | +0.12%       |
| New convergence heuristic  | 462(-0.22%)   | 284(+0.00%)                 |     | 217(-5.65%)  | 322(-3.88%)        | 482(-5.68%)        |     | 658(-2.08%)   | 339(+0.00%)  | 300(+0.00%)             | 407(+0.00%)  | 434(-6.87%)  | 556(+0.00%)                 | -0.07%       |
| Overall diff               | REGR(+34.69%) | REGR(+10.51%)               |     | SAME(+0.00%) | SAME(+0.00%)       | SAME(+0.00%)       |     | IMPR(-2.08%)  | IMPR(-2.87%) | IMPR(-3.23%)            | IMPR(-3.33%) | IMPR(-6.87%) | IMPR(-9.89%)                | +0.05%       |
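Judging by the numbers, the percentage in each row is relative to the row above it, while "Overall diff" compares the last row against the baseline. A minimal sketch of that arithmetic, where `percentDiff` is a hypothetical helper and not part of any benchmark tooling:

```cpp
#include <cassert>

// Hypothetical helper: relative change from Before to After, in percent.
double percentDiff(int Before, int After) {
  assert(Before != 0);
  return 100.0 * (After - Before) / Before;
}
```

For Neg_aie2_1, `percentDiff(343, 463)` reproduces the +34.99% step, `percentDiff(463, 462)` the -0.22% step, and `percentDiff(343, 462)` the overall +34.69%.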

This is best reviewed commit by commit :)


Ultimately we will have:
- createTopDownScoreboard
- checkResourceConflictsTopDown
- createBottomUpScoreboard
- checkResourceConflictsBottomUp

Note that createTopDownScoreboard is now extra careful and blocks extra cycles for tiny loops. This is still an NFC change, because that scoreboard is only used to determine the number of resources that "stick out".


krishnamtibrewala · 2024-10-15T09:03:56Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+  MachineInstr *ConflictMI = nullptr;
+  for (const MachineBundle &B : SuccBundles) {
+    for (MachineInstr *MI : B.getInstrs()) {
+      if (HR.getHazardType(Scoreboard, MI->getDesc(), HR.getMemoryBanks(MI),


Instead of MI->getDesc(), please use *HR.getSelectedAltDescs().getDesc(&MI)

Oh, I didn't realise that part was already merged. Will do.

And that would take away the precondition of no MSPs?

martien-de-jong · 2024-10-15T12:19:44Z

llvm/lib/Target/AIE/AIEInterBlockScheduling.cpp

+    for (int C = FirstBlockedCycle; C <= LastBlockedCycle; ++C) {
+      Scoreboard[C].blockResources();
+    }
+  }


Is this NFC?

I mentioned it in the commit: in the context of how this is used, this is NFC.

This prepares for future work where Multi-slot pseudos won't necessarily be materialized.

andcarminati

LGTM.

gbossu added 4 commits October 15, 2024 08:46

[AIEX] Remove heuristic in sticky register DAGMutator

f77f84a

We now expect the schedulers to have improved enough to handle most regressions on their own. (Some of these improvements come in the next commits.)

[AIEX] NFC: Clearer debug logs for resource conflicts

ffa3fa3

[AIEX] New convergence strategy for resource conflicts

717215e

This allows biasing the depth of some SUnits in the hope of moving them up or down in the scoreboard and avoiding resource conflicts without necessarily increasing the latency of the whole region.

gbossu requested review from abhinay-anubola, abnikant, andcarminati, khallouh, konstantinschwarz, martien-de-jong, SagarMaheshwari99 and stephenneuendorffer as code owners October 15, 2024 08:48

krishnamtibrewala reviewed Oct 15, 2024


martien-de-jong reviewed Oct 15, 2024


martien-de-jong previously approved these changes Oct 15, 2024


[AIEX] NFC: InterblockScheduling: Use AIEAlternateDescriptors

08c874d

This prepares for future work where Multi-slot pseudos won't necessarily be materialized.

gbossu dismissed martien-de-jong’s stale review via 08c874d October 15, 2024 14:01

martien-de-jong approved these changes Oct 15, 2024


andcarminati approved these changes Oct 15, 2024


gbossu merged commit ab2b5d2 into aie-public Oct 15, 2024
8 checks passed

gbossu deleted the gaetan.loopaware.convergence branch October 15, 2024 18:27
