-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: update block weight for uncond to cond flow opt #98324
JIT: update block weight for uncond to cond flow opt #98324
Conversation
This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease. Fixes some issues seen with odd perf scores in the ML/CSE experiment. Contributes to dotnet#93020
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease. Fixes some issues seen with odd perf scores in the ML/CSE experiment. Contributes to #93020
|
FYI @dotnet/jit-contrib A few large local regressions. The ones I looked at were all additional cloning in loops with type tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix! I anticipate those additional fgGetPredForBlock
calls will go away soon, once I replace bbTrueTarget
and bbFalseTarget
with flow edges.
src/coreclr/jit/fgopt.cpp
Outdated
@@ -2471,24 +2471,39 @@ bool Compiler::fgOptimizeUncondBranchToSimpleCond(BasicBlock* block, BasicBlock* | |||
// add an unconditional block after this block to jump to the target block's fallthrough block | |||
// | |||
assert(!target->IsLast()); | |||
BasicBlock* next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget()); | |||
BasicBlock* const next = fgNewBBafter(BBJ_ALWAYS, block, true, target->GetFalseTarget()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixup block should also go away soon, since we no longer need to maintain fallthrough into the false target. I have a change locally for removing this and a bunch of other fixups, but the diffs were discouraging due to all the profile changes by the time we got to block reordering. Hopefully the profile maintenance work we're doing will lessen that impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had removed the fixup block originally, but put it back to reduce diffs.
Diff results for #98324Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,520,572 contexts (999,218 MinOpts, 1,521,354 FullOpts). MISSED contexts: base: 4 (0.00%), diff: 97 (0.00%) Overall (+366,484 bytes)
FullOpts (+366,484 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,542,702 contexts (985,624 MinOpts, 1,557,078 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 95 (0.00%) Overall (+420,577 bytes)
FullOpts (+420,577 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,262,128 contexts (921,087 MinOpts, 1,341,041 FullOpts). MISSED contexts: base: 3 (0.00%), diff: 77 (0.00%) Overall (+202,992 bytes)
FullOpts (+202,992 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,368,064 contexts (937,277 MinOpts, 1,430,787 FullOpts). MISSED contexts: base: 0 (0.00%), diff: 84 (0.00%) Overall (+305,988 bytes)
FullOpts (+305,988 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,908,360 contexts (1,240,334 MinOpts, 1,668,026 FullOpts). MISSED contexts: base: 133 (0.00%), diff: 223 (0.01%) Overall (+239,961 bytes)
FullOpts (+239,961 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.01% to +0.18%)
MinOpts (-0.00% to +0.01%)
FullOpts (-0.01% to +0.24%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.02% to +0.21%)
FullOpts (-0.02% to +0.26%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.01% to +0.14%)
FullOpts (-0.01% to +0.21%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.00% to +0.19%)
MinOpts (-0.01% to +0.00%)
FullOpts (-0.00% to +0.27%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.02% to +0.15%)
FullOpts (-0.02% to +0.21%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.01% to +0.06%)
FullOpts (-0.01% to +0.08%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-0.02% to +0.01%)
FullOpts (-0.02% to +0.02%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.01% to +0.18%)
FullOpts (-0.01% to +0.24%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.02% to +0.20%)
FullOpts (-0.02% to +0.26%)
Details here |
Diff results for #98324Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 2,257,223 contexts (832,052 MinOpts, 1,425,171 FullOpts). MISSED contexts: base: 73,583 (3.16%), diff: 73,599 (3.16%) Overall (+122,712 bytes)
FullOpts (+122,712 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,678,702 contexts (1,054,747 MinOpts, 1,623,955 FullOpts). MISSED contexts: base: 11 (0.00%), diff: 656 (0.02%) Overall (-51,776 bytes)
FullOpts (-51,776 bytes)
Details here |
Hmm, rather bigger diffs than I was expecting. I will need to dig in and see if this is all attributable to more cloning, and whether it is time to at least build some kind of vague heuristic. |
Looks like regressions are indeed from more cloning. |
In particular type test cloning is driven by the likelihood of the type test succeeding, and with this profile update we now see more tests that appear successful. |
@amanasifkhalid can you take another look? I removed the Next block and just wire up the flow directly. TP diffs good, PerfScore diffs good. Code size increases, but mainly from libraries tests. Code size impact is all from more or fewer clones, all the ones I saw were from the "clone for type test" heuristic which relies on profile data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for getting rid of some of the "no fallthrough" cruft.
Failure is a timeout spmi replay for linux arm32. |
Improvements on arm64: |
This optimization duplicates code and flow in a BBJ_COND successor into one of its preds; as a result the weight of the successor should decrease.
Fixes some issues seen with odd perf scores in the ML/CSE experiment.
Contributes to #93020
Diffs