-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Allow moving handler regions in fgRelocateEHRegions #101613
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs look like an improvement: A lot of the size savings seem to come from shorter jumps on non-exceptional paths when we decide to move handlers further down the method, while a lot of the size regressions stem from new jumps now that handlers don't fall through to the non-exceptional successor, as well as larger jumps to the handlers. Since we only move handlers if they're marked as rarely run, I think we want to do this movement as a matter of principle. |
SPMI replay failure is a timeout -- this change shouldn't affect x64, anyway. |
Probably worth running jitstress. Unfortunately, that is blocked by #101628 |
Looks like jitstress is no longer blocked (at least not on x86). I'll run it now |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
@AndyAyersMS this is the only test to fail in the outerloop pipelines: The
and with:
We decided to move the catch block since it's cold, but we kept the try block in the loop. The only difference in codegen is the try block now falls into the bottom loop block, and the catch block has to jump to the bottom loop block, rather than the other way around. This looks like a runtime issue, considering the failure is specific to delegates, right? |
I wonder if this is one of those cases where we have to put a NOP after the call, if it's at the end of the try range, or else we can't figure out if we were in the try or not. When you did your experiment of calling a method directly, did the call end the try or was there another instruction afterwards? |
The try block still ended with a call, which led me to think the codegen is fine. Though let me try disabling the jump-to-next removal optimization, and see if that fixes the failure locally... |
The clr-abi.md and code say this NOP is only needed for x64. I wonder if it is also needed for x86 but since historically for x86 the catch was always just after the try, the try would always end with something other than a call, so we just never noticed? |
I think you're right. Enabling the NOP check for x86 fixes this failure. New codegen:
Would you like me to turn it on in this PR, or a separate one? It might be interesting to see how many cases this fires for. |
I'm surprised we haven't run into the other cases yet, like a call instruction preceding the start of a try region. Since we haven't faced issues for those, should we be conservative in our fix by only emitting the NOP for the "call before EH region end" case for now? |
Good question. Seems like if the try-entering side was a problem we'd have already run into it somehow, since there is nothing preventing us from falling into a try? |
Our guess is that it won't ever fire without this PR. If you verify that I don't see why a separate PR would be interesting. |
When I experimented with placing a call to a normal method that throws at the end of the try region, there wasn't any code after the call instruction, but it didn't fail, so I suspect there might be other methods that didn't fail, but we'll end up inserting NOPs for with the fix. |
I added a check to I think this is the minimally impactful solution we can use to unblock this for now. |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
@AndyAyersMS emitting a NOP before the end of the try region seemed to fix the failure. Linux ARM64 failure is unrelated. Should we update the ABI documentation to cover this case? |
I would like to get confirmation from the runtime side that NOP padding for x86 EH makes sense. @mangod9 who can we ask about this? |
…ntiguity later (#108914) Part of #107749. `Compiler::fgMoveColdBlocks` currently moves cold try blocks to the end of their innermost regions. This is problematic for our 3-opt layout plans: When identifying a candidate span of blocks to reorder, assuming that all cold blocks are at the end of the method vastly simplifies our implementation. However, if we have EH regions with their own cold sections, `fgMoveColdBlocks` will interleave hot and cold blocks. To facilitate later layout passes, we can simplify `fgMoveColdBlocks` to naively move all cold blocks to the end of the method, regardless of EH region, and rely on a "fixup" pass for making EH regions contiguous again. To start, I've tweaked `fgMoveColdBlocks` to break up try regions only. When handlers are placed in the funclet section, we don't need to do anything extra to get cold EH blocks out of the main method body's hot section. However, for jitted x86 code, we don't use the funclet model (yet), so cold handler blocks can still litter the main method body, hindering 3-opt's candidate space. I'd rather not expand on this PR's logic to rebuild handler regions if we can do something simpler, such as getting #101613 merged in, and using `fgRelocateEHRegions` to move all handlers to the end of the method under the assumption that they're cold (i.e. a pseudo-funclet region). Moving try entry blocks in `fgMoveColdBlocks` proved painful enough that I think we're better off leaving them as-is. Leaving each try region's entry in-place gives us a nice breadcrumb for reinserting the remaining blocks, and it might be beneficial to leave these entries in the candidate span of blocks for 3-opt, so we can effectively move entire try regions just by moving the entry. For try regions that are entirely cold, I can look into calling `fgRelocateEHRegions` before `fgMoveColdBlocks` on all platforms to quickly get these out of the way. All of this would be unnecessary if we could remove the VM's requirement of contiguous EH regions, and the codegen improvements would likely outweigh the additional VM complexity, though that's a conversation for another day.
…ntiguity later (dotnet#108914) Part of dotnet#107749. `Compiler::fgMoveColdBlocks` currently moves cold try blocks to the end of their innermost regions. This is problematic for our 3-opt layout plans: When identifying a candidate span of blocks to reorder, assuming that all cold blocks are at the end of the method vastly simplifies our implementation. However, if we have EH regions with their own cold sections, `fgMoveColdBlocks` will interleave hot and cold blocks. To facilitate later layout passes, we can simplify `fgMoveColdBlocks` to naively move all cold blocks to the end of the method, regardless of EH region, and rely on a "fixup" pass for making EH regions contiguous again. To start, I've tweaked `fgMoveColdBlocks` to break up try regions only. When handlers are placed in the funclet section, we don't need to do anything extra to get cold EH blocks out of the main method body's hot section. However, for jitted x86 code, we don't use the funclet model (yet), so cold handler blocks can still litter the main method body, hindering 3-opt's candidate space. I'd rather not expand on this PR's logic to rebuild handler regions if we can do something simpler, such as getting dotnet#101613 merged in, and using `fgRelocateEHRegions` to move all handlers to the end of the method under the assumption that they're cold (i.e. a pseudo-funclet region). Moving try entry blocks in `fgMoveColdBlocks` proved painful enough that I think we're better off leaving them as-is. Leaving each try region's entry in-place gives us a nice breadcrumb for reinserting the remaining blocks, and it might be beneficial to leave these entries in the candidate span of blocks for 3-opt, so we can effectively move entire try regions just by moving the entry. For try regions that are entirely cold, I can look into calling `fgRelocateEHRegions` before `fgMoveColdBlocks` on all platforms to quickly get these out of the way. All of this would be unnecessary if we could remove the VM's requirement of contiguous EH regions, and the codegen improvements would likely outweigh the additional VM complexity, though that's a conversation for another day.
Superseded by #113330 |
Follow-up to #101611.
fgRelocateEHRegions
previously would not try to move handler regions due to implementation restrictions infgDetermineFirstColdBlock
that have since been lifted. This should only affect behavior on Windows x86 when we aren't using funclets.Also, I noticed we never set
Compiler::fgCanRelocateEHRegions
to false anywhere, so it seems safe to remove this member.