-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not run stack level setter in release except for x86. #42197
Conversation
For x86 it is required for correctness, for other platforms is only does some checks.
assert(comp->compCanEncodePtrArgCntMax()); | ||
#endif | ||
|
||
#if defined(TARGET_X86) | ||
if (maxStackLevel >= sizeof(unsigned)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual change, as you can see from the comment it is applicable only to x86 where we push arguments (and use EBP) but it was forgotten to be guarded as such before CoreCLR initial PR.
PTAL @dotnet/jit-contrib |
Looking at the attached dumps, the register allocation is identical and there are no critical edges, and no resolution is done. Did you intend to attach dumps for a different method? |
Yes, sorry, the correct dumps: the issue is in
|
The edge splitting in both versions is equally bad. But because there are more available registers in the "diff" version, it allows for even more of a discrepancy between the most registers in use at an edge, and the least, so there are more resolution moves. Although there is no loop in this code, I think that the general idea behind the proposed solution to #9909, which is described in https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/jit/lsra-detail.md#avoid-splitting-loop-backedges, would be the best way to address this. The idea would be to ensure that the allocation matches at critical edges to avoid splitting. |
Thanks Carol for the analysis, I will try to address the first issue (bigger immediate encoding for bigger offset with rsp frames) and see how many methods are regressing because of the edge splitting issue, maybe good diffs will compensate bad diffs, otherwise I will postpone this PR until after #9909. |
I opened an issue to track that but will temporarily close the PR until I have time to work on it. |
It saves ~0.20% of the Jit time.
This change has no diffs on arm/arm64 because we always save frame pointer there, the change does not affect x86.
x64 changes are both size improvement (because we don't push and pop rbp and it is available for LSRA) and regression, because we often encode a bigger immediate, for example:
so we encode a bigger immediate for each stack access and it gives us bigger code size, but it looks like something that could be optimized similar to
COMPlus_JitConstCSE
. If we see many patterns likereg + common_base + positive_small_const
and it should use the number of accesses when currently it is using the number of stack variables that is a very rude heuristic. I will try to useCOMPlus_JitConstCSE=3
and update the description, cc @briansull .If I measure PerfScore it ignores such changes (it is questionable if immediate size should affect PerfScore or not but it is another discussion) and shows improvements but some regressions (many) as well, a typical regression is when critical edge resolution forces more variables to be on stack when in fact we start with a better condition: we have rbp available for register allocation.
It looks like an issue in regAlloc, @CarolEidt could you please look at the dump and say if my analysis is correct? Could it be fixed with combine free/busy reg allocation PR?