RyuJIT: Inlined "is class statically inited" check #47901
I wonder if we should consider using this check as a gating condition for loop cloning. It may also factor into the do-while transformation heuristics, which try to account for the potential "savings" from hoisting the class init check out of the loop. So for some subset of loops we would produce one copy of the loop that knows the classes are inited and does no checks, and another that conditionally checks within the loop body as above.
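A minimal C++ model of the loop-cloning idea, assuming a single class whose "is inited" flag can be tested once before the loop. `StaticClass`, `runCctor` and `sumWithStatic` are hypothetical names for illustration, not RyuJIT or CoreCLR code; the point is only the two-clone shape (a check-free fast loop and a slow loop that keeps the per-iteration check).

```cpp
#include <cassert>
#include <vector>

// Hypothetical runtime state for one class with a static constructor.
// Names are illustrative; this is not RyuJIT or CoreCLR code.
struct StaticClass {
    bool inited = false;   // models the "is class inited" flag
    int value = 0;         // models a static field set by the cctor
    int checksInLoop = 0;  // flag tests executed inside loop bodies
};

// Models running the static constructor.
void runCctor(StaticClass& c) {
    c.value = 7;
    c.inited = true;
}

// Loop cloning gated on the init check: test the flag once before the loop,
// then run either a fast clone with no checks or a slow clone that keeps the
// per-iteration check.
long sumWithStatic(StaticClass& c, const std::vector<int>& xs) {
    long sum = 0;
    if (c.inited) {
        for (int x : xs) sum += x + c.value;  // fast clone: no checks
    } else {
        for (int x : xs) {                    // slow clone: check each time
            ++c.checksInLoop;
            if (!c.inited) runCctor(c);
            sum += x + c.value;
        }
    }
    return sum;
}
```

On the first call the slow clone runs and pays three checks; on later calls the flag is already set, so the fast clone runs and the check count no longer grows.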
Can you look at the case where there are two static accesses in a method, one of which dominates the other (say, separated by some unrelated flow, just to make things interesting), and make sure we're only emitting one check and call? E.g.:
var x = s_f1;
if (...) { }
var y = s_f2; // we should not need a check here (when optimizing)
My concern is that if we make the call to init the class conditional under the check, then the JIT may no longer realize that the dominated check and call are redundant.
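A small C++ model of the redundancy being asked about, assuming `s_f1` and `s_f2` belong to the same class (otherwise two checks would genuinely be needed). `ClassState`, `ensureInited`, `naive` and `deduped` are hypothetical names; the model just counts inline flag tests so the difference between the naive expansion and the desired one is visible.

```cpp
#include <cassert>

// Hypothetical model; names are illustrative, not RyuJIT code.
struct ClassState {
    bool inited = false;  // the "is class inited" flag
    int flagTests = 0;    // inline checks executed
    int helperCalls = 0;  // init helper invocations
};

// Models the class-init helper call (runs the cctor once).
void initHelper(ClassState& c) {
    ++c.helperCalls;
    c.inited = true;
}

// Models the inlined "check, then call if needed" sequence.
void ensureInited(ClassState& c) {
    ++c.flagTests;
    if (!c.inited) initHelper(c);
}

// Naive expansion: each static access gets its own check and call.
int naive(ClassState& c, bool cond, int f1, int f2) {
    ensureInited(c);
    int x = f1;            // var x = s_f1;
    if (cond) { /* unrelated flow */ }
    ensureInited(c);       // redundant: dominated by the first check
    int y = f2;            // var y = s_f2;
    return x + y;
}

// Desired result: the dominating check covers the second access too.
int deduped(ClassState& c, bool cond, int f1, int f2) {
    ensureInited(c);
    int x = f1;
    if (cond) { /* unrelated flow */ }
    int y = f2;            // no check needed here
    return x + y;
}
```

Both versions call the helper once, but the naive one executes an extra, provably useless flag test.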
@AndyAyersMS Unfortunately, it regresses that case. I wonder if I should do this optimization later, after loop hoisting. This optimization took me a bit longer to implement than expected, mainly because of COMMA and QMARK nodes, but at least I now know more about them and how they are expanded.
Perhaps... You might also be able to work around this by leaving some sort of pseudo-op at the join point below the initial check: something that value numbers the same as the helper call but isn't otherwise treated as a call, and emits no code. If we do that and mark the indirection that fetches the init flag as nonfaulting, hopefully the JIT suppresses the now-dominated call, realizes it has an empty if/then guarded by a side-effect-free predicate, and cleans it all up nicely.
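A rough C++ sketch of why the pseudo-op helps, under simplifying assumptions: nodes are listed in dominator order along one path, and a helper call is redundant once a node with the same value number is available. Because the pseudo-op sits at the join below the initial check, it dominates later code even though the conditional call inside the check does not. `Kind`, `Node` and `eliminateRedundantInits` are hypothetical and far simpler than RyuJIT's value numbering and CSE.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified IR; not RyuJIT's GenTree or value numbering.
enum class Kind { InitHelperCall, InitPseudoOp, FieldLoad };

struct Node {
    Kind kind;
    int vn;  // value number; the pseudo-op reuses the helper call's VN
};

// Nodes are listed in dominator order along one path, so anything seen
// earlier dominates what follows. A helper call whose VN is already
// available is redundant. Pseudo-ops only make their VN available and
// emit no code, so they are dropped from the output.
std::vector<Node> eliminateRedundantInits(const std::vector<Node>& nodes) {
    std::unordered_set<int> available;
    std::vector<Node> out;
    for (const Node& n : nodes) {
        if (n.kind == Kind::InitPseudoOp) {
            available.insert(n.vn);
            continue;
        }
        if (n.kind == Kind::InitHelperCall) {
            if (available.count(n.vn) != 0) continue;  // dominated: remove
            available.insert(n.vn);
        }
        out.push_back(n);
    }
    return out;
}
```

With the sequence pseudo-op(VN 1), load, helper call(VN 1), load, only the two loads survive; a helper call with no dominating VN is kept.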
I rewrote it to run just before the "insert GC safepoints" phase (closer to the rationalization phase). Even if the current approach doesn't make it in, I still learned a lot about the existing flowgraph APIs for manipulating blocks.
Just checked the jump-threading example: ...
[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test(bool cond1, bool cond2)
{
int z = 0;
if (cond1)
{
z += CctorTestClass.A;
if (cond2)
z += CctorTestClass.B;
}
return z;
}
}
static class CctorTestClass
{
public static readonly int A;
public static readonly int B;
static CctorTestClass()
{
A = 42;
B = 43;
}
}
Emits:
G_M50630_IG01: ;; offset=0000H
56 push rsi
4883EC20 sub rsp, 32
8BF2 mov esi, edx
;; bbWeight=1 PerfScore 1.50
G_M50630_IG02: ;; offset=0007H
33C0 xor eax, eax
84C9 test cl, cl
741C je SHORT G_M50630_IG05
;; bbWeight=1 PerfScore 1.50
G_M50630_IG03: ;; offset=000DH
0FB6059E1C1000 movzx rax, byte ptr [(reloc)]
A801 test al, 1
7417 je SHORT G_M50630_IG06
;; bbWeight=0.50 PerfScore 1.63
G_M50630_IG04: ;; offset=0018H
8B05961C1000 mov eax, dword ptr [reloc classVar[0x6d6303a0]]
4084F6 test sil, sil
7406 je SHORT G_M50630_IG05
03058F1C1000 add eax, dword ptr [reloc classVar[0x6d6303b8]]
;; bbWeight=0.50 PerfScore 1.63
G_M50630_IG05: ;; offset=0029H
4883C420 add rsp, 32
5E pop rsi
C3 ret
;; bbWeight=1 PerfScore 1.75
G_M50630_IG06: ;; offset=002FH
48B960EF606DF97F0000 mov rcx, 0x7FF96D60EF60
BA02000000 mov edx, 2
E8DDB61A5F call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
EBD3 jmp SHORT G_M50630_IG04
;; bbWeight=0 PerfScore 0.00
with a single check, as expected, and the cctor block is cold. The flag test could also be folded from
0FB6059E1C1000 movzx rax, byte ptr [(reloc)]
A801 test al, 1
7417 je SHORT G_M50630_IG06
to
test byte ptr [(reloc)], 1
je SHORT G_M50630_IG06
but that's a different issue.
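A rough C++ rendering of the control flow in the listing above, to make the single check and the cold block easier to follow. The flag byte, the globals and `coldInitHelper` are hypothetical stand-ins for runtime data (the real flag lives at the relocated address, and the cold block IG06 calls CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE); this models the emitted shape, not what the JIT literally produces.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the runtime data the JIT'd code reads.
// The real flag byte lives in runtime-owned memory at a relocated address.
static std::uint8_t g_classInitFlags = 0;  // bit 0 set => cctor has run
static int g_A = 0, g_B = 0;
static int g_helperCalls = 0;

// Models the cold block IG06: call the init helper, which runs the static
// constructor (A = 42, B = 43) and sets the flag, then jump back to IG04.
static void coldInitHelper() {
    ++g_helperCalls;
    g_A = 42;
    g_B = 43;
    g_classInitFlags |= 1;
}

// C++ rendering of Test(cond1, cond2) as emitted above.
int Test(bool cond1, bool cond2) {
    int z = 0;                             // IG02: xor eax, eax
    if (cond1) {
        if ((g_classInitFlags & 1) == 0)   // IG03: movzx / test al, 1 / je IG06
            coldInitHelper();              // IG06, then jmp back to IG04
        z += g_A;                          // IG04
        if (cond2)
            z += g_B;                      // no second check for B
    }
    return z;                              // IG05
}
```

Only the first call with `cond1` set reaches the helper; afterwards the flag test falls through to IG04.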
Moving this later was a good idea.
Left you some notes.
Jit-diffs: https://gist.github.com/EgorBo/2ba6bcc04c5d68a295f758428499c34a They are quite large because of how PMI works (it compiles methods without running them, so no classes have been initialized at JIT time), but I'd love to see whether it affects the benchmarks.
The PR is green.
From the dynamic PMI gist:
This is a fairly large diff, and I wonder if there are things we could do to mitigate it. As for the impact on crossgen, I assume a crossgen diff shows similar code size growth? Can you look at this? I'm curious what is going on in some of the methods with very high percentage diffs. Can you show the old/new codegen for something like:
Can you try testing cases where there are multiple class init calls in a single tree?
I'm not sure we ever create these, but if so it would be good to verify that things work as expected.
As implemented, it won't kick in for crossgen. And if it did kick in for crossgen, the size regression would likely be even bigger due to extra indirections and fixups. Would it make sense to do this optimization only for the Tier-1 JIT, or only inside loops? The static constructors should mostly have run already, and the rest should not add up to much.
It's currently Tier-1 JIT only; I'll check the diff for "loop-only" (
I meant when we are recompiling code that has already run at least once in the current process. That is different from enabling it whenever optimizations are on.
Um, the recompiled code is supposed to be init-class-free, isn't it? All classes will have initialized themselves during Tier0.
Quick experiment:
Current diff for CoreLib (without the dynamic cases):
Diff for methods with
From my understanding, the non-loop methods are expected to get rid of the static init on their own, so we can leave them as is.
The interesting case is when we're optimizing a method without having first run a Tier0 or prejitted version. That means non-Tier1 optimized code (in our current world: AggressiveOptimization, non-prejitted methods with loops, and other things that currently bypass tiering, like dynamic methods). But perhaps it's not too much of a stretch to keep this enabled for all optimized code, since we don't expect it to do much for methods where we've already run an earlier version. To restrict this to calls in loops, you should try using the loop info to detect loop membership instead of relying on
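A C++ sketch of the classification described above: which optimized compilations can still see uninitialized classes at JIT time. The `Origin` values and `mayStillNeedClassInit` are hypothetical names mirroring the categories in the comment; they are not RyuJIT's JIT flags.

```cpp
#include <cassert>

// Hypothetical summary of how a method reached the JIT. These names are
// illustrative only; they are not RyuJIT's actual JIT flags.
enum class Origin {
    Tier1Rejit,              // Tier0 or prejitted code already ran
    AggressiveOptimization,  // [MethodImpl(AggressiveOptimization)]
    LoopMethodNoTiering,     // non-prejitted method with loops, optimized up front
    DynamicMethod,           // dynamic methods, which bypass tiering
    MinOpts                  // Tier0 / unoptimized code
};

// Per the discussion: the interesting case is optimized code compiled without
// a prior Tier0 or prejitted run, because only then are classes likely to be
// still uninitialized when the JIT sees the static access.
bool mayStillNeedClassInit(Origin o) {
    switch (o) {
        case Origin::AggressiveOptimization:
        case Origin::LoopMethodNoTiering:
        case Origin::DynamicMethod:
            return true;
        case Origin::Tier1Rejit:  // cctors have normally run during Tier0
        case Origin::MinOpts:     // not optimized code; outside this discussion
            return false;
    }
    return false;
}
```

Keeping the check enabled for all optimized code, as suggested, costs little in the `Tier1Rejit` case because most class inits are already resolved there.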
I understand,
Sure, but it's a question of the benefit/cost ratio. If the call is not inside a loop, the benefit (cycles saved per call to the root method) is smaller but the cost (extra bytes of code) is the same. To put it another way, the loop is a benefit multiplier. I still think some of the absolute code size increases in your diff summary look large; if we can find a way to reduce them, perhaps we can consider enabling this more broadly. So we should verify that we're getting the codegen we expect there.
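A back-of-the-envelope C++ model of the "loop is a benefit multiplier" argument. `Estimate` and `estimateSite` are hypothetical, and all inputs are made-up numbers rather than measurements from this PR: the code-size cost is paid once per expanded site, while the saving scales with how often the site executes per call of the root method.

```cpp
#include <cassert>
#include <cstdint>

// Back-of-the-envelope model of the benefit/cost argument.
// All inputs are hypothetical numbers, not measurements from this PR.
struct Estimate {
    std::int64_t extraCodeBytes;     // cost: paid once per expanded site
    std::int64_t cyclesSavedPerRun;  // benefit: per call of the root method
};

// A check outside any loop executes once per call of the method; one inside
// a loop executes once per iteration, so the saving scales with trip count
// while the code-size cost does not.
Estimate estimateSite(std::int64_t bytesPerSite,
                      std::int64_t cyclesSavedPerCheck,
                      std::int64_t tripCount /* 1 if not in a loop */) {
    return Estimate{bytesPerSite, cyclesSavedPerCheck * tripCount};
}
```

With identical per-site costs, a site in a 100-iteration loop saves 100 times as much per call as a site outside any loop.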
Ping myself.
Will get back to it later; I couldn't find any cases in the asm diffs where non-hoistable static initializations were inside loops.
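A C++ model of why initializations for classes with an explicit static constructor are non-hoistable. Without beforefieldinit, ECMA-335 requires the type initializer to run at the first access, so moving it above a loop can reorder it before earlier side effects and can run it even when the loop executes zero times. `Trace`, `cctor`, `original` and `hoisted` are hypothetical names used only to record the observable order.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace of observable side effects.
struct Trace {
    std::string log;
    bool inited = false;
};

// Models a precise static constructor: runs once, at first access.
void cctor(Trace& t) {
    if (!t.inited) {
        t.inited = true;
        t.log += "cctor;";
    }
}

// Original: the static is first touched inside the loop body, after a
// visible side effect in the same iteration.
void original(Trace& t, int n) {
    for (int i = 0; i < n; ++i) {
        t.log += "work;";
        cctor(t);  // precise init: must run at first access
    }
}

// Illegal for a class with an explicit static constructor: hoisting the
// init above the loop reorders it before "work;" and runs it even when
// the loop executes zero times.
void hoisted(Trace& t, int n) {
    cctor(t);
    for (int i = 0; i < n; ++i) t.log += "work;";
}
```

For beforefieldinit types the runtime may run the initializer any time before the first static field access, which is why those calls can be hoisted.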
Fixes #1327
For members of classes with static constructors or beforefieldinit semantics, we emit a helper call to initialize the class. This PR adds an additional quick "is it already initialized?" check, so we can avoid the call overhead (the call only does real work once).
Example:
Current codegen:
New codegen:
In most real-world cases such init calls are eliminated anyway, because the type is often already initialized at JIT time (e.g. when rejitting from Tier0 to Tier1). That's not the case for methods with loops, because those aren't rejitted at the moment (unless COMPlus_TC_QuickJitForLoops=1 is set), so we might end up with a slow init call forever, especially when it's located inside a loop (for classes with static constructors we can't hoist such calls).
/cc @dotnet/jit-contrib
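A minimal C++ model of the difference between the current and the new codegen shape inside a loop. `InitState`, `initClassHelper`, `currentShape` and `newShape` are hypothetical names; each helper invocation stands in for the call overhead the PR is trying to avoid.

```cpp
#include <cassert>

// Hypothetical model of the two code shapes described in the PR text.
struct InitState {
    bool inited = false;
    int helperCalls = 0;  // each call models the helper's call overhead
};

// Models the init helper: runs the cctor on the first call, then just returns.
void initClassHelper(InitState& s) {
    ++s.helperCalls;
    s.inited = true;
}

// Current shape: the helper is called on every access, even after init.
void currentShape(InitState& s, int iterations) {
    for (int i = 0; i < iterations; ++i)
        initClassHelper(s);
}

// New shape: an inline "is it already initialized?" test skips the call.
void newShape(InitState& s, int iterations) {
    for (int i = 0; i < iterations; ++i)
        if (!s.inited) initClassHelper(s);
}
```

Over 1000 iterations the current shape pays 1000 helper calls while the new shape pays one, plus a cheap flag test per iteration.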