JIT: Dead code at runtime not completely removed #13824
This particular issue is both specific to this coding pattern and also somewhat general.
The general issue is that the jit is fairly slow at propagating information from callees to callers, so wrapping things like
The specific issue is that because of this slowness, a copy optimization in morph does not fire when jitting
Morph actually folds the branch before it processes either call, but doesn't have the ability to determine that because of this fold, one of the calls is now unreachable, and because of that, that the early reference counts should be adjusted, and because of that, the copy optimization can safely fire on the surviving call. Instead it thinks both calls are live, and processes them both, introducing copies of the local
We might be able to rely on block ref counts to eliminate unreachable blocks after inlining, and add logic to the postorder post-inline callback (currently
Probably worth prototyping, though early block ref counts might not be accurate enough to do this safely.
A related issue is that we end up being handicapped early in the jit by lack of full flow graph connectivity; I've been burned by this in other places too (e.g., dotnet/coreclr#27113).
cc @dotnet/jit-contrib |
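The ref-count idea in the comment above can be sketched with a toy flow graph. This is only an illustration under stated assumptions: `Block`, `computeRefCounts`, `removeEdge`, `liveLocalUses`, and `makeDiamond` are hypothetical names invented here, not the JIT's data structures or functions. The sketch shows how folding a branch edge can cascade into removing a block and lowering the count of live uses of a local.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model of a flow graph; these names are illustrative
// and are not taken from the JIT sources.
struct Block {
    std::vector<int> succs;  // indices of successor blocks
    int refCount = 0;        // incoming edges (the entry block gets one extra)
    int localUses = 0;       // appearances of the struct local in this block
    bool removed = false;    // set once the block is known to be unreachable
};

// Count incoming edges for every block; block 0 is the method entry.
void computeRefCounts(std::vector<Block>& g) {
    for (Block& b : g) b.refCount = 0;
    if (!g.empty()) g[0].refCount = 1;
    for (const Block& b : g) {
        if (b.removed) continue;
        for (int s : b.succs) g[s].refCount++;
    }
}

// Remove the edge from -> to (as branch folding would). If the target loses
// its last reference, mark it removed and release its outgoing edges too.
void removeEdge(std::vector<Block>& g, int from, int to) {
    std::vector<int>& succs = g[from].succs;
    bool found = false;
    for (std::size_t i = 0; i < succs.size(); i++) {
        if (succs[i] == to) {
            succs.erase(succs.begin() + static_cast<std::ptrdiff_t>(i));
            found = true;
            break;
        }
    }
    if (!found) return;  // no such edge: leave the counts untouched
    if (--g[to].refCount == 0) {
        g[to].removed = true;
        const std::vector<int> dead = g[to].succs;  // copy: the list shrinks below
        for (int s : dead) removeEdge(g, to, s);
    }
}

// Uses of the local in blocks that are still live.
int liveLocalUses(const std::vector<Block>& g) {
    int uses = 0;
    for (const Block& b : g) {
        if (!b.removed) uses += b.localUses;
    }
    return uses;
}

// Diamond: block 0 branches to 1 and 2; each passes the local to a call,
// and both join at block 3.
std::vector<Block> makeDiamond() {
    std::vector<Block> g(4);
    g[0].succs = {1, 2};
    g[1].succs = {3};
    g[2].succs = {3};
    g[1].localUses = 1;
    g[2].localUses = 1;
    computeRefCounts(g);
    return g;
}
```

In the diamond, the local has two live uses before folding, so neither call looks like a last use. After `removeEdge(g, 0, 2)`, block 2 is marked removed and only one live use remains. Pure ref counting never reclaims an unreachable cycle, which is one way such counts can be "not accurate enough".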
Results from a simple, crude prototype which gets the same codegen for
Interestingly, the biggest percentage wins come from removing useless pinvoke frame setup. See, for example:
;; current jit
G_M40889_IG01:
push rbp
push r15
push r14
push r13
push r12
push rdi
push rsi
push rbx
sub rsp, 88
lea rbp, [rsp+90H]
;; bbWeight=1 PerfScore 8.75
G_M40889_IG02:
lea rcx, [rbp-88H]
mov rdx, r10
call CORINFO_HELP_INIT_PINVOKE_FRAME
mov qword ptr [rbp-48H], rax
mov rdx, rsp
mov qword ptr [rbp-68H], rdx
mov rdx, rbp
mov qword ptr [rbp-58H], rdx
;; bbWeight=1 PerfScore 5.25
G_M40889_IG03:
mov eax, 1
mov rdx, qword ptr [rbp-48H]
mov byte ptr [rdx+12], 1
;; bbWeight=1 PerfScore 2.25
G_M40889_IG04:
lea rsp, [rbp-38H]
pop rbx
pop rsi
pop rdi
pop r12
pop r13
pop r14
pop r15
pop rbp
ret
;; bbWeight=1 PerfScore 5.50
;; prototype
G_M40888_IG01:
;; bbWeight=1 PerfScore 0.00
G_M40888_IG02:
mov eax, 1
;; bbWeight=1 PerfScore 0.25
G_M40888_IG03:
ret
;; bbWeight=1 PerfScore 1.00
The prototype can only very crudely remove unreachable blocks, and it fails on one of the jit-diff assemblies, so the logic may still be amiss somewhere. A more functional version would need a more robust way to clean things up: likely flagging methods where we suspect unreachable code during inlining, and then running a cleanup post-pass on those flagged methods. |
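A flag-then-cleanup scheme like the one suggested above might look roughly like the following. It is a minimal sketch, not the prototype's code: `FlowBlock`, `Method`, `suspectUnreachable`, and `cleanupIfFlagged` are hypothetical names, and the sketch assumes the flag is set elsewhere (for example, when inlining folds a branch). The cleanup is a plain reachability walk from the entry block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not the JIT's representation.
struct FlowBlock {
    std::vector<int> succs;  // indices of successor blocks
    bool reachable = false;  // result of the most recent cleanup walk
};

struct Method {
    std::vector<FlowBlock> blocks;    // blocks[0] is the entry block
    bool suspectUnreachable = false;  // assumed to be set during inlining
};

// Runs only on flagged methods. Marks every block reachable from the entry,
// disconnects the rest, and returns how many blocks were found unreachable.
std::size_t cleanupIfFlagged(Method& m) {
    if (!m.suspectUnreachable || m.blocks.empty()) return 0;

    for (FlowBlock& b : m.blocks) b.reachable = false;

    // Depth-first walk from the entry using an explicit worklist.
    std::vector<int> work{0};
    m.blocks[0].reachable = true;
    while (!work.empty()) {
        const int cur = work.back();
        work.pop_back();
        for (int s : m.blocks[cur].succs) {
            if (!m.blocks[s].reachable) {
                m.blocks[s].reachable = true;
                work.push_back(s);
            }
        }
    }

    std::size_t unreachable = 0;
    for (FlowBlock& b : m.blocks) {
        if (!b.reachable) {
            b.succs.clear();  // drop outgoing edges of dead blocks
            unreachable++;
        }
    }
    m.suspectUnreachable = false;  // cleanup done for this method
    return unreachable;
}
```

A walk from the entry also catches unreachable cycles, at the cost of visiting every live block, which is why gating it on a per-method flag keeps the common case cheap.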
Also note
A second simple, crude prototype produces this code for
E95B4AFEFF jmp C:FuncAvx(int,System.Span`1[Byte])
And once we support caller-byref params for fast tail calls, we can broaden out the copy opt elimination further, since we know all tail-call uses must be last uses.
For future reference, some other things to play with in that copy opt clause:
This prototype sees a fair number of hits....
|
The intersection of copy avoidance for last-use implicit by-ref structs at calls and tail calls is interesting. I'm not sure I completely understand it yet.
Imagine we have a chain of methods that tail call one another, A->B->C, and B and C have by-ref struct params. A passes a local struct (by-ref) to B, and B passes it directly to C.
A cannot safely tail call B, because A's frame is trashed on entry to B. But A can avoid copying the struct at the call site if it is the last use.
One might think B can safely tail call C only if it avoids making a local copy. But dotnet/coreclr#5394 shows this isn't always true; a chain of slow tail calls via helpers can "reach back" to earlier frames and trash them as well. The fix in dotnet/coreclr#5394 is to have B copy the struct if it is going to slow tail call C. But this would seemingly make it unsafe for B to tail call C, unless the slow helper does something magical.
It's possible the proposed new portable fast tail call mechanism (#341) will have similar reach-back abilities. Need to look into this more closely. |
runtime/src/coreclr/src/vm/i386/stublinkerx86.cpp Line 5672 in 6a620e6
This optimization should be ok for the new tail call helpers. From the JIT's perspective the helpers will not do anything special with the arguments. They will simply be copied into TLS using standard IL instructions. |
Fixed by #1751. |
The JIT doesn't properly remove dead code in certain instances. Here's an example that does some method dispatching based on CPU features:
A1 moves some arguments around on the stack unnecessarily. Manually inlining IsAvxSupported or removing the code from the false branch, as in A2 and A3, makes the JIT remove all those instructions, leaving just the call.
category:cq
theme:optimization
skill-level:expert
cost:large