JIT: fold trees after inline return expression updates #1751
Aggressively fold as we substitute inline return value trees in for the return value placeholders. Notice when this folding leads to branch simplification, and make the associated flow graph update. The recently added early flow opt pass will then transitively remove any newly unreachable code. Resolves dotnet/coreclr#27395.
This is the change I had in mind when I started working on #1309; I didn't realize at the time that #1309 would end up being quite a bit more impactful. One would hope that early folding like this would always lead to better code later on, but that's not the case. I will spend some time looking at the more prominent regressions.
Sample good diff:

```asm
;;; MemoryExtensions:IndexOfAny(Span`1,long,long,long):int
;;; before
G_M64714_IG01:
       sub rsp, 40
       mov qword ptr [rsp+38H], rdx
       mov qword ptr [rsp+40H], r8
       mov qword ptr [rsp+48H], r9
       ;; bbWeight=1 PerfScore 3.25
G_M64714_IG02:
       mov rdx, bword ptr [rcx]
       mov ecx, dword ptr [rcx+8]
       ;; bbWeight=1 PerfScore 4.00
G_M64714_IG03:
       mov r8, qword ptr [rsp+38H]
       mov r9, qword ptr [rsp+40H]
       mov rax, qword ptr [rsp+48H]
       mov dword ptr [rsp+20H], ecx
       mov rcx, rdx
       mov rdx, r8
       mov r8, r9
       mov r9, rax
       call SpanHelpers:IndexOfAny(byref,long,long,long,int):int
       nop
       ;; bbWeight=1 PerfScore 6.25
G_M64714_IG04:
       add rsp, 40
       ret
;;; after
G_M64714_IG01:
       sub rsp, 40
       ;; bbWeight=1 PerfScore 0.25
G_M64714_IG02:
       mov rax, bword ptr [rcx]
       mov ecx, dword ptr [rcx+8]
       mov dword ptr [rsp+20H], ecx
       mov rcx, rax
       call SpanHelpers:IndexOfAny(byref,long,long,long,int):int
       nop
       ;; bbWeight=1 PerfScore 6.50
G_M64714_IG03:
       add rsp, 40
       ret
```

cc @dotnet/jit-contrib
Is dotnet/coreclr#27395 really the correct issue this solves?
Thanks for spotting this; it should have been dotnet/coreclr#27935.
Diff on the test case from dotnet/coreclr#27935:

```diff
-; Lcl frame size = 56
+; Lcl frame size = 40
 G_M21895_IG01:
-       sub rsp, 56
-       vzeroupper
-       xor rax, rax
-       mov qword ptr [rsp+28H], rax
-       ;; bbWeight=1 PerfScore 2.50
+       sub rsp, 40
+       ;; bbWeight=1 PerfScore 0.25
 G_M21895_IG02:
-       vmovdqu xmm0, xmmword ptr [rdx]
-       vmovdqu xmmword ptr [rsp+28H], xmm0
-       ;; bbWeight=1 PerfScore 3.00
-G_M21895_IG03:
-       lea rdx, bword ptr [rsp+28H]
        call C:FuncAvx(int,System.Span`1[Byte])
        nop
-       ;; bbWeight=1 PerfScore 1.75
-G_M21895_IG04:
-       add rsp, 56
+       ;; bbWeight=1 PerfScore 1.25
+G_M21895_IG03:
+       add rsp, 40
        ret
        ;; bbWeight=1 PerfScore 1.25
-; Total bytes of code 40, prolog size 14, PerfScore 12.70, (MethodHash=7a51aa79) for method C:A(int,System.Span`1[Byte])
+; Total bytes of code 15, prolog size 4, PerfScore 4.25, (MethodHash=7a51aa79) for method C:A(int,System.Span`1[Byte])
```
Note this doesn't get the improvement in
Seems like if an implicit byref field appears as an argument in a call, we ought to give more weight to promotion, since (if unpromoted) the byref will likely be in a register that conflicts with the registers needed to make the call.
Do you mean struct promotion or CSE promotion? You probably mean this kind of "promoted"
Struct promotion. The early trimming reduces the RCS_EARLY ref counts, and so leads to more cases where we will undo promotion of implicit byref params. This undone promotion effectively "sinks" loads of the byref fields down to where they are consumed and keeps the byref live at those points, which can lead to register conflicts at calls.
I don't see any easy way to address the regressions, and on balance this is a net win, so I suggest we go ahead with it as is.
@dotnet/jit-contrib anyone up for reviewing this?
This LGTM; my only question is whether the folding of the JTRUE is something that is done elsewhere and could be factored out.
There is conceptually similar code in
Aggressively fold as we substitute inline return value trees in for the return value placeholders. Notice when this folding leads to branch simplification, and make the associated flow graph update. The recently added early flow opt pass will then transitively remove any newly unreachable code.
Resolves dotnet/coreclr#27935 (now migrated to #13824).