Jit: Sub-optimal code when function is inlined #32415
Comments
Likely a similar issue to the one seen in #32414, but I will take a deeper look when I have a chance.
Need to investigate further, so will mark as 5.0 for now.
Issue here is that we don't promote structs of structs, so the We won't be able to address this in 5.0. I don't see a pre-existing issue for general struct promotion. @erozenfeld do you know of one?
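For context, a struct-of-structs shape like the one being discussed might look like the sketch below. It is a hypothetical reconstruction based only on the method names that appear in the listings (`Program+Wrapper`, `SumFirst4`, `SumFirst4Caller`). The field name, the unchecked indexer, and the containing class name `PromotionSketch` are assumptions, not the original repro.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class PromotionSketch
{
    // Hypothetical stand-in for the issue's Wrapper: a struct whose only field
    // is itself a struct (ReadOnlySpan<byte> holds a reference and a length).
    public readonly ref struct Wrapper
    {
        public readonly ReadOnlySpan<byte> Span;

        public Wrapper(ReadOnlySpan<byte> span) => Span = span;

        // Unchecked element read (assumed, since the listings show no bounds checks).
        // The caller must guarantee that index is within the span.
        public byte this[int index]
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            get => Unsafe.Add(ref MemoryMarshal.GetReference(Span), index);
        }
    }

    public static int SumFirst4(Wrapper w) => w[0] + w[1] + w[2] + w[3];

    // SumFirst4 is a candidate for inlining here; the issue is about the
    // extra struct copy that shows up in this caller.
    public static int SumFirst4Caller(Wrapper w) => SumFirst4(w);
}
```

Because `Wrapper` contains only a `ReadOnlySpan<byte>`, promoting it means promoting the nested span's reference and length fields, which is the case described as unsupported above.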
Codegen on main:
; Method Program:SumFirst4Caller(Program+Wrapper):int
G_M57344_IG01:
sub rsp, 24
vzeroupper
;; size=7 bbWeight=1 PerfScore 1.25
G_M57344_IG02:
vmovdqu xmm0, xmmword ptr [rcx]
vmovdqu xmmword ptr [rsp+08H], xmm0
;; size=10 bbWeight=1 PerfScore 5.00
G_M57344_IG03:
mov rax, bword ptr [rsp+08H]
mov rdx, rax
movzx rdx, byte ptr [rdx]
mov rcx, rax
movzx rcx, byte ptr [rcx+01H]
add edx, ecx
mov rcx, rax
movzx rcx, byte ptr [rcx+02H]
add edx, ecx
movzx rax, byte ptr [rax+03H]
add eax, edx
;; size=35 bbWeight=1 PerfScore 10.50
G_M57344_IG04:
add rsp, 24
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 57
Looks much better, but we still have the unnecessary copy. Generalized promotion would probably be able to get rid of this.
With physical promotion we can potentially get the following codegen for both methods:
; Assembly listing for method Program:SumFirst4(Program+Wrapper):int
; Emitting BLENDED_CODE for X64 with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 8 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx ld-addr-op single-def
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V02 tmp1 [V02,T02] ( 2, 4 ) int -> rdx "impAppendStmt"
; V03 tmp2 [V03,T03] ( 2, 4 ) int -> rdx "impAppendStmt"
; V04 tmp3 [V04,T04] ( 2, 4 ) int -> rdx "impAppendStmt"
;* V05 tmp4 [V05 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V06 tmp5 [V06 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V07 tmp6 [V07 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V08 tmp7 [V08 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V09 tmp8 [V09 ] ( 0, 0 ) byref -> zero-ref single-def V05._reference(offs=0x00) P-INDEP "field V05._reference (fldOffset=0x0)"
;* V10 tmp9 [V10 ] ( 0, 0 ) int -> zero-ref V05._length(offs=0x08) P-INDEP "field V05._length (fldOffset=0x8)"
;* V11 tmp10 [V11 ] ( 0, 0 ) byref -> zero-ref single-def V06._reference(offs=0x00) P-INDEP "field V06._reference (fldOffset=0x0)"
;* V12 tmp11 [V12 ] ( 0, 0 ) int -> zero-ref V06._length(offs=0x08) P-INDEP "field V06._length (fldOffset=0x8)"
;* V13 tmp12 [V13 ] ( 0, 0 ) byref -> zero-ref single-def V07._reference(offs=0x00) P-INDEP "field V07._reference (fldOffset=0x0)"
;* V14 tmp13 [V14 ] ( 0, 0 ) int -> zero-ref V07._length(offs=0x08) P-INDEP "field V07._length (fldOffset=0x8)"
;* V15 tmp14 [V15 ] ( 0, 0 ) byref -> zero-ref single-def V08._reference(offs=0x00) P-INDEP "field V08._reference (fldOffset=0x0)"
;* V16 tmp15 [V16 ] ( 0, 0 ) int -> zero-ref V08._length(offs=0x08) P-INDEP "field V08._length (fldOffset=0x8)"
; V17 tmp16 [V17,T01] ( 5, 5 ) byref -> rax single-def "V00.[000..008)"
;* V18 tmp17 [V18,T05] ( 0, 0 ) int -> zero-ref "V00.[008..012)"
;
; Lcl frame size = 0
G_M29397_IG01: ;; offset=0000H
;; size=0 bbWeight=1 PerfScore 0.00
G_M29397_IG02: ;; offset=0000H
mov rax, bword ptr [rcx]
movzx rdx, byte ptr [rax]
movzx rcx, byte ptr [rax+01H]
add edx, ecx
movzx rcx, byte ptr [rax+02H]
add edx, ecx
movzx rax, byte ptr [rax+03H]
add eax, edx
;; size=24 bbWeight=1 PerfScore 10.75
G_M29397_IG03: ;; offset=0018H
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 25, prolog size 0, PerfScore 14.25, instruction count 9, allocated bytes for code 25 (MethodHash=92b18d2a) for method Program:SumFirst4(Program+Wrapper):int
It requires two changes:
I don't think these tasks should be difficult so I will grab this one @AndyAyersMS.
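As a rough source-level picture of what the promoted listing does: it loads bytes [0..8) of the argument (the span's reference, V17) once and never touches bytes [8..12) (the length, V18, which is zero-ref). A hand-written equivalent might look like the sketch below. `ManualScalarization`, `SumFirst4Manual`, and the `Wrapper` layout are hypothetical names and assumptions for illustration. This is not what the JIT emits internally.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class ManualScalarization
{
    // Hypothetical wrapper, assumed to match the issue's Wrapper in shape only.
    public readonly ref struct Wrapper
    {
        public readonly ReadOnlySpan<byte> Span;

        public Wrapper(ReadOnlySpan<byte> span) => Span = span;
    }

    // Caller must pass a span of at least 4 bytes; reads are unchecked.
    public static int SumFirst4Manual(Wrapper w)
    {
        // Read the span's reference once into a local, mirroring the single
        // `mov rax, bword ptr [rcx]` in the promoted listing.
        ref byte first = ref MemoryMarshal.GetReference(w.Span);

        // The length field is never read, matching the zero-ref V18.
        return first
             + Unsafe.Add(ref first, 1)
             + Unsafe.Add(ref first, 2)
             + Unsafe.Add(ref first, 3);
    }
}
```

The point of the sketch is only that one load of the reference suffices. The stack copy and the repeated `mov rcx, rax` reloads in the earlier listing are not needed for correctness.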
As I created this example code I realized that it happens in similar circumstances to #32414, but I don't know if they have the same cause or not.
Given the following code:
SharpLab link
SumFirst4 produces this. There's some weird stuff going on with register allocation, but nothing too bad.
SumFirst4Caller produces this, copying the span to the stack and re-dereferencing the pointer for every element accessed.
For reference, adding this extension and swapping Wrapper for ReadOnlySpan<byte> gives the following code for both SumFirst4 and SumFirst4Caller.
category:cq
theme:structs
skill-level:expert
cost:large
impact:large