Inlined struct copies via params, returns and assignment not elided #12219
Comments
You can |
For something like |
Not using the same variable and passing directly produces the same output:
[MethodImpl(MethodImplOptions.NoInlining)]
static long InlinedAssignment()
{
var s = GetLargeStruct(CreateLargeStruct());
return s.l3;
}
; Assembly listing for method Program:InlinedAssignment():long
;
; V00 OutArgs [V00 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
; V01 tmp1 [V01 ] ( 2, 4 ) struct (32) [rsp+0x48] do-not-enreg[XSB] addr-exposed "struct address for call/obj"
;* V02 tmp2 [V02 ] ( 0, 0 ) struct (32) zero-ref "struct address for call/obj"
; V03 tmp3 [V03 ] ( 2, 4 ) struct (32) [rsp+0x28] do-not-enreg[XSB] addr-exposed "Inlining Arg"
;* V04 tmp4 [V04 ] ( 0, 0 ) long -> zero-ref V02.l0(offs=0x00) P-INDEP "field V02.l0 (fldOffset=0x0)"
;* V05 tmp5 [V05 ] ( 0, 0 ) long -> zero-ref V02.l1(offs=0x08) P-INDEP "field V02.l1 (fldOffset=0x8)"
;* V06 tmp6 [V06 ] ( 0, 0 ) long -> zero-ref V02.l2(offs=0x10) P-INDEP "field V02.l2 (fldOffset=0x10)"
; V07 tmp7 [V07,T01] ( 2, 2 ) long -> rax V02.l3(offs=0x18) P-INDEP "field V02.l3 (fldOffset=0x18)"
; V08 tmp8 [V08,T00] ( 3, 6 ) byref -> rax "BlockOp address local"
;
; Lcl frame size = 104
G_M60975_IG01:
4883EC68 sub rsp, 104
C5F877 vzeroupper
G_M60975_IG02:
488D4C2448 lea rcx, bword ptr [rsp+48H]
E8BF67FFFF call Program:CreateLargeStruct():struct
C5FA6F442448 vmovdqu xmm0, qword ptr [rsp+48H]
C5FA7F442428 vmovdqu qword ptr [rsp+28H], xmm0
C5FA6F442458 vmovdqu xmm0, qword ptr [rsp+58H]
C5FA7F442438 vmovdqu qword ptr [rsp+38H], xmm0
488D442428 lea rax, bword ptr [rsp+28H]
488B10 mov rdx, qword ptr [rax]
488B4018 mov rax, qword ptr [rax+24]
G_M60975_IG03:
4883C468 add rsp, 104
C3 ret
; Total bytes of code 58, prolog size 7 for method Program:InlinedAssignment():long
Though the |
There are some correctness concerns related to exception semantics and self-modification that require staging some struct operations through temps. The copies (and temps) can sometimes be optimized away later after global (method-wide) analysis. The jit does not do this analysis yet, but hopefully will soon -- see the plans for first-class structs. cc @CarolEidt |
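The two concerns named above can be illustrated at the source level. This is a minimal sketch, not taken from the thread: `Pair`, `Swap`, and `MakeOrThrow` are hypothetical names chosen for illustration. `Swap` shows the self-modification case (the result is assigned to the same variable that was passed in), and `MakeOrThrow` shows the exception case (the destination must keep its old value if the callee throws part way through building the result).

```csharp
using System;

public struct Pair
{
    public long A;
    public long B;
}

public static class TempStagingDemo
{
    // Reads both fields of p while building the result. If the result were
    // written straight into the caller's variable, and p aliased that same
    // variable, the second read would see the already-overwritten field.
    public static Pair Swap(Pair p)
    {
        Pair r;
        r.A = p.B;
        r.B = p.A;
        return r;
    }

    // Writes part of the result and then optionally throws.
    public static Pair MakeOrThrow(bool fail)
    {
        Pair r;
        r.A = 1;
        if (fail) throw new InvalidOperationException();
        r.B = 2;
        return r;
    }

    public static void Main()
    {
        var s = new Pair { A = 10, B = 20 };
        s = Swap(s);                       // source and destination are the same local
        Console.WriteLine($"{s.A} {s.B}"); // prints "20 10"

        var t = new Pair { A = 7, B = 8 };
        try { t = MakeOrThrow(true); }
        catch (InvalidOperationException) { }
        Console.WriteLine($"{t.A} {t.B}"); // prints "7 8": t is untouched by the failed call
    }
}
```

Both results depend on the callee working on its own copy, which is the role the temps play. Whether a given temp can be removed is a question for the method-wide analysis mentioned above.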
That code is practically the same as the original code. There is still a temporary variable, even if you don't declare one in the C# code. The address of that temp variable is passed to Then the ABI semantics require a copy of that temp variable to be passed to And it looks like the copy of the temp variable is address exposed too. Not sure why, looks like bad handling of promoted struct in I think the big question here is whether a local variable that's used as a return buffer actually needs to be address exposed. |
Here's an IR dump:
LocalAddressVisitor marks V01 address exposed because its address is passed to |
Yeah, I have a change that prevents V03 from becoming address exposed but that has little impact:
488D4C2448 lea rcx, bword ptr [rsp+48H]
E83F68FFFF call Program:CreateLargeStruct():struct
C5FA6F442448 vmovdqu xmm0, qword ptr [rsp+48H]
C5FA7F442428 vmovdqu qword ptr [rsp+28H], xmm0
C5FA6F442458 vmovdqu xmm0, qword ptr [rsp+58H]
C5FA7F442438 vmovdqu qword ptr [rsp+38H], xmm0
488B442440 mov rax, qword ptr [rsp+40H]
Basically it eliminates the useless |
I recall thinking local assertion prop in morph could eliminate some of these struct copy chains (which is what led me to that recent fix for |
AFAIK assertionprop does not touch address exposed lclvars. For example: https://github.com/dotnet/coreclr/blob/2a77dc515bccf78308232384522017e5a3bdc3f0/src/jit/assertionprop.cpp#L1001-L1005 |
I was looking at a similar example when I ran across #12086.
using System;
struct S
{
public string s;
public int a;
public int b;
public int c;
public int d;
public S(string x)
{
s = x;
a = 0;
b = 0;
c = 0;
d = 0;
}
}
class C
{
static S F(S s) => s;
static void P(S s) => Console.WriteLine(s.s);
public static void Main()
{
P(F(new S("hello, world!")));
}
}
We end up pre-morph with a chain of 3 struct copies:
and local prop knows they are all equivalent:
but for some reason does not rewrite the last assignment, so we end up with
Nothing here is address exposed but some structs are address taken. The first copy is now dead and gets removed, but not the second one, for some reason. We should be able to swap And we should be able to swap I opened #12098 as part of this, but never really got to the bottom of things. |
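To make the copy chain concrete, here is a hand-written source-level analogue of what `Main` could look like once `F` and `P` are inlined. It is a sketch only: the locals `t1` to `t4` are hypothetical stand-ins for the JIT temps, and the exact temps and their order in the real IR may differ. `Flattened` is likewise a hypothetical name for the form a copy-propagating optimizer would ideally reach.

```csharp
using System;

public struct S
{
    public string s;
    public int a;
    public int b;
    public int c;
    public int d;

    public S(string x)
    {
        s = x;
        a = 0;
        b = 0;
        c = 0;
        d = 0;
    }
}

public static class CopyChainSketch
{
    // Analogue of P(F(new S(...))) after inlining: every call boundary
    // leaves behind a whole-struct copy into a fresh local.
    public static string WithCopies()
    {
        S t1 = new S("hello, world!"); // constructor temp
        S t2 = t1;                     // argument temp for the inlined F
        S t3 = t2;                     // F's return value
        S t4 = t3;                     // argument temp for the inlined P
        return t4.s;
    }

    // What remains if the copies are propagated away: only the one field
    // that is actually read survives.
    public static string Flattened()
    {
        return "hello, world!";
    }

    public static void Main()
    {
        Console.WriteLine(WithCopies()); // prints "hello, world!"
        Console.WriteLine(Flattened());  // prints "hello, world!"
    }
}
```

Since `t1` through `t4` are never modified after being assigned, each use of a later local can be rewritten to use an earlier one, at which point the intermediate assignments become dead.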
Ah, it is coming back to me now -- by the time |
Have a prototype that shows some promise on my example, but not Ben's (yet): GeneralizeIndAssertionProp.
This completely flattens my test case; all struct ops are removed and we just print the string:
; Assembly listing for method C:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 OutArgs [V00 ] ( 1, 1 ) lclBlk (32) [rsp+0x00] "OutgoingArgSpace"
;* V01 tmp1 [V01,T00] ( 0, 0 ) struct (24) zero-ref do-not-enreg[SFB] "NewObj constructor temp"
;* V02 tmp2 [V02 ] ( 0, 0 ) struct (24) zero-ref do-not-enreg[SB] "struct address for call/obj"
;* V03 tmp3 [V03 ] ( 0, 0 ) struct (24) zero-ref do-not-enreg[SB] "Inlining Arg"
;* V04 tmp4 [V04 ] ( 0, 0 ) struct (24) zero-ref do-not-enreg[SB] "Inlining Arg"
; V05 cse0 [V05,T01] ( 2, 2 ) ref -> rcx "ValNumCSE"
;
; Lcl frame size = 40
G_M31058_IG01:
sub rsp, 40
nop
G_M31058_IG02:
mov rcx, 0xD1FFAB1E
mov rcx, gword ptr [rcx]
call Console:WriteLine(ref)
nop
G_M31058_IG03:
add rsp, 40
ret
More broadly it seems to have modest impact:
Perhaps something like this in conjunction with less liberal use of address exposure might be interesting. Also looks like |
Current code generated for windows/x64:
; InlinedAssignment():long
G_M927_IG01:
sub rsp, 104
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M927_IG02:
lea rcx, [rsp+48H]
call MiniBench.Issue_12219:CreateLargeStruct():MiniBench.LargeStruct
vmovdqu xmm0, xmmword ptr [rsp+48H]
vmovdqu xmmword ptr [rsp+28H], xmm0
vmovdqu xmm0, xmmword ptr [rsp+58H]
vmovdqu xmmword ptr [rsp+38H], xmm0
mov rax, qword ptr [rsp+40H]
;; bbWeight=1 PerfScore 10.50
G_M927_IG03:
add rsp, 104
ret
;; bbWeight=1 PerfScore 1.25
; InlinedCtor():long
G_M47270_IG01:
sub rsp, 136
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M47270_IG02:
lea rcx, [rsp+48H]
call MiniBench.Issue_12219:CreateLargeStruct():MiniBench.LargeStruct
vmovdqu xmm0, xmmword ptr [rsp+48H]
vmovdqu xmmword ptr [rsp+28H], xmm0
vmovdqu xmm0, xmmword ptr [rsp+58H]
vmovdqu xmmword ptr [rsp+38H], xmm0
vmovdqu xmm0, xmmword ptr [rsp+28H]
vmovdqu xmmword ptr [rsp+68H], xmm0
vmovdqu xmm0, xmmword ptr [rsp+38H]
vmovdqu xmmword ptr [rsp+78H], xmm0
mov rax, qword ptr [rsp+80H]
;; bbWeight=1 PerfScore 18.50
G_M47270_IG03:
add rsp, 136
ret
Current code generated for windows/arm64:
; InlinedAssignment():long
G_M927_IG01:
stp fp, lr, [sp,#-80]!
mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M927_IG02:
add x8, fp, #48 // [V01 tmp1]
bl MiniBench.Issue_12219:CreateLargeStruct():MiniBench.LargeStruct
ldp x0, x1, [fp,#48] // [V01 tmp1]
stp x0, x1, [fp,#16] // [V03 tmp3]
ldp x0, x1, [fp,#64] // [V01 tmp1+0x10]
stp x0, x1, [fp,#32] // [V03 tmp3+0x10]
ldr x0, [fp,#40] // [V03 tmp3+0x18]
;; bbWeight=1 PerfScore 11.50
G_M927_IG03:
ldp fp, lr, [sp],#80
ret lr
; InlinedCtor():long
G_M47270_IG01:
stp fp, lr, [sp,#-112]!
mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M47270_IG02:
add x8, fp, #48 // [V02 tmp2]
bl MiniBench.Issue_12219:CreateLargeStruct():MiniBench.LargeStruct
ldp x0, x1, [fp,#48] // [V02 tmp2]
stp x0, x1, [fp,#16] // [V03 tmp3]
ldp x0, x1, [fp,#64] // [V02 tmp2+0x10]
stp x0, x1, [fp,#32] // [V03 tmp3+0x10]
ldp x0, x1, [fp,#16] // [V03 tmp3]
stp x0, x1, [fp,#80] // [V01 tmp1]
ldp x0, x1, [fp,#32] // [V03 tmp3+0x10]
stp x0, x1, [fp,#96] // [V01 tmp1+0x10]
ldr x0, [fp,#104] // [V01 tmp1+0x18]
;; bbWeight=1 PerfScore 19.50
G_M47270_IG03:
ldp fp, lr, [sp],#112
ret lr
Adding the |
Could you clarify if this is on Windows or Unix? The ABIs, particularly around SIMD types, aren't the same here, so there are scenarios where |
This is on Windows. |
Here is a repro:
[MethodImpl(MethodImplOptions.NoInlining)]
public static bool Compare(SqlString s1, SqlString s2)
{
return MyCompare(s1, s2);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool MyCompare(SqlString s1, SqlString s2)
{
return MyCompare2(s1, s2);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool MyCompare2(SqlString s1, SqlString s2)
{
return s1.IsNull == s2.IsNull;
}
; Assembly listing for method C:Compare(System.Data.SqlTypes.SqlString,System.Data.SqlTypes.SqlString):bool
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 4 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 3, 6 ) byref -> rcx single-def
; V01 arg1 [V01,T01] ( 3, 6 ) byref -> rdx single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
; V03 tmp1 [V03,T02] ( 2, 4 ) struct (32) [rsp+68H] do-not-enreg[S] single-def "Inlining Arg"
; V04 tmp2 [V04,T03] ( 2, 4 ) struct (32) [rsp+48H] do-not-enreg[S] single-def "Inlining Arg"
; V05 tmp3 [V05,T04] ( 2, 4 ) struct (32) [rsp+28H] do-not-enreg[SF] ld-addr-op single-def "Inlining Arg"
; V06 tmp4 [V06,T05] ( 2, 4 ) struct (32) [rsp+08H] do-not-enreg[SF] ld-addr-op single-def "Inlining Arg"
; V07 tmp5 [V07,T06] ( 2, 4 ) int -> rax "impAppendStmt"
;
; Lcl frame size = 136
G_M49149_IG01:
sub rsp, 136
vzeroupper
;; bbWeight=1 PerfScore 1.25
G_M49149_IG02:
vmovdqu ymm0, ymmword ptr[rcx]
vmovdqu ymmword ptr[rsp+68H], ymm0
;; bbWeight=1 PerfScore 6.00
G_M49149_IG03:
vmovdqu ymm0, ymmword ptr[rdx]
vmovdqu ymmword ptr[rsp+48H], ymm0
;; bbWeight=1 PerfScore 6.00
G_M49149_IG04:
vmovdqu ymm0, ymmword ptr[rsp+68H]
vmovdqu ymmword ptr[rsp+28H], ymm0
;; bbWeight=1 PerfScore 5.00
G_M49149_IG05:
vmovdqu ymm0, ymmword ptr[rsp+48H]
vmovdqu ymmword ptr[rsp+08H], ymm0
;; bbWeight=1 PerfScore 5.00
G_M49149_IG06:
cmp byte ptr [rsp+40H], 0
sete al
movzx rax, al
cmp byte ptr [rsp+20H], 0
sete dl
movzx rdx, dl
cmp eax, edx
sete al
movzx rax, al
;; bbWeight=1 PerfScore 8.00
G_M49149_IG07:
vzeroupper
add rsp, 136
ret |
I believe this was fixed by #64130. Codegen on main is:
; Method Program:InlinedAssignment():long
G_M53179_IG01:
sub rsp, 72
;; size=4 bbWeight=1 PerfScore 0.25
G_M53179_IG02:
lea rcx, [rsp+28H]
call [Program:CreateLargeStruct():LargeStruct]
mov rax, qword ptr [rsp+40H]
;; size=16 bbWeight=1 PerfScore 4.50
G_M53179_IG03:
add rsp, 72
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 25
; Method Program:InlinedCtor():long
G_M38338_IG01:
sub rsp, 72
;; size=4 bbWeight=1 PerfScore 0.25
G_M38338_IG02:
lea rcx, [rsp+28H]
call [Program:CreateLargeStruct():LargeStruct]
mov rax, qword ptr [rsp+40H]
;; size=16 bbWeight=1 PerfScore 4.50
G_M38338_IG03:
add rsp, 72
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 25 |
As seen in ValueTaskAwaiter from dotnet/coreclr#22735 and dotnet/coreclr#22738. There wasn't a specific issue for it with a simple repro, so opening this.
Produces
and
Some of these copies could be skipped?
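For readers following the listings, a minimal sketch of a repro for `InlinedAssignment` might look like the following. Only the body of `InlinedAssignment` is quoted in the thread; the layout of `LargeStruct` (four `long` fields `l0` to `l3`, 32 bytes) is inferred from the local variable tables, and the bodies of `CreateLargeStruct` and `GetLargeStruct`, their inlining attributes, and the field values are assumptions.

```csharp
using System;
using System.Runtime.CompilerServices;

// Assumed layout: four longs, matching the 32-byte struct with fields
// l0..l3 that appears in the JIT's local variable listing.
public struct LargeStruct
{
    public long l0;
    public long l1;
    public long l2;
    public long l3;
}

public static class Program
{
    // Assumed: not inlined, so the result comes back through a return buffer.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static LargeStruct CreateLargeStruct()
    {
        return new LargeStruct { l0 = 1, l1 = 2, l2 = 3, l3 = 4 };
    }

    // Assumed: an identity function that gets inlined, leaving an
    // "Inlining Arg" temp behind for its struct parameter.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static LargeStruct GetLargeStruct(LargeStruct s)
    {
        return s;
    }

    // As quoted in the thread.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static long InlinedAssignment()
    {
        var s = GetLargeStruct(CreateLargeStruct());
        return s.l3;
    }

    public static void Main()
    {
        Console.WriteLine(InlinedAssignment()); // prints 4 with the values assumed above
    }
}
```

Only `l3` is read, so in the ideal case the caller needs just the return buffer and a single 8-byte load, which matches the codegen reported after the fix.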
/cc @AndyAyersMS @mikedn @stephentoub @jkotas
category:cq
theme:structs
skill-level:expert
cost:large