Promote (scalar replace) structs with more than 4 fields #6534
Comments
This is due to a current limitation in struct promotion (aka scalar replacement) that restricts it to structs with no more than 4 fields.
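To make the limitation concrete, here is a minimal sketch of the kind of code it affects. The names `Cards3`, `Cards8`, `cards3`, `cards8`, `Run3` and `Run8` match identifiers that appear in the JIT dumps in this thread, but the exact field lists, the array contents and the method bodies are assumptions reconstructed from those dumps, not the original benchmark.

```csharp
using System;
using System.Runtime.CompilerServices;

// Assumed layout: three byte fields, within the 4-field promotion limit.
public struct Cards3
{
    public byte C0, C1, C2;
}

// Assumed layout: eight byte fields, over the 4-field promotion limit.
public struct Cards8
{
    public byte C0, C1, C2, C3, C4, C5, C6, C7;
}

public static class Program
{
    // Sample data chosen only for illustration.
    private static readonly Cards3[] cards3 = { new Cards3 { C0 = 5, C1 = 2 } };
    private static readonly Cards8[] cards8 = { new Cards8 { C0 = 5, C1 = 2 } };

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run3()
    {
        Cards3 c = cards3[0];   // local struct copy: a candidate for promotion
        return c.C0 - c.C1;     // only two of the fields are actually read
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run8()
    {
        Cards8 c = cards8[0];   // same shape, but the struct has 8 fields
        return c.C0 - c.C1;
    }

    public static void Main()
    {
        Console.WriteLine(Run3()); // 3
        Console.WriteLine(Run8()); // 3
    }
}
```

Both methods compute the same value. The difference discussed in this issue is only in the generated code: whether the local `c` is split into per-field temporaries that can live in registers, or kept as a whole on the stack.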
This is related to #6494, but as this is specific and has a specific test case, I'm going to rename this "Promote (scalar replace) structs with more than 4 fields".
@CarolEidt, thanks for the explanation!
@dotnet/jit-contrib
@AndreyAkinshin What's the impact of this? Does this benchmark represent an issue in a particular workload or product, or is this more of a general investigation? I think this issue is important, but I'm trying to get a handle on how quickly we'd need to look at it.
It's a general investigation, no real issues. =) I just thought that it would be nice to have this kind of optimization.
We've carefully crafted a Fragment:
public ref struct InstanceEnumerator
{
    // These have to be arrays rather than ReadOnlySpan as the JIT won't unpack/promote 'struct in struct' (or span in struct either)
    // Revisit with .NET 6 https://github.com/dotnet/runtime/issues/37924
    private readonly Instanced<T>[] instances;
    private readonly int[] hookIndices;
    // ideally this would be Instanced<T> and drop the need for the ii variable in the MoveNext function
    // but again, struct in struct promotion (and also increasing the 'field count')
    private T current;
    // i and j are combined into a 64-bit variable because the JIT currently won't unpack/promote structs with > 4 fields into registers
    // See https://github.com/dotnet/runtime/issues/6534
    private ulong ij;
    //private int i;
    //private int j;
    ...
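The packing trick mentioned in the comments above (two `int` counters stored in one `ulong` to keep the field count at 4) could look roughly like the sketch below. `IndexPacking`, `Pack`, `GetI` and `GetJ` are hypothetical names, and putting `i` in the high half is an assumption; the fragment does not show how the original code lays out the two values.

```csharp
using System;

// Hypothetical helper: packs two 32-bit indices into one 64-bit value.
public static class IndexPacking
{
    // Assumed layout: i in the high 32 bits, j in the low 32 bits.
    // The casts through uint avoid sign extension for negative values.
    public static ulong Pack(int i, int j) =>
        unchecked(((ulong)(uint)i << 32) | (uint)j);

    // Extract the high 32 bits as a signed int.
    public static int GetI(ulong ij) => unchecked((int)(ij >> 32));

    // Extract the low 32 bits as a signed int.
    public static int GetJ(ulong ij) => unchecked((int)(uint)ij);
}

public static class PackingDemo
{
    public static void Main()
    {
        ulong ij = IndexPacking.Pack(3, -1);
        Console.WriteLine(IndexPacking.GetI(ij)); // 3
        Console.WriteLine(IndexPacking.GetJ(ij)); // -1
    }
}
```

The cost of this workaround is extra shifting and masking on every access, which is part of why lifting the field-count limit would let such code go back to two plain `int` fields.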
Physical promotion (#83388) can potentially clean this up. With #85105 and
; Assembly listing for method Program:Run8():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref do-not-enreg[SF]
; V01 OutArgs [V01 ] ( 1, 1 ) struct (32) [rsp+00H] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V02 tmp1 [V02,T02] ( 2, 2 ) ubyte -> rdx single-def "V00.[000..001)"
; V03 tmp2 [V03,T03] ( 2, 2 ) ubyte -> rax single-def "V00.[001..002)"
; V04 tmp3 [V04,T00] ( 3, 6 ) byref -> rax single-def "Spilling address for field-by-field copy"
; V05 tmp4 [V05,T01] ( 3, 6 ) ref -> rax single-def "arr expr"
;
; Lcl frame size = 40
G_M14509_IG01:
sub rsp, 40
;; size=4 bbWeight=1 PerfScore 0.25
G_M14509_IG02:
mov rax, 0xD1FFAB1E ; data for Program:cards8
mov rax, gword ptr [rax]
cmp dword ptr [rax+08H], 0
jbe SHORT G_M14509_IG04
add rax, 16
movzx rdx, byte ptr [rax]
movzx rax, byte ptr [rax+01H]
sub edx, eax
mov eax, edx
;; size=34 bbWeight=1 PerfScore 11.00
G_M14509_IG03:
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
G_M14509_IG04:
call CORINFO_HELP_RNGCHKFAIL
int3
;; size=6 bbWeight=0 PerfScore 0.00
; Total bytes of code 49, prolog size 4, PerfScore 17.40, instruction count 14, allocated bytes for code 49 (MethodHash=6250c752) for method Program:Run8():int
; ============================================================
Currently this promotion will not happen without disabling costing checks because the profitability heuristic does not take into account that promotion can potentially make some struct loads much cheaper (in this case the array access). Hence the heuristic believes that with only one access to each field, it is best left alone. Another thing is that we can avoid spilling simple address computations like these when they are just
After #86660 we get the above codegen by default when physical promotion is enabled. Sadly the It also remains in
[000010] DACXGO----- ▌ STORE_LCL_VAR struct<Cards3, 3>(P) V00 loc0
▌ ubyte V00.Cards3:C0 (offs=0x00) -> V02 tmp1
▌ ubyte V00.Cards3:C1 (offs=0x01) -> V03 tmp2
▌ ubyte V00.Cards3:C2 (offs=0x02) -> V04 tmp3 (last use)
[000009] nACXG+----- └──▌ BLK struct<Cards3, 3>
[000034] -ACXG+----- └──▌ COMMA byref
[000021] DACXG+----- ├──▌ STORE_LCL_VAR ref V05 tmp4
[000006] --CXG+----- │ └──▌ COMMA ref
[000005] H-CXG+----- │ ├──▌ CALL help long CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
[000003] -----+----- arg0 in rcx │ │ ├──▌ CNS_INT long 0x7ff7ef1aff68
[000004] -----+----- arg1 in rdx │ │ └──▌ CNS_INT int 3
[000001] I---G+----- │ └──▌ IND ref
[000000] H----+----- │ └──▌ CNS_INT(h) long 0x1bc92001d08 static Fseq[cards3]
[000033] ---X-+----- └──▌ COMMA byref
[000026] ---X-+----- ├──▌ BOUNDS_CHECK_Rng void
[000007] -----+----- │ ├──▌ CNS_INT int 0
[000025] ---X-+----- │ └──▌ ARR_LENGTH int
[000022] -----+----- │ └──▌ LCL_VAR ref V05 tmp4
[000032] -----+----- └──▌ ARR_ADDR byref Cards3[]
[000031] -----+----- └──▌ ADD byref
[000023] -----+----- ├──▌ LCL_VAR ref V05 tmp4
[000030] -----+----- └──▌ CNS_INT long 16
I've opened #86755 for this.
Let's look at the following code (based on this StackOverflow question):
Now let's look at the asm code (Windows 10, .NET Framework 4.6.1 (4.0.30319.42000), clrjit-v4.6.1080.0):
As you can see, in the Run3 case, RyuJIT keeps the target bytes (C0, C1) in the edx, eax registers; in the Run8 case, RyuJIT keeps them on the stack (qword ptr [rsp+20h]). Why? This may slightly degrade the performance of an application (see these benchmarks).
category:cq
theme:structs
skill-level:expert
cost:large
impact:large