Promote (scalar replace) structs with more than 4 fields #6534

AndreyAkinshin · 2016-08-22T15:25:59Z

Let's look at the following code (based on this StackOverflow question):

public struct Cards3
{
    public byte C0, C1, C2;
}

public struct Cards8
{
    public byte C0, C1, C2, C3, C4, C5, C6, C7;
}

class Program
{
    static void Main()
    {           
        Run3();
        Run8();
    }

    private static Cards3[] cards3 = new Cards3[1];
    private static Cards8[] cards8 = new Cards8[1];

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run3()
    {
        var c = cards3[0];
        return c.C0 - c.C1;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Run8()
    {
        var c = cards8[0];
        return c.C0 - c.C1;
    }
}

Now let's look at the asm code (Windows 10, .NET Framework 4.6.1 (4.0.30319.42000), clrjit-v4.6.1080.0):

; Run3                                                    
            var c = cards3[0];                            
00007FFEDF0A4752  in          al,dx                       
00007FFEDF0A4753  sub         byte ptr [rax-48h],cl       
00007FFEDF0A4756  add         byte ptr [rax],0B0h         
00007FFEDF0A4759  out         72h,al                      
00007FFEDF0A475B  add         al,byte ptr [rax]           
00007FFEDF0A475D  add         byte ptr [rax-75h],cl       
00007FFEDF0A4760  add         byte ptr [rbx+76000878h],al 
00007FFEDF0A4766  adc         al,48h                      
00007FFEDF0A4768  add         eax,10h                     
00007FFEDF0A476B  movzx       edx,byte ptr [rax]          ; !!!
00007FFEDF0A476E  movzx       eax,byte ptr [rax+1]        ; !!!
            return c.C0 - c.C1;                           
00007FFEDF0A4772  sub         edx,eax                     ; !!!
00007FFEDF0A4774  mov         eax,edx                     ; !!!
00007FFEDF0A4776  add         rsp,28h                     
00007FFEDF0A477A  ret                                     
00007FFEDF0A477B  call        00007FFF3EB57BE0            
00007FFEDF0A4780  int         3                           
; Run8                                                    
            var c = cards8[0];                            
00007FFEDF0B49A2  in          al,dx                       
00007FFEDF0B49A3  sub         byte ptr [rbx],dh           
00007FFEDF0B49A5  ror         byte ptr [rax-77h],44h      
00007FFEDF0B49A9  and         al,20h                      
00007FFEDF0B49AB  mov         rax,202902D0088h            
00007FFEDF0B49B5  mov         rax,qword ptr [rax]         
00007FFEDF0B49B8  cmp         dword ptr [rax+8],0         
00007FFEDF0B49BC  jbe         00007FFEDF0B49D8            
00007FFEDF0B49BE  mov         rax,qword ptr [rax+10h]     ; !!!
00007FFEDF0B49C2  mov         qword ptr [rsp+20h],rax     ; !!!
            return c.C0 - c.C1;                           
00007FFEDF0B49C7  movzx       eax,byte ptr [rsp+20h]      ; !!!
00007FFEDF0B49CC  movzx       edx,byte ptr [rsp+21h]      ; !!!
00007FFEDF0B49D1  sub         eax,edx                     
00007FFEDF0B49D3  add         rsp,28h                     
00007FFEDF0B49D7  ret                                     
00007FFEDF0B49D8  call        00007FFF3EB57BE0            
00007FFEDF0B49DD  int         3

As you can see, in the Run3 case, RyuJIT keeps the target bytes (C0, C1) in the edx, eax registers; in the Run8 case, RyuJIT keeps them on stack (qword ptr [rsp+20h]). Why? This may slightly degrade the performance of an application (see these benchmarks).

category:cq
theme:structs
skill-level:expert
cost:large
impact:large

The text was updated successfully, but these errors were encountered:

CarolEidt · 2016-08-22T16:32:05Z

This is due to a current limitation in struct promotion (aka scalar replacement) that restricts it to structs with no more than 4 fields.

CarolEidt · 2016-08-22T16:46:39Z

This is related to #6494, but as this is specific and has a specific test case, I'm going to rename this "Promote (scalar replace) structs with more than 4 fields".

AndreyAkinshin · 2016-08-22T16:53:00Z

@CarolEidt, thanks for the explanation!

RussKeldorph · 2016-08-22T17:14:07Z

@dotnet/jit-contrib

russellhadley · 2016-08-25T16:55:17Z

@AndreyAkinshin What's the impact of this? Is this benchmark represent an issue in a particular workload or product? Or is this more of a general investigation? I think this issue is important but I'm trying to get a handle on how quickly we'd need to look at this.

AndreyAkinshin · 2016-08-25T17:05:42Z

It's a general investigation, no real issues. =) I just thought that it would be nice to have such kind of optimizations.

Chicken-Bones · 2022-06-27T18:50:05Z

We've carefully crafted a struct Enumerator to fit within the current limitations, but the lack of ReadOnlySpan and the need to pack fields makes the code hard to maintain and interop with. It appears that struct-in-struct is supported, but because Span has 2 fields where an array is just a reference, it pushes past the 4 field limit.

https://github.com/tModLoader/tModLoader/blob/1.4/patches/tModLoader/Terraria/ModLoader/Core/HookList.cs#L14

Fragment:

public ref struct InstanceEnumerator
{
	// These have to be arrays rather than ReadOnlySpan as the JIT won't unpack/promote 'struct in struct' (or span in struct either)
	// Revisit with .NET 6 https://github.com/dotnet/runtime/issues/37924
	private readonly Instanced<T>[] instances;
	private readonly int[] hookIndices;

	// ideally this would be Instanced<T> and drop the need for the ii variable in the MoveNext function
	// but again, struct in struct promotion (and also increasing the 'field count'
	private T current;
	// i and j are combined into a 64bit variable beacuse JIT currently won't unpack/promote structs with > 4 fields into registers
	// See https://github.com/dotnet/runtime/issues/6534
	private ulong ij;
	//private int i;
	//private int j;
	
	...

jakobbotsch · 2023-04-25T10:18:44Z

Physical promotion (#83388) can potentially clean this up. With #85105 and DOTNET_JitEnablePhysicalPromotion=1, DOTNET_JitStressModeNames=STRESS_PHYSICAL_PROMOTION_COST, we get:

; Assembly listing for method Program:Run8():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SF]
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+00H]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V02 tmp1         [V02,T02] (  2,  2   )   ubyte  ->  rdx         single-def "V00.[000..001)"
;  V03 tmp2         [V03,T03] (  2,  2   )   ubyte  ->  rax         single-def "V00.[001..002)"
;  V04 tmp3         [V04,T00] (  3,  6   )   byref  ->  rax         single-def "Spilling address for field-by-field copy"
;  V05 tmp4         [V05,T01] (  3,  6   )     ref  ->  rax         single-def "arr expr"
;
; Lcl frame size = 40

G_M14509_IG01:
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M14509_IG02:
       mov      rax, 0xD1FFAB1E      ; data for Program:cards8
       mov      rax, gword ptr [rax]
       cmp      dword ptr [rax+08H], 0
       jbe      SHORT G_M14509_IG04
       add      rax, 16
       movzx    rdx, byte  ptr [rax]
       movzx    rax, byte  ptr [rax+01H]
       sub      edx, eax
       mov      eax, edx
                                                ;; size=34 bbWeight=1 PerfScore 11.00
G_M14509_IG03:
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25
G_M14509_IG04:
       call     CORINFO_HELP_RNGCHKFAIL
       int3
                                                ;; size=6 bbWeight=0 PerfScore 0.00

; Total bytes of code 49, prolog size 4, PerfScore 17.40, instruction count 14, allocated bytes for code 49 (MethodHash=6250c752) for method Program:Run8():int
; ============================================================

Currently this promotion will not happen without disabling costing checks because the profitability heuristic does not take into account that promotion can potentially make some struct loads much cheaper (in this case the array access). Hence the heuristic believes that with only one access to each field, it is best left alone.

Another thing is that we can avoid spilling simple address computations like these when they are just ADD(LCL_VAR, CNS) to avoid this add rax, 16.

jakobbotsch · 2023-05-25T11:53:28Z

After #86660 we get the above codegen by default when physical promotion is enabled.

Sadly the add rax, 16 remains, even after #85889. The problem is that the operand is an INDEX_ADDR node that has not been expanded at this point to something recognizable by decomposition.

It also remains in Run3 even though it has been expanded at the point of block morphing. There the problem is that the address is a quite complicated piece of IR that the address peeling does not recognize:

               [000010] DACXGO-----                         ▌  STORE_LCL_VAR struct<Cards3, 3>(P) V00 loc0         
                                                            ▌    ubyte  V00.Cards3:C0 (offs=0x00) -> V02 tmp1         
                                                            ▌    ubyte  V00.Cards3:C1 (offs=0x01) -> V03 tmp2         
                                                            ▌    ubyte  V00.Cards3:C2 (offs=0x02) -> V04 tmp3          (last use)
               [000009] nACXG+-----                         └──▌  BLK       struct<Cards3, 3>
               [000034] -ACXG+-----                            └──▌  COMMA     byref 
               [000021] DACXG+-----                               ├──▌  STORE_LCL_VAR ref    V05 tmp4         
               [000006] --CXG+-----                               │  └──▌  COMMA     ref   
               [000005] H-CXG+-----                               │     ├──▌  CALL help long   CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
               [000003] -----+----- arg0 in rcx                   │     │  ├──▌  CNS_INT   long   0x7ff7ef1aff68
               [000004] -----+----- arg1 in rdx                   │     │  └──▌  CNS_INT   int    3
               [000001] I---G+-----                               │     └──▌  IND       ref   
               [000000] H----+-----                               │        └──▌  CNS_INT(h) long   0x1bc92001d08 static Fseq[cards3]
               [000033] ---X-+-----                               └──▌  COMMA     byref 
               [000026] ---X-+-----                                  ├──▌  BOUNDS_CHECK_Rng void  
               [000007] -----+-----                                  │  ├──▌  CNS_INT   int    0
               [000025] ---X-+-----                                  │  └──▌  ARR_LENGTH int   
               [000022] -----+-----                                  │     └──▌  LCL_VAR   ref    V05 tmp4         
               [000032] -----+-----                                  └──▌  ARR_ADDR  byref Cards3[]
               [000031] -----+-----                                     └──▌  ADD       byref 
               [000023] -----+-----                                        ├──▌  LCL_VAR   ref    V05 tmp4         
               [000030] -----+-----                                        └──▌  CNS_INT   long   16

I've opened #86755 for this.

AndreyAkinshin changed the title ~~Strange RyuJIT behaviour for aligned structures~~ Strange RyuJIT behaviour (structure memory access) Aug 22, 2016

CarolEidt changed the title ~~Strange RyuJIT behaviour (structure memory access)~~ Promote (scalar replace) structs with more than 4 fields Aug 22, 2016

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

jakobbotsch mentioned this issue Jul 1, 2022

Poor codegen for "big" (> 4 fields) structs init #71565

Closed

This was referenced Oct 12, 2022

JIT: Generalized struct promotion #76928

Closed

JIT struct work planned for .NET 8 #76931

Closed

jakobbotsch modified the milestones: Future, 8.0.0 Dec 15, 2022

jakobbotsch self-assigned this Dec 15, 2022

jakobbotsch mentioned this issue May 25, 2023

JIT: Expand address peeling in block morphing to handle array indexing expressions #86755

Open

jakobbotsch added the Priority:2 Work that is important, but not critical for the release label Jun 20, 2023

jakobbotsch mentioned this issue Jun 27, 2023

JIT: Enable physical promotion by default #88090

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 27, 2023

jakobbotsch closed this as completed in #88090 Jun 29, 2023

jakobbotsch closed this as completed in 9dcc7b1 Jun 29, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 29, 2023

ghost locked as resolved and limited conversation to collaborators Jul 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Promote (scalar replace) structs with more than 4 fields #6534

Promote (scalar replace) structs with more than 4 fields #6534

AndreyAkinshin commented Aug 22, 2016 •

edited by BruceForstall

Loading

CarolEidt commented Aug 22, 2016

CarolEidt commented Aug 22, 2016

AndreyAkinshin commented Aug 22, 2016

RussKeldorph commented Aug 22, 2016

russellhadley commented Aug 25, 2016

AndreyAkinshin commented Aug 25, 2016

Chicken-Bones commented Jun 27, 2022 •

edited

Loading

jakobbotsch commented Apr 25, 2023

jakobbotsch commented May 25, 2023 •

edited

Loading

Promote (scalar replace) structs with more than 4 fields #6534

Promote (scalar replace) structs with more than 4 fields #6534

Comments

AndreyAkinshin commented Aug 22, 2016 • edited by BruceForstall Loading

CarolEidt commented Aug 22, 2016

CarolEidt commented Aug 22, 2016

AndreyAkinshin commented Aug 22, 2016

RussKeldorph commented Aug 22, 2016

russellhadley commented Aug 25, 2016

AndreyAkinshin commented Aug 25, 2016

Chicken-Bones commented Jun 27, 2022 • edited Loading

jakobbotsch commented Apr 25, 2023

jakobbotsch commented May 25, 2023 • edited Loading

AndreyAkinshin commented Aug 22, 2016 •

edited by BruceForstall

Loading

Chicken-Bones commented Jun 27, 2022 •

edited

Loading

jakobbotsch commented May 25, 2023 •

edited

Loading