Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT generates suboptimal code for value copy to pointer #11992

Open
kindermannhubert opened this issue Feb 6, 2019 · 8 comments
Open

JIT generates suboptimal code for value copy to pointer #11992

kindermannhubert opened this issue Feb 6, 2019 · 8 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@kindermannhubert
Copy link

kindermannhubert commented Feb 6, 2019

I believe all methods bellow should have same code generated by JIT. Only 'WriteShort' method looks fine.
Short/UShort/Char/MyStruct are all 2 bytes long and therefore all methods are equivalent.

using System;
using System.Runtime.InteropServices;

public unsafe class Test
{
    public void* p;
    
    public void WriteShort(short x)
    {
       *(short*)p = x;
    }
        
    public void WriteUShort(ushort x)
    {
       *(ushort*)p = x;
    }
    
    public void WriteChar(char x)
    {
       *(char*)p = x;
    }
    
    public void WriteShort_TakeRefAndDeref(short x)
    {
       *(short*)p = *&x;
    }
    
    public void WriteStruct_Direct(MyStruct x)
    {
       *(MyStruct*)p = x;
    }
    
    public void WriteStruct_ByShort(MyStruct x)
    {
       *(short*)p = *(short*)&x;
    }
    
    public void WriteStruct_ByUShort(MyStruct x)
    {
       *(ushort*)p = *(ushort*)&x;
    }
    
    //[StructLayout(LayoutKind.Sequential, Size = 2)] - doesn't matter
    public struct MyStruct
    {
        public byte a, b;
    }
}

There are unnecessary moves back and forth.

Test.WriteShort(Int16)
    L0000: mov rax, [rcx+0x8]
    L0004: mov [rax], dx
    L0007: ret

Test.WriteUShort(UInt16)
    L0000: mov rax, [rcx+0x8]
    L0004: movzx edx, dx // UNNECESSARY
    L0007: mov [rax], dx
    L000a: ret

Test.WriteChar(Char)
    L0000: mov rax, [rcx+0x8]
    L0004: movzx edx, dx // UNNECESSARY
    L0007: mov [rax], dx
    L000a: ret

Test.WriteShort_TakeRefAndDeref(Int16)
    L0000: mov [rsp+0x10], edx // UNNECESSARY
    L0004: mov rax, [rcx+0x8]
    L0008: movsx rdx, word [rsp+0x10] // UNNECESSARY
    L000e: mov [rax], dx
    L0011: ret

Test.WriteStruct_Direct(MyStruct)
    L0000: mov [rsp+0x10], rdx // UNNECESSARY
    L0005: mov rax, [rcx+0x8]
    L0009: movsx rdx, word [rsp+0x10] // UNNECESSARY
    L000f: mov [rax], dx
    L0012: ret

Test.WriteStruct_ByShort(MyStruct)
    L0000: mov [rsp+0x10], rdx // UNNECESSARY
    L0005: mov rax, [rcx+0x8]
    L0009: movsx rdx, word [rsp+0x10] // UNNECESSARY
    L000f: mov [rax], dx
    L0012: ret

Test.WriteStruct_ByUShort(MyStruct)
    L0000: mov [rsp+0x10], rdx // UNNECESSARY
    L0005: mov rax, [rcx+0x8]
    L0009: movzx edx, word [rsp+0x10] // UNNECESSARY
    L000e: mov [rax], dx
    L0011: ret

Can be seen at sharplab.io.

category:cq
theme:structs
skill-level:expert
cost:large
impact:small

@tannergooding
Copy link
Member

It is worth calling out that sharplab only shows disassembly for the full framework version of RyuJIT (and uses the legacy JIT for x86). This means it is decently outdated and is missing many of the optimizations/fixes that have been made in CoreCLR.

It would be worth validating which of these still produce unoptimal assembly with either the latest NetCore 3.0 preview or the current head of the master branch.

@mikedn
Copy link
Contributor

mikedn commented Feb 6, 2019

Current coreclr build generates:

; Assembly listing for method Test:WriteShort(short):this
G_M60373_IG01:
       0F1F440000           nop      
G_M60373_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       668910               mov      word  ptr [rax], dx
G_M60373_IG03:
       C3                   ret      
; Total bytes of code 13, prolog size 5 for method Test:WriteShort(short):this
; ============================================================
; Assembly listing for method Test:WriteUShort(ushort):this
G_M11744_IG01:
       0F1F440000           nop      
G_M11744_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       668910               mov      word  ptr [rax], dx
G_M11744_IG03:
       C3                   ret      
; Total bytes of code 13, prolog size 5 for method Test:WriteUShort(ushort):this
; ============================================================
; Assembly listing for method Test:WriteChar(ushort):this
G_M3391_IG01:
       0F1F440000           nop      
G_M3391_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       668910               mov      word  ptr [rax], dx
G_M3391_IG03:
       C3                   ret      
; Total bytes of code 13, prolog size 5 for method Test:WriteChar(ushort):this
; ============================================================
; Assembly listing for method Test:WriteShort_TakeRefAndDeref(short):this
G_M9115_IG01:
       0F1F440000           nop      
       89542410             mov      dword ptr [rsp+10H], edx
G_M9115_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       480FBF542410         movsx    rdx, word  ptr [rsp+10H]
       668910               mov      word  ptr [rax], dx
G_M9115_IG03:
       C3                   ret      
; Total bytes of code 23, prolog size 5 for method Test:WriteShort_TakeRefAndDeref(short):this
; ============================================================
; Assembly listing for method Test:WriteStruct_Direct(struct):this
G_M5762_IG01:
       0F1F440000           nop      
       4889542410           mov      qword ptr [rsp+10H], rdx
G_M5762_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       480FBF542410         movsx    rdx, word  ptr [rsp+10H]
       668910               mov      word  ptr [rax], dx
G_M5762_IG03:
       C3                   ret      
; Total bytes of code 24, prolog size 5 for method Test:WriteStruct_Direct(struct):this
; ============================================================
; Assembly listing for method Test:WriteStruct_ByShort(struct):this
G_M41542_IG01:
       0F1F440000           nop      
       4889542410           mov      qword ptr [rsp+10H], rdx
G_M41542_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       480FBF542410         movsx    rdx, word  ptr [rsp+10H]
       668910               mov      word  ptr [rax], dx
G_M41542_IG03:
       C3                   ret      
; Total bytes of code 24, prolog size 5 for method Test:WriteStruct_ByShort(struct):this
; ============================================================
; Assembly listing for method Test:WriteStruct_ByUShort(struct):this
G_M57683_IG01:
       0F1F440000           nop      
       4889542410           mov      qword ptr [rsp+10H], rdx
G_M57683_IG02:
       488B4108             mov      rax, qword ptr [rcx+8]
       0FB7542410           movzx    rdx, word  ptr [rsp+10H]
       668910               mov      word  ptr [rax], dx
G_M57683_IG03:
       C3                   ret      
; Total bytes of code 23, prolog size 5 for method Test:WriteStruct_ByUShort(struct):this

@RussKeldorph
Copy link
Contributor

@dotnet/jit-contrib

@mikedn
Copy link
Contributor

mikedn commented Feb 6, 2019

So WriteUShort and WriteChar (which I expect to be equivalent anyway) work fine now, the other still have CQ issue. I'm not sure what's the use case for something like *(ushort*)p = *(ushort*)&x; but it's probably reasonable to expect that at least *(MyStruct*)p = x; generates good code.

We start with reasonable IR:

               [000003] ------------              /--*  LCL_VAR   struct V01 arg1         
               [000005] -A-XG-------              *  ASG       struct (copy)
               [000004] ---XG-------              \--*  BLK(2)    struct
               [000002] ---XG-------                 \--*  FIELD     long   p
               [000001] ------------                    \--*  LCL_VAR   ref    V00 this         

and morph transforms it into problematic IR

fgMorphOneAsgBlock (after):
               [000012] n-----------              /--*  IND       short 
               [000011] ------------              |  \--*  ADDR      byref 
               [000003] -----+------              |     \--*  LCL_VAR   struct(AX) V01 arg1         
               [000005] -A-XG-------              *  ASG       short 
               [000004] *--XG+-N----              \--*  IND       short 
               [000002] ---XG+------                 \--*  IND       long  
               [000009] -----+------                    |  /--*  CNS_INT   long   8 field offset Fseq[p]
               [000010] -----+------                    \--*  ADD       byref 
               [000001] -----+------                       \--*  LCL_VAR   ref    V00 this         
 using oneAsgTree.

So this is basically another struct handling issue, different but related to #11940, #4323, #11848 and probably others that I don't remember.

It's not clear what a good solution would be. Probably either reinterpreting arg1 type in such a way that it does not have to be spilled to memory or change arg1's type to short and deal with individual struct field access by using shifts and masks.

@mikedn
Copy link
Contributor

mikedn commented Feb 6, 2019

WriteShort_TakeRefAndDeref is perhaps an interesting case - you'd thing that *&x would simply reduce to x. It sort of does that but in an unfortunate manner - using GT_LCL_FLD:

fgMorphTree BB01, stmt 1 (before)
               [000005] *--XG-------              /--*  IND       short 
               [000004] ------------              |  \--*  ADDR      long  
               [000003] ------------              |     \--*  LCL_VAR   int    V01 arg1         
               [000007] -A-XG-------              *  ASG       short 
               [000006] *--XG--N----              \--*  IND       short 
               [000002] ---XG-------                 \--*  FIELD     long   p
               [000001] ------------                    \--*  LCL_VAR   ref    V00 this         
GenTreeNode creates assertion:
               [000002] ---XG-------              *  IND       long  
In BB01 New Local Constant Assertion: V00 != null index=#01, mask=0000000000000001

Local V01 should not be enregistered because: was accessed as a local field

fgMorphTree BB01, stmt 1 (after)
               [000003] -----+------              /--*  LCL_FLD   short  V01 arg1         [+0]
               [000007] -A-XG+------              *  ASG       short 
               [000006] *--XG+-N----              \--*  IND       short 
               [000002] ---XG+------                 \--*  IND       long  
               [000011] -----+------                    |  /--*  CNS_INT   long   8 field offset Fseq[p]
               [000012] -----+------                    \--*  ADD       byref 
               [000001] -----+------                       \--*  LCL_VAR   ref    V00 this         

fgMorphTree BB01, stmt 2 (before)

@mikedn
Copy link
Contributor

mikedn commented Feb 6, 2019

The remaining two - WriteStruct_ByShort and WriteStruct_ByUShort are variations on the same theme - GT_LCL_FLD based reinterpretation:

               [000005] *--XG-------              /--*  IND       short 
               [000004] ------------              |  \--*  ADDR      long  
               [000003] ------------              |     \--*  LCL_VAR   struct V01 arg1         
; morphs into
               [000003] -----+------              /--*  LCL_FLD   short  V01 arg1         [+0]

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@CarolEidt
Copy link
Contributor

@sandreenko - I think this might have been fixed by your changes?

@jakobbotsch
Copy link
Member

jakobbotsch commented Dec 12, 2022

WriteStruct_ByShort and WriteStruct_ByUShort still have the unoptimal codegen:

; Method Test:WriteStruct_ByShort(Test+MyStruct):this
G_M56169_IG01:
       mov      qword ptr [rsp+10H], rdx
						;; size=5 bbWeight=1    PerfScore 1.00

G_M56169_IG02:
       mov      rax, qword ptr [rcx+08H]
       movsx    rdx, word  ptr [rsp+10H]
       mov      word  ptr [rax], dx
						;; size=13 bbWeight=1    PerfScore 6.00

G_M56169_IG03:
       ret      
						;; size=1 bbWeight=1    PerfScore 1.00
; Total bytes of code: 19

Fundamentally the JIT does not handle promoted structs "packed" in registers today when their fields do not map cleanly which likely would need to be resolved in some way to avoid LCL_FLD/DNER'ing the argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

7 participants