Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pi-Digits: Extra Struct copies of BigInteger #5785

Closed
swaroop-sridhar opened this issue May 4, 2016 · 5 comments
Closed

Pi-Digits: Extra Struct copies of BigInteger #5785

swaroop-sridhar opened this issue May 4, 2016 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Milestone

Comments

@swaroop-sridhar
Copy link
Contributor

RYUJit has several extra struct copies of the two field BigInteger struct in the pi-digit Benchmark's functions.

For example, here's the code for pidigits:extract_digit(int):int:this

G_M47731_IG01:
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4881EC80000000       sub      rsp, 128
       488BF1               mov      rsi, rcx
       488D7C2420           lea      rdi, [rsp+20H]                             // p = &this->BI1
       B918000000           mov      ecx, 24
       33C0                 xor      rax, rax
       F3AB                 rep stosd 
       488BCE               mov      rcx, rsi
       488BF1               mov      rsi, rcx

G_M47731_IG02:
       488D4E28             lea      rcx, bword ptr [rsi+40] 
       488B39               mov      rdi, gword ptr [rcx]     
       8B5908               mov      ebx, dword ptr [rcx+8]   
       33C9                 xor      rcx, rcx
       488D442440           lea      rax, bword ptr [rsp+40H]                    // t4 = 0
       C4E17957C0           vxorpd   ymm0, ymm0
       C4E17A7F00           vmovdqu  qword ptr [rax], ymm0
       488D4C2440           lea      rcx, bword ptr [rsp+40H] 
       E83AFEFFFF           call     System.Numerics.BigInteger:.ctor(int):this  // ctor(t4)
       488B4C2440           mov      rcx, gword ptr [rsp+40H] 
       8B542448             mov      edx, dword ptr [rsp+48H] 
       4C8D442470           lea      r8, bword ptr [rsp+70H]  
       488D442430           lea      rax, bword ptr [rsp+30H] 
       488938               mov      gword ptr [rax], rdi                        // t3.1 = p->1
       895808               mov      dword ptr [rax+8], ebx                      // t3.2 = p->2
       488D442420           lea      rax, bword ptr [rsp+20H] 
       488908               mov      gword ptr [rax], rcx                        // t2.1 = t4.1
       895008               mov      dword ptr [rax+8], edx                      // t2.2 = t4.2
       498BC8               mov      rcx, r8
       488D542430           lea      rdx, bword ptr [rsp+30H] 
       4C8D442420           lea      r8, bword ptr [rsp+20H]           // Multiply(t7 = t3 x t2)
       E874F7FFFF           call     System.Numerics.BigInteger:op_Multiply(struct,struct):struct 
       488D4C2460           lea      rcx, bword ptr [rsp+60H] 
       488D542430           lea      rdx, bword ptr [rsp+30H]
       4C8B442470           mov      r8, gword ptr [rsp+70H]    
       4C8902               mov      gword ptr [rdx], r8                         // t3.1 = t7.1
       448B442478           mov      r8d, dword ptr [rsp+78H]
       44894208             mov      dword ptr [rdx+8], r8d                      // t3.2 = t7.2
       488D5608             lea      rdx, bword ptr [rsi+8]                      // q = &this->BI2

G_M47731_IG03:
       C4E17A6F02           vmovdqu  ymm0, qword ptr [rdx]             
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], ymm0                   // t2 = *q

G_M47731_IG04:
       488D542430           lea      rdx, bword ptr [rsp+30H]
       4C8D442420           lea      r8, bword ptr [rsp+20H]                // add(t6 = t3 + t2)
       E80AFBFFFF           call     System.Numerics.BigInteger:op_Addition(struct,struct):struct
       488D4C2450           lea      rcx, bword ptr [rsp+50H]  
       488D542430           lea      rdx, bword ptr [rsp+30H]
       4C8B442460           mov      r8, gword ptr [rsp+60H]    
       4C8902               mov      gword ptr [rdx], r8                         // t3.1 = t6.1
       448B442468           mov      r8d, dword ptr [rsp+68H]
       44894208             mov      dword ptr [rdx+8], r8d                      // t3.2 = t6.2
       4883C618             add      rsi, 24

G_M47731_IG05:
       C4E17A6F06           vmovdqu  ymm0, qword ptr [rsi]                       
       C4E17A7F442420       vmovdqu  qword ptr [rsp+20H], ymm0                   // t2 = this->BI3

G_M47731_IG06:
       488D542430           lea      rdx, bword ptr [rsp+30H]
       4C8D442420           lea      r8, bword ptr [rsp+20H]            // Division(t5 = t3 / t2)
       E8E8DEFFFF           call     System.Numerics.BigInteger:op_Division(struct,struct):struct
       488D4C2430           lea      rcx, bword ptr [rsp+30H]   
       488B442450           mov      rax, gword ptr [rsp+50H]
       488901               mov      gword ptr [rcx], rax                        // t3.1 = t5.1
       8B442458             mov      eax, dword ptr [rsp+58H]
       894108               mov      dword ptr [rcx+8], eax                      // t3.2 = t5.2
       488D4C2430           lea      rcx, bword ptr [rsp+30H]                    // Explicit(t3)
       E822DEFFFF           call     System.Numerics.BigInteger:op_Explicit(struct):int
       90                   nop      

G_M47731_IG07:
       4881C480000000       add      rsp, 128
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret  

The comments indicate block struct copy/init as well as field assignments.
There are several extra copies in this code:

  • The copy of return value into t3 struct after each function is unnecessary. The return struct can be reused for next computation. This could be an isolated issue wrt Copy-prop of struct-return values that can be fixed. This accounts for three copies.
  • this->BI1 (via pointer p) could be block copied, but is field-copied
  • t2 = t4 copy is unnecessary -- could be caused by phase ordering where one of the structs is lowered early.
  • Is the zero initialization of t4 necessary before call to constructor?

category:cq
theme:structs
skill-level:expert
cost:extra-large

@mikedn
Copy link
Contributor

mikedn commented May 4, 2016

This could be an isolated issue wrt Copy-prop of struct-return values that can be fixed. This accounts for three copies.

The JIT doesn't do copy-prop for structs, see #4308

this->BI1 (via pointer p ) could be block copied, but is field-copied

Presumably by "block copy" you mean using a SSE move and not something like rep movsb?

t2 = t4 copy is unnecessary

Because t4 isn't used after the call? Yeah, that would be nice but that's a bit of fortunate case. In general the JIT has to make such copies when passing value types by value due to the rather unfortunate calling convention.

Is the zero initialization of t4 necessary before call to constructor?

It is because the JIT doesn't know that BigInteger:.ctor doesn't read any of the fields. Though the JIT doesn't eliminate zero init even in cases where the ctor is inlined, that's due to a general lack of optimization around value types in JIT.

@RussKeldorph
Copy link
Contributor

/cc @dotnet/jit-contrib

@sivarv
Copy link
Member

sivarv commented Jan 19, 2017

This is an issue on Linux codegen because of the way structs are treated.

@CarolEidt
Copy link
Contributor

In addition to the failure to eliminate struct copies as noted in #4308, this suffers from the following additional issue:

  • V07 is promoted, but is address-taken because it is passed to BigInteger:.ctor by reference.
  • This in turn means that copies involving v07 will be done field-by-field.
    Perhaps we can avoid promoting structs that are passed by reference (e.g. to a constructor).

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@CarolEidt CarolEidt self-assigned this Oct 23, 2020
@CarolEidt
Copy link
Contributor

With the combination of improved inlining (which eliminates the address-taking of a struct passed to the constructor) and #39326 which keeps more reg-passed structs in registers, the extra copies have been eliminated. Note that this was referring to the older version of pi-digits which has since been eliminated (I had to go back to an old dotnet/coreclr version to find it).
The code shown here has 12 loads and 13 stores, while the current code generated has 7 loads and 2 stores.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization os-linux Linux OS (any supported distro) tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants