Minimally allow Vector.Shuffle to expand for values that become constant by global morph #102676

Closed

@tannergooding tannergooding commented May 24, 2024

This changes:

private static Vector128<float> Test()
{
    var y = Vector128.Create(3, 2, 1, 0);
    return Vector128.Shuffle(Vector128.Create(1.0f, 2.0f, 3.0f, 4.0f), y);
}

from generating:

; Method Program:Test():System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
G_M000_IG01:                ;; offset=0x0000
       push     rbx
       sub      rsp, 64
       mov      rbx, rcx

G_M000_IG02:                ;; offset=0x0008
       vmovups  xmm0, xmmword ptr [reloc @RWD00]
       vmovaps  xmmword ptr [rsp+0x30], xmm0
       vmovups  xmm0, xmmword ptr [reloc @RWD16]
       vmovaps  xmmword ptr [rsp+0x20], xmm0
       lea      rdx, [rsp+0x30]
       lea      r8, [rsp+0x20]
       mov      rcx, rbx
       call     [System.Runtime.Intrinsics.Vector128:Shuffle(System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[float]]
       mov      rax, rbx

G_M000_IG03:                ;; offset=0x003A
       add      rsp, 64
       pop      rbx
       ret      
RWD00  	dq	400000003F800000h, 4080000040400000h
RWD16  	dq	0000000200000003h, 0000000000000001h
; Total bytes of code: 64

to instead generate:

; Method Program:Test():System.Runtime.Intrinsics.Vector128`1[float] (FullOpts)
G_M40807_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=1 PerfScore 0.00

G_M40807_IG02:  ;; offset=0x0000
       vpermilps xmm0, xmmword ptr [reloc @RWD00], 27
       vmovups  xmmword ptr [rcx], xmm0
       mov      rax, rcx
						;; size=17 bbWeight=1 PerfScore 4.25

G_M40807_IG03:  ;; offset=0x0011
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
RWD00  	dq	400000003F800000h, 4080000040400000h
; Total bytes of code: 18
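
The immediate `27` in the `vpermilps` above is the constant index vector `(3, 2, 1, 0)` packed two bits per lane, with lane 0 in the lowest bits. A minimal sketch of that packing follows; `ShuffleSketch`, `EncodeImmediate`, and `Reversed` are hypothetical names for illustration only and are not the JIT's actual implementation, and the helper assumes every index is in the range 0..3.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ShuffleSketch
{
    // Hypothetical helper: packs four lane selectors into the imm8 layout
    // used by vpermilps (bits [1:0] select result lane 0, [3:2] lane 1, ...).
    public static byte EncodeImmediate(int i0, int i1, int i2, int i3)
    {
        int[] indices = { i0, i1, i2, i3 };
        int imm = 0;

        for (int lane = 0; lane < indices.Length; lane++)
        {
            // Assumption: only in-range indices are encodable here.
            if ((uint)indices[lane] > 3)
            {
                throw new ArgumentOutOfRangeException(nameof(indices));
            }
            imm |= indices[lane] << (lane * 2);
        }

        return (byte)imm;
    }

    // Same operation as Test() in the description, with the indices inlined.
    public static Vector128<float> Reversed()
    {
        Vector128<float> values = Vector128.Create(1.0f, 2.0f, 3.0f, 4.0f);
        Vector128<int> indices = Vector128.Create(3, 2, 1, 0);
        return Vector128.Shuffle(values, indices);
    }
}
```

Here `EncodeImmediate(3, 2, 1, 0)` evaluates to `3 | (2 << 2) | (1 << 4) | (0 << 6) = 27`, matching the immediate in the listing, and `Reversed()` yields the lanes in reverse order, `<4, 3, 2, 1>`.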

Due to limitations in forward substitution, this does not handle cases where other statements interfere with the ability to substitute the constant into the call.

Doing this after global morph is much more difficult because the call gets rewritten to use spilled locals, such as:

fgMorphTree BB01, STMT00002 (after)
               [000016] SACXG+-----                         *  CALL      void   System.Runtime.Intrinsics.Vector128:Shuffle(System.Runtime.Intrinsics.Vector128`1[float],System.Runtime.Intrinsics.Vector128`1[int]):System.Runtime.Intrinsics.Vector128`1[float]
               [000021] DA--------- arg1 setup              +--*  STORE_LCL_VAR simd16<System.Runtime.Intrinsics.Vector128`1>(AX) V04 tmp2         
               [000014] -----+-----                         |  \--*  HWINTRINSIC simd16 float Add
               [000012] -----+-----                         |     +--*  LCL_VAR   simd16 V03 tmp1         
               [000013] -----+-----                         |     \--*  LCL_VAR   simd16 V03 tmp1          (last use)
               [000022] ----------- arg1 in rdx             +--*  LCL_ADDR  long   V04 tmp2         [+0]
               [000015] -----+----- arg2 in r8              +--*  LCL_ADDR  long   V01 loc0         [+0]
               [000017] -----+----- retbuf in rcx           \--*  LCL_VAR   byref  V00 RetBuf       

This is basically the same reason we can't do what GT_INTRINSIC does, which is to carry a GT_HWINTRINSIC down to rationalization and rewrite it back to a call there. The ABI handling for return buffers and parameter passing happens very early today (return buffers around import, parameter passing in global morph). If the ABI handling were moved later, we could move this logic later as well (such as after VN) and catch essentially all cases.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 24, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding

Ended up replacing this with #102702, which allows it to be done in rationalization instead and so can cover many more scenarios.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 28, 2024