Optimize new Vector3 { X=,Y=,Z=} pattern #84543

Closed
EgorBo opened this issue Apr 9, 2023 · 4 comments · Fixed by #86212

EgorBo (Member) commented Apr 9, 2023

Vector3 Foo1() => new Vector3 { X = 1, Y = 2, Z = 3 };
Vector3 Foo2() => new Vector3(1, 2, 3);
// both are expected to emit the same codegen, but:

Codegen:

; Method Prog:Foo1():System.Numerics.Vector3:this
       C5F877               vzeroupper 
       C5F857C0             vxorps   xmm0, xmm0, xmm0
       C5F8100D00000000     vmovups  xmm1, xmmword ptr [reloc @RWD00]
       C4E37921C10E         vinsertps xmm0, xmm0, xmm1, 14
       C5F8100D00000000     vmovups  xmm1, xmmword ptr [reloc @RWD16]
       C4E37921C110         vinsertps xmm0, xmm0, xmm1, 16
       C5F8100D00000000     vmovups  xmm1, xmmword ptr [reloc @RWD32]
       C4E37921C120         vinsertps xmm0, xmm0, xmm1, 32
       C5F828C8             vmovaps  xmm1, xmm0
       C5F1C6C901           vshufpd  xmm1, xmm1, 1
       C3                   ret      
RWD00  	dq	3F8000003F800000h, 3F8000003F800000h
RWD16  	dq	4000000040000000h, 4000000040000000h
RWD32  	dq	4040000040400000h, 4040000040400000h
; Total bytes of code: 59


; Method Prog:Foo2():System.Numerics.Vector3:this
       C5F877               vzeroupper 
       C5F8100500000000     vmovups  xmm0, xmmword ptr [reloc @RWD00]
       C5F828C8             vmovaps  xmm1, xmm0
       C5F1C6C901           vshufpd  xmm1, xmm1, 1
       C3                   ret      
RWD00  	dq	400000003F800000h, DDDDDDDD40400000h
; Total bytes of code: 21
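
To make the two listings easier to compare, here is a small C++ sketch (not part of the JIT; the helper names `bitsToFloat` and `decodeInsertPs` are made up for illustration) that decodes the `RWD` constants and the `vinsertps` immediates. It assumes the standard INSERTPS imm8 layout: bits 7:6 select the source element, bits 5:4 the destination element, and bits 3:0 are a zero mask.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
static float bitsToFloat(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Fields of the INSERTPS imm8 operand (register-source form).
struct InsertPsImm
{
    unsigned srcIndex; // bits 7:6, element read from the source register
    unsigned dstIndex; // bits 5:4, element written in the destination
    unsigned zeroMask; // bits 3:0, destination elements forced to zero
};

// Split the immediate into its three fields.
static InsertPsImm decodeInsertPs(std::uint8_t imm)
{
    return InsertPsImm{(imm >> 6) & 3u, (imm >> 4) & 3u, imm & 0xFu};
}
```

Under that reading, `3F800000h`, `40000000h` and `40400000h` are 1.0f, 2.0f and 3.0f, and the immediates 14, 16 and 32 write destination elements 0, 1 and 2 (the first one also zeroes elements 1 to 3). So Foo1 loads one broadcast constant per field and inserts it lane by lane, while Foo2 loads the finished vector from a single 16-byte constant.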

Shouldn't be too hard to optimize the 1st pattern, e.g. in morph/VN to recognize:

[000026] -----------                         \--*  HWINTRINSIC simd12 float WithElement
[000023] -----------                            +--*  HWINTRINSIC simd12 float WithElement
[000020] -----------                            |  +--*  HWINTRINSIC simd12 float WithElement
[000001] -----------                            |  |  +--*  CNS_VEC   simd12<0x00000000, 0x00000000, 0x00000000>
[000019] -----------                            |  |  +--*  CNS_INT   int    0
[000005] -----------                            |  |  \--*  CNS_DBL   float  1.0000000000000000
[000022] -----------                            |  +--*  CNS_INT   int    1
[000009] -----------                            |  \--*  CNS_DBL   float  2.0000000000000000
[000025] -----------                            +--*  CNS_INT   int    2
[000013] -----------                            \--*  CNS_DBL   float  3.0000000000000000

as GT_CNS_VEC. Can be a good first issue for JIT contributors.
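
For anyone picking this up, the folding idea can be sketched on a toy IR. This is only an illustration under stated assumptions: `Node`, `makeConst`, `makeWithElement` and `tryFold` are hypothetical names, not the JIT's `GenTree` API, and the real change may well be structured differently.

```cpp
#include <array>
#include <memory>
#include <optional>
#include <utility>

using Vec3 = std::array<float, 3>;

// Toy IR node (hypothetical): either a constant vector, standing in for
// CNS_VEC, or WithElement(vector, index, value).
struct Node
{
    bool isConst = false;         // true: `constant` holds the vector
    Vec3 constant{};
    std::shared_ptr<Node> vector; // WithElement: the vector being updated
    int index = 0;                // WithElement: constant lane index
    std::optional<float> value;   // WithElement: empty if value is not constant
};

std::shared_ptr<Node> makeConst(Vec3 v)
{
    auto n = std::make_shared<Node>();
    n->isConst = true;
    n->constant = v;
    return n;
}

std::shared_ptr<Node> makeWithElement(std::shared_ptr<Node> vec, int index,
                                      std::optional<float> value)
{
    auto n = std::make_shared<Node>();
    n->vector = std::move(vec);
    n->index = index;
    n->value = value;
    return n;
}

// Returns the folded constant when the whole chain consists of constants,
// otherwise std::nullopt (the tree would then be left unchanged).
std::optional<Vec3> tryFold(const Node& node)
{
    if (node.isConst)
        return node.constant;
    if (!node.vector || !node.value || node.index < 0 || node.index > 2)
        return std::nullopt;
    std::optional<Vec3> base = tryFold(*node.vector);
    if (!base)
        return std::nullopt;
    (*base)[node.index] = *node.value; // apply this WithElement to the constant
    return base;
}
```

Building the three nested `WithElement` nodes from the dump above on top of a zero constant and calling `tryFold` yields the single constant `<1, 2, 3>`; if any inserted value is not a constant, the sketch gives up and returns `std::nullopt`.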

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 9, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 9, 2023
ghost commented Apr 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


@EgorBo EgorBo added the help wanted [up-for-grabs] Good issue for external contributors label Apr 9, 2023
@EgorBo EgorBo added this to the Future milestone Apr 9, 2023
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Apr 9, 2023
jasper-d (Contributor) commented May 1, 2023

I would like to give this a try. Can you assign me?

EgorBo (Member, Author) commented May 1, 2023

> I would like to give this a try. Can you assign me?

@SkiFoD are you still working on this, or can we re-assign?

SkiFoD (Contributor) commented May 1, 2023

@EgorBo Yeah sure, you can re-assign.

@EgorBo EgorBo modified the milestones: Future, 8.0.0 Jun 11, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 11, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 13, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2023