
Optimize scalar conversions with AVX512 #84384

Merged
41 commits merged into dotnet:main on Jul 16, 2023

Conversation

khushal1996
Contributor

khushal1996 commented Apr 5, 2023

This PR optimizes the following cases:


Case | Previous code | Optimized instruction
ulong -> double | vcvtsi2sd + sign fixup (test/jge/vaddsd) | vcvtusi2sd
public static double UIntToDouble(UInt64 val)
{
    return (double)val;
}

Assembly before optimization

G_M33997_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M33997_IG02:              ;; offset=0003H
       62F17C0857C0         vxorps   xmm0, xmm0
       62F1FF082AC1         vcvtsi2sd  xmm0, rcx
       4885C9               test     rcx, rcx
       7D0A                 jge      SHORT G_M33997_IG03
       62F1FF08580502000000 vaddsd   xmm0, qword ptr [reloc @RWD00]
						;; size=27 bbWeight=1 PerfScore 12.58
G_M33997_IG03:              ;; offset=001EH
       C3                   ret

Assembly after optimization

G_M33997_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M33997_IG02:              ;; offset=0003H
       62F1FF087BC1         vcvtusi2sd xmm0, rcx
						;; size=6 bbWeight=1 PerfScore 4.00
G_M33997_IG03:              ;; offset=0009H
       C3                   ret
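The "before" listing converts rcx as a signed long and, when the sign bit is set, adds a constant from the data section. Below is a minimal C sketch of that fixup, assuming the constant at RWD00 is 2^64 (bit pattern 0x43F0000000000000); both function names are hypothetical, not JIT functions. Because the signed conversion and the add each round, the fixup can land on a different double than a single unsigned conversion: for 2^63 + 1025, the fixup yields 2^63, while a direct, correctly rounded conversion yields 2^63 + 2048.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical emulation of: vcvtsi2sd xmm0, rcx / test rcx, rcx / jge /
   vaddsd xmm0, [RWD00]. Assumes RWD00 holds 2^64 (0x43F0000000000000). */
static double ulong_to_double_fixup(uint64_t val)
{
    int64_t as_signed;
    memcpy(&as_signed, &val, sizeof as_signed); /* same bits, read as signed */
    double result = (double)as_signed;          /* vcvtsi2sd: first rounding */
    if (as_signed < 0)                          /* test rcx, rcx / jge */
        result += 18446744073709551616.0;       /* vaddsd 2^64: second rounding */
    return result;
}

/* Single unsigned conversion, as vcvtusi2sd performs it: one rounding. */
static double ulong_to_double_direct(uint64_t val)
{
    return (double)val;
}
```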
Case | Previous code | Optimized instruction
ulong -> float | ulong -> double -> float (vcvtusi2sd + vcvtsd2ss) | vcvtusi2ss
[MethodImplAttribute(MethodImplOptions.NoInlining)]
public static float ConvUlongToFloat(ulong val)
{
    return (float)val;
}

Assembly before optimization

G_M2883_IG01:  ;; offset=0000H
       push     rbp
       sub      rsp, 48
       vzeroupper 
       lea      rbp, [rsp+30H]
       xor      eax, eax
       mov      dword ptr [rbp-04H], eax
       mov      qword ptr [rbp+10H], rcx
						;; size=22 bbWeight=1 PerfScore 5.00
G_M2883_IG02:  ;; offset=0016H
       cmp      dword ptr [(reloc 0x7ff8bc0d2898)], 0
       je       SHORT G_M2883_IG04
						;; size=9 bbWeight=1 PerfScore 4.00
G_M2883_IG03:  ;; offset=001FH
       call     CORINFO_HELP_DBG_IS_JUST_MY_CODE
						;; size=5 bbWeight=0.50 PerfScore 0.50
G_M2883_IG04:  ;; offset=0024H
       nop      
       mov      rax, qword ptr [rbp+10H]
       vcvtusi2sd xmm0, rax
       vcvtsd2ss xmm0, xmm0, xmm0
       vmovss   dword ptr [rbp-04H], xmm0
       nop      
						;; size=25 bbWeight=1 PerfScore 10.50
G_M2883_IG05:  ;; offset=003DH
       vmovss   xmm0, dword ptr [rbp-04H]
						;; size=7 bbWeight=1 PerfScore 3.00
G_M2883_IG06:  ;; offset=0044H
       add      rsp, 48
       pop      rbp
       ret      
						;; size=6 bbWeight=1 PerfScore 1.75

Assembly after optimization

G_M2883_IG01:  ;; offset=0000H
       push     rbp
       sub      rsp, 48
       vzeroupper 
       lea      rbp, [rsp+30H]
       xor      eax, eax
       mov      dword ptr [rbp-04H], eax
       mov      qword ptr [rbp+10H], rcx
						;; size=22 bbWeight=1 PerfScore 5.00
G_M2883_IG02:  ;; offset=0016H
       cmp      dword ptr [(reloc 0x7ff8b54b2898)], 0
       je       SHORT G_M2883_IG04
						;; size=9 bbWeight=1 PerfScore 4.00
G_M2883_IG03:  ;; offset=001FH
       call     CORINFO_HELP_DBG_IS_JUST_MY_CODE
						;; size=5 bbWeight=0.50 PerfScore 0.50
G_M2883_IG04:  ;; offset=0024H
       nop      
       mov      rax, qword ptr [rbp+10H]
       vcvtusi2ss xmm0, rax
       vmovss   dword ptr [rbp-04H], xmm0
       nop      
						;; size=19 bbWeight=1 PerfScore 8.50
G_M2883_IG05:  ;; offset=0037H
       vmovss   xmm0, dword ptr [rbp-04H]
						;; size=7 bbWeight=1 PerfScore 3.00
G_M2883_IG06:  ;; offset=003EH
       add      rsp, 48
       pop      rbp
       ret      
						;; size=6 bbWeight=1 PerfScore 1.75
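Apart from dropping an instruction, the new ulong -> float sequence rounds once instead of twice (ulong -> double, then double -> float). The C sketch below illustrates the difference. It assumes the compiler's (float) conversion of a uint64_t rounds to nearest-even in a single step, as vcvtusi2ss does under the default MXCSR rounding mode; the function names are hypothetical. For 0x8000008000000001 (2^63 + 2^39 + 1), the first rounding lands exactly halfway between two floats, and ties-to-even then rounds down to 2^63, while the direct conversion rounds up to 2^63 + 2^40.

```c
#include <stdint.h>

/* Old path: ulong -> double -> float (vcvtusi2sd, then vcvtsd2ss).
   The volatile intermediate stops the compiler from merging the two
   conversions, so the value really is rounded twice. */
static float ulong_to_float_via_double(uint64_t val)
{
    volatile double widened = (double)val; /* rounding to a 53-bit significand */
    return (float)widened;                 /* rounding again to 24 bits */
}

/* New path: one conversion and one rounding, as with vcvtusi2ss. */
static float ulong_to_float_direct(uint64_t val)
{
    return (float)val;
}
```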

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 5, 2023
ghost added the community-contribution label (indicates that the PR has been added by a community member) Apr 5, 2023
@ghost

ghost commented Apr 5, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Draft PR for testing purposes. No need for review at this time.

This PR optimizes the following cases:


Case | Previous code | Optimized instruction
float -> ulong | vcvtss2sd + CORINFO_HELP_DBL2ULNG helper call | vcvttss2usi
public static UInt64 FloatToULong(float val)
{
    return (UInt64)val;
}

Assembly before optimization

G_M22196_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
       C5F877               vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25
G_M22196_IG02:              ;; offset=0007H
       62F17E085AC0         vcvtss2sd xmm0, xmm0
       E87E57815E           call     CORINFO_HELP_DBL2ULNG
       90                   nop      
						;; size=12 bbWeight=1 PerfScore 5.25
G_M22196_IG03:              ;; offset=0013H
       4883C428             add      rsp, 40
       C3                   ret      
						;; size=5 bbWeight=1 PerfScore 1.25

Assembly after optimization

G_M22196_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M22196_IG02:              ;; offset=0003H
       62F1FE0878C0         vcvttss2usi rax, xmm0
						;; size=6 bbWeight=1 PerfScore 6.00
G_M22196_IG03:              ;; offset=0009H
       C3                   ret
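Widening a float to double is exact, so for in-range inputs the old vcvtss2sd + CORINFO_HELP_DBL2ULNG route and a single vcvttss2usi truncate the same value; the new code only removes the widening step and the call. A small C sketch of that equivalence, assuming inputs in [0, 2^64) (C leaves out-of-range float-to-integer conversions undefined) and using hypothetical function names:

```c
#include <stdint.h>

/* Old path: widen to double (vcvtss2sd), then truncate via the helper. */
static uint64_t float_to_ulong_via_double(float val)
{
    double widened = (double)val; /* exact: every float is representable as a double */
    return (uint64_t)widened;     /* truncates toward zero */
}

/* New path: truncate the float directly, as vcvttss2usi does. */
static uint64_t float_to_ulong_direct(float val)
{
    return (uint64_t)val;
}
```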

Case | Previous code | Optimized instruction
double -> ulong | CORINFO_HELP_DBL2ULNG helper call | vcvttsd2usi
public static UInt64 DoubleToULong(double val)
{
    return (UInt64)val;
}

Assembly before optimization

G_M30068_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
       C5F877               vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25
G_M30068_IG02:              ;; offset=0007H
       E874577F5E           call     CORINFO_HELP_DBL2ULNG
       90                   nop      
						;; size=6 bbWeight=1 PerfScore 1.25
G_M30068_IG03:              ;; offset=000DH
       4883C428             add      rsp, 40
       C3                   ret

Assembly after optimization

G_M30068_IG01:              ;; offset=0000H
       C5F877               vzeroupper 
						;; size=3 bbWeight=1 PerfScore 1.00
G_M30068_IG02:              ;; offset=0003H
       62F1FF0878C0         vcvttsd2usi rax, xmm0
						;; size=6 bbWeight=1 PerfScore 5.00
G_M30068_IG03:              ;; offset=0009H
       C3                   ret
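Without vcvttsd2usi, x64 only offers a signed truncating conversion (cvttsd2si), which is why a helper call appears in the "before" listing. The C sketch below shows one common way to build an unsigned conversion from a signed one, by splitting at 2^63. It is an illustration of the technique under stated assumptions, not the runtime's actual CORINFO_HELP_DBL2ULNG implementation; `dbl_to_ulong_sketch` is a hypothetical name, and inputs are assumed to lie in [0, 2^64).

```c
#include <stdint.h>

/* Hypothetical double -> ulong conversion built from a signed truncating
   conversion. Assumes 0 <= val < 2^64; other inputs are out of scope. */
static uint64_t dbl_to_ulong_sketch(double val)
{
    const double two63 = 9223372036854775808.0; /* 2^63 */
    if (val < two63)
        return (uint64_t)(int64_t)val;          /* fits in a signed long */
    /* Shift into signed range (the subtraction is exact here), convert,
       then restore the top bit. */
    return (uint64_t)(int64_t)(val - two63) + UINT64_C(0x8000000000000000);
}
```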

Author: khushal1996
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

khushal1996 changed the title from "Avx512 scalar convert rebased" to "Optimize scalar conversions with AVX512" Apr 5, 2023
BruceForstall added the avx512 label (related to the AVX-512 architecture) Apr 7, 2023
@khushal1996
Contributor Author

khushal1996 commented Apr 7, 2023

I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.

@dotnet-policy-service agree company="Intel"

@khushal1996
Contributor Author

We are narrowing down the scope of the PR to ulong -> float conversions. The changes will be updated soon.

@@ -18,8 +18,46 @@ FORCEINLINE int64_t FastDbl2Lng(double val)
#endif
Member


The changes in this file and in jithelpers.cpp should be rolled back too.

@khushal1996
Contributor Author

@tannergooding @jkotas Does this look good to merge? I have rolled back the changes for float/double -> ulong and optimized the ulong -> float/double cases.

Member

jkotas left a comment


It looks good to me. I have commented on a few nits.

Somebody on @dotnet/jit-contrib should do the final review and merge.

src/coreclr/jit/morph.cpp (outdated, resolved)
src/coreclr/jit/lowerxarch.cpp (outdated, resolved)
src/coreclr/jit/instr.cpp (resolved)
src/coreclr/jit/importer.cpp (outdated, resolved)
@khushal1996
Contributor Author

@jkotas I think we are good to go here. Could you please help with the approval and merge?

@jkotas
Member

jkotas commented Jul 5, 2023

@kunalspathak Could you please do the final review and merge?

@khushal1996
Contributor Author

@kunalspathak Can you help merge these changes? They have been approved, and we are trying to get them in before the next release.

src/coreclr/jit/morph.cpp (outdated, resolved)
src/coreclr/jit/morph.cpp (outdated, resolved)
@tannergooding
Member

The SPMI failures are the general "No Azure Storage MCH files to download from cef79bc8-29bf-4f7b-9d05-9fc06832098c/osx/arm64/" issue that is impacting other PRs.

CI was passing before the requested minor formatting cleanup.

tannergooding merged commit b99a279 into dotnet:main Jul 16, 2023
ghost locked as resolved and limited conversation to collaborators Aug 15, 2023

8 participants