JIT: Faster comparison against Vector128<>.Zero #63829

Closed
EgorBo opened this issue Jan 15, 2022 · 4 comments · Fixed by #65632
Labels: area-CodeGen-coreclr, good first issue, help wanted [up-for-grabs], tenet-performance

EgorBo (Member) commented Jan 15, 2022

A good first issue for those who are interested in SIMD and ARM64. Should improve #63285

bool IsZero(Vector128<byte> v) => v == Vector128<byte>.Zero;

The current codegen is suboptimal on ARM64:
[image: current ARM64 codegen]

while I'd expect it to emit something like this:
[image: expected ARM64 codegen]

So if one of the args of == is Zero, it needs to use umaxv. It should be done here, where MinAcross is currently emitted:

comp->gtNewSimdHWIntrinsicNode(simdType, cmp, NI_AdvSimd_Arm64_MinAcross, CORINFO_TYPE_UBYTE, simdSize);
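The idea behind the umaxv suggestion can be sketched in portable C++. This is only a scalar model of the reduction, not the JIT change itself: the names `Vec16`, `max_across`, and `is_zero_via_max` are hypothetical and exist only for this illustration. It relies on the fact that a vector of unsigned bytes equals Zero exactly when its largest lane is 0, so a single across-lanes maximum is enough and no per-lane compare is needed first.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of 16 unsigned bytes.
using Vec16 = std::array<std::uint8_t, 16>;

// Scalar model of umaxv: the unsigned maximum across all 16 byte lanes.
inline std::uint8_t max_across(const Vec16& v) {
    return *std::max_element(v.begin(), v.end());
}

// v == Zero holds exactly when the largest unsigned lane is 0.
inline bool is_zero_via_max(const Vec16& v) {
    return max_across(v) == 0;
}
```

For example, `is_zero_via_max(Vec16{})` is true, and setting any single lane to a non-zero value makes it false.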

EgorBo added the tenet-performance label Jan 15, 2022
dotnet-issue-labeler bot added the area-CodeGen-coreclr and untriaged labels Jan 15, 2022
ghost commented Jan 15, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

EgorBo added this to the Future milestone Jan 15, 2022
EgorBo added the good first issue and help wanted [up-for-grabs] labels and removed the untriaged label Jan 15, 2022
robertDurst commented Jan 18, 2022

👀 looking, interested

robertDurst commented

@EgorBo what tool is that?

echesakov (Contributor) commented

The code is slightly better after #62933:

; Assembly listing for method Program:<<Main>$>g__IsZero|0_0(System.Runtime.Intrinsics.Vector128`1[Byte]):bool
; Emitting BLENDED_CODE for generic ARM64 CPU - MacOS
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16)  single-def
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M25750_IG01:              ;; offset=0000H
        00000000          stp     fp, lr, [sp,#-16]!
        00000000          mov     fp, sp
						;; bbWeight=1    PerfScore 1.50
G_M25750_IG02:              ;; offset=0008H
        00000000          cmeq    v16.16b, v0.16b, #0
        00000000          uminv   b16, v16.16b
        00000000          umov    w0, v16.b[0]
        00000000          cmp     w0, #0
        00000000          cset    x0, ne
						;; bbWeight=1    PerfScore 6.00
G_M25750_IG03:              ;; offset=001CH
        00000000          ldp     fp, lr, [sp],#16
        00000000          ret     lr
						;; bbWeight=1    PerfScore 2.00

; Total bytes of code 36, prolog size 8, PerfScore 13.10, instruction count 9, allocated bytes for code 36 (MethodHash=80c69b69) for method Program:<<Main>$>g__IsZero|0_0(System.Runtime.Intrinsics.Vector128`1[Byte]):bool
; ============================================================

cc @TIHan
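For readers following the listing above, here is a scalar C++ model of what the cmeq / uminv / umov / cmp / cset sequence computes. It is a sketch under assumptions: `Vec16`, `cmeq_zero`, `min_across`, and `is_zero_via_cmeq_min` are hypothetical names made up for this illustration, and the code models lane semantics only, not the actual instructions or the JIT.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector of 16 unsigned bytes.
using Vec16 = std::array<std::uint8_t, 16>;

// Models `cmeq v16.16b, v0.16b, #0`: a lane becomes 0xFF if it is zero, else 0x00.
inline Vec16 cmeq_zero(const Vec16& v) {
    Vec16 mask{};
    std::transform(v.begin(), v.end(), mask.begin(),
                   [](std::uint8_t lane) -> std::uint8_t {
                       return lane == 0 ? 0xFF : 0x00;
                   });
    return mask;
}

// Models `uminv b16, v16.16b`: the unsigned minimum across all lanes.
inline std::uint8_t min_across(const Vec16& v) {
    return *std::min_element(v.begin(), v.end());
}

// Models umov + cmp + cset ne: the result is true when the minimum of the
// mask is non-zero, which happens only if every lane compared equal to zero.
inline bool is_zero_via_cmeq_min(const Vec16& v) {
    return min_across(cmeq_zero(v)) != 0;
}
```

This form needs a per-lane compare before the reduction, which is the step the issue proposes to avoid when one operand is Zero.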

ghost added the in-pr label Feb 20, 2022
ghost removed the in-pr label Feb 23, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
@JulieLeeMSFT JulieLeeMSFT modified the milestones: Future, 7.0.0 Apr 4, 2022