Implement BinaryPrimitives.ReverseEndianness for arm64 using rev #34617

EgorBo · 2020-04-07T01:12:56Z

@Gnbrkm41 noticed it's not currently optimized to rev.

static uint Test1(uint a) => BinaryPrimitives.ReverseEndianness(a);
static ulong Test2(ulong a) => BinaryPrimitives.ReverseEndianness(a);

Old codegen:

; Assembly listing for method Test1(int):int
G_M29289_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M29289_IG02:
        12009C01          and     w1, w0, #0xff00ff
        13812021          ror     w1, w1, #8
        12089C00          and     w0, w0, #0xff00ff00
        13806000          ror     w0, w0, #24
        0B000020          add     w0, w1, w0
G_M29289_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr



; Assembly listing for method Test2(long):long
G_M43753_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M43753_IG02:
        2A0003E1          mov     w1, w0
        12009C22          and     w2, w1, #0xff00ff
        13822042          ror     w2, w2, #8
        12089C21          and     w1, w1, #0xff00ff00
        13816021          ror     w1, w1, #24
        0B010041          add     w1, w2, w1
        2A0103E1          mov     w1, w1
        D3607C21          lsl     x1, x1, #32
        D360FC00          lsr     x0, x0, #32
        12009C02          and     w2, w0, #0xff00ff
        13822042          ror     w2, w2, #8
        12089C00          and     w0, w0, #0xff00ff00
        13806000          ror     w0, w0, #24
        0B000040          add     w0, w2, w0
        2A0003E0          mov     w0, w0
        8B010000          add     x0, x0, x1
G_M43753_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

New codegen:

; Assembly listing for method Test1(int):int
G_M29289_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M29289_IG02:
        5AC00800          rev     w0, w0
G_M29289_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr



; Assembly listing for method Test2(long):long
G_M43753_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M43753_IG02:
        DAC00C00          rev     x0, x0
G_M43753_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr

int16 is not implemented yet, perhaps should be implemented in C# instead? To call the int-overload and apply a shift.

Tests already exist: https://github.com/dotnet/runtime/blob/master/src/coreclr/tests/src/JIT/Intrinsics/BinaryPrimitivesReverseEndianness.cs

tannergooding · 2020-04-07T05:51:12Z

int16 is not implemented yet, perhaps should be implemented in C# instead? To call the int-overload and apply a shift.

Would REV + shift or just a REV16 be better?

@TamarChristinaArm, could you advise? Is there also a good location to find this information or cycle approximations in general?

EgorBo · 2020-04-07T07:35:51Z

int16 is not implemented yet, perhaps should be implemented in C# instead? To call the int-overload and apply a shift.

Would REV + shift or just a REV16 be better?

@TamarChristinaArm, could you advise? Is there also a good location to find this information or cycle approximations in general?

Was confused by the clang's codegen: https://godbolt.org/z/w_8LUC

EgorBo · 2020-04-07T08:13:02Z

New codegen for ushort overload:

static ushort Foo1(ushort a) => BinaryPrimitives.ReverseEndianness(a);

; Assembly listing for method Program:Foo1(ushort):ushort
; Emitting BLENDED_CODE for generic ARM64 CPU - Unix
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )  ushort  ->   x0        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M15320_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M15320_IG02:
        53003C00          uxth    w0, w0
        5AC00400          rev16   w0, w0
        53003C00          uxth    w0, w0
                                                ;; bbWeight=1    PerfScore 1.50
G_M15320_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

; Total bytes of code 28, prolog size 8, PerfScore 7.80, (MethodHash=a63ec427) for method Program:Foo1(ushort):ushort
; ============================================================

EgorBo · 2020-04-07T08:43:04Z

Here is what Mono-LLVM-JIT (LLVM 6.0) emits on my Cortex A57 machine:

static ushort Foo1(ushort a) => BinaryPrimitives.ReverseEndianness(a);

define monocc i16 @"Program:Foo1 (uint16)"(i16 %arg_a) #0 {
BB0:
  %rev = call i16 @llvm.bswap.i16(i16 %arg_a)
  ret i16 %rev
}

   0:	5ac00808 	rev	w8, w0
   4:	53107d00 	lsr	w0, w8, #16
   8:	d65f03c0 	ret

TamarChristinaArm · 2020-04-07T12:30:22Z

int16 is not implemented yet, perhaps should be implemented in C# instead? To call the int-overload and apply a shift.

Would REV + shift or just a REV16 be better?

The REV16 is better, REV and REV16 have the same latency so adding a shift makes that sequence slower.

@TamarChristinaArm, could you advise? Is there also a good location to find this information or cycle approximations in general?

You can find these in the optimization guides, the one for Cortex-A55 is https://developer.arm.com/docs/epm128372/30/arm-cortex-a55-software-optimization-guide

EgorBo · 2020-04-13T09:05:49Z

I guess it's ready for review then

tannergooding · 2020-04-13T16:06:24Z

CC. @CarolEidt, @echesakovMSFT, @dotnet/jit-contrib

tannergooding

LGTM

echesakov

LGTM

src/coreclr/src/jit/codegenarmarch.cpp

EgorBo · 2020-04-18T12:34:48Z

Don't know how to restart the "in progress" leg 😕

sandreenko · 2020-04-19T01:41:17Z

AzDO shows that it has passed: https://dev.azure.com/dnceng/public/_build/results?buildId=605791&view=results

EgorBo · 2020-04-24T17:19:27Z

@BruceForstall can this be merged?

echesakov · 2020-04-24T17:23:15Z

It looks that we never gonna see runtime (Installer Build and Test Windows_NT_arm Debug) leg completed :)

Implement GT_BSWAP for arm64

bb58e54

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 7, 2020

Add GT_BSWAP16

d48bd3c

tannergooding approved these changes Apr 13, 2020

View reviewed changes

echesakov approved these changes Apr 13, 2020

View reviewed changes

BruceForstall reviewed Apr 13, 2020

View reviewed changes

src/coreclr/src/jit/codegenarmarch.cpp Outdated Show resolved Hide resolved

EgorBo added 3 commits April 13, 2020 22:35

use TARGET_ARM64 in codegenarmarch.cpp

75fa0ed

Merge branch 'master' of github.com:dotnet/runtime into arm-bitreverse

e9b633d

Merge branch 'master' of github.com:dotnet/runtime into arm-bitreverse

4d951be

echesakov approved these changes Apr 24, 2020

View reviewed changes

BruceForstall approved these changes Apr 24, 2020

View reviewed changes

BruceForstall merged commit 89d08de into dotnet:master Apr 24, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement BinaryPrimitives.ReverseEndianness for arm64 using rev #34617

Implement BinaryPrimitives.ReverseEndianness for arm64 using rev #34617

EgorBo commented Apr 7, 2020 •

edited

Loading

tannergooding commented Apr 7, 2020

EgorBo commented Apr 7, 2020

EgorBo commented Apr 7, 2020

EgorBo commented Apr 7, 2020

TamarChristinaArm commented Apr 7, 2020

EgorBo commented Apr 13, 2020

tannergooding commented Apr 13, 2020

tannergooding left a comment

echesakov left a comment

EgorBo commented Apr 18, 2020

sandreenko commented Apr 19, 2020

EgorBo commented Apr 24, 2020

echesakov commented Apr 24, 2020

Implement BinaryPrimitives.ReverseEndianness for arm64 using rev #34617

Implement BinaryPrimitives.ReverseEndianness for arm64 using rev #34617

Conversation

EgorBo commented Apr 7, 2020 • edited Loading

tannergooding commented Apr 7, 2020

EgorBo commented Apr 7, 2020

EgorBo commented Apr 7, 2020

EgorBo commented Apr 7, 2020

TamarChristinaArm commented Apr 7, 2020

EgorBo commented Apr 13, 2020

tannergooding commented Apr 13, 2020

tannergooding left a comment

Choose a reason for hiding this comment

echesakov left a comment

Choose a reason for hiding this comment

EgorBo commented Apr 18, 2020

sandreenko commented Apr 19, 2020

EgorBo commented Apr 24, 2020

echesakov commented Apr 24, 2020

EgorBo commented Apr 7, 2020 •

edited

Loading