Review the multi-op instruction usage for Arm64 #68028

tannergooding · 2022-04-14T16:57:59Z

dotnet-issue-labeler · 2022-04-14T16:58:03Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2022-04-14T16:58:42Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

After seeing the msub PR (#66621), which folds mul, sub into a single msub, I went and looked in the Arm64 manual for other interesting "combined operation" instructions.

There is a set of (shifted register) instructions which can combine a shift, op (the variants ending with s also set flags):

add, adds - Add
sub, subs - Subtract
cmp - Compare
cmn - Compare Negative
neg, negs - Negate
and, ands - Bitwise AND
bic, bics - Bitwise bit clear
eon - Bitwise exclusive OR NOT
eor - Bitwise exclusive OR
orr - Bitwise inclusive OR
mvn - Bitwise NOT
orn - Bitwise inclusive OR NOT
tst - Test Bits
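
For illustration, the source shapes that could map onto the shifted-register forms look roughly like the following. This is a minimal sketch in C, used only as a compact stand-in for the C# the JIT sees. The function names are hypothetical, and the instruction named in each comment is one plausible encoding, not guaranteed output from any compiler.

```c
#include <stdint.h>

/* Candidate for: add x0, x0, x1, lsl #3 */
uint64_t add_shifted(uint64_t a, uint64_t b) { return a + (b << 3); }

/* Candidate for: sub x0, x0, x1, lsr #2 */
uint64_t sub_shifted(uint64_t a, uint64_t b) { return a - (b >> 2); }

/* Candidate for: bic x0, x0, x1, lsl #4 (AND with the inverted, shifted operand) */
uint64_t and_not_shifted(uint64_t a, uint64_t b) { return a & ~(b << 4); }

/* Candidate for: eon x0, x0, x1 (XOR with the inverted operand) */
uint64_t xor_not(uint64_t a, uint64_t b) { return a ^ ~b; }

/* Candidate for: tst x0, x1, lsl #1 followed by a flag-consuming instruction */
int test_shifted(uint64_t a, uint64_t b) { return (a & (b << 1)) == 0; }
```

In each case the shift (or NOT) and the main operation are separate nodes in the source expression, which is what makes them candidates for folding into one instruction.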

There is a set of (extended register) instructions which can combine a zero-extend, op or sign-extend, op:

add, adds - Add
sub, subs - Subtract
cmp - Compare
cmn - Compare Negative
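
As a rough sketch of the extended-register case, the shapes below widen a narrower operand and then use it in an add, subtract, or compare. The code is C with hypothetical function names, and the instructions in the comments are assumptions about a plausible encoding, not a statement of what the JIT emits today.

```c
#include <stdint.h>

/* Candidate for: add x0, x0, w1, sxtw (sign-extend 32 -> 64, then add) */
int64_t add_sxtw(int64_t a, int32_t b) { return a + (int64_t)b; }

/* Candidate for: add x0, x0, w1, uxtb (zero-extend 8 -> 64, then add) */
uint64_t add_uxtb(uint64_t a, uint8_t b) { return a + (uint64_t)b; }

/* Candidate for: sub x0, x0, w1, sxth (sign-extend 16 -> 64, then subtract) */
int64_t sub_sxth(int64_t a, int16_t b) { return a - (int64_t)b; }

/* Candidate for: cmp x0, w1, sxtw followed by a flag-consuming instruction */
int less_than_sxtw(int64_t a, int32_t b) { return a < (int64_t)b; }
```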

There is a set of (carry) instructions which can utilize the carry from a previous operation:

adc, adcs - Add with carry
sbc, sbcs - Subtract with carry
ngc, ngcs - Negate with carry

There are the multiply integer instructions which can combine an op, mul:

madd - Multiply-add
msub - Multiply-subtract
mneg - Multiply-negate
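
The multiply forms correspond to source shapes like the ones below. This is a small C sketch with hypothetical function names; unsigned types are used so that wraparound is well defined, and the instruction in each comment is the expected candidate rather than verified output.

```c
#include <stdint.h>

/* Candidate for: madd x0, x0, x1, x2  (c + a * b) */
uint64_t mul_add(uint64_t a, uint64_t b, uint64_t c) { return c + a * b; }

/* Candidate for: msub x0, x0, x1, x2  (c - a * b) */
uint64_t mul_sub(uint64_t a, uint64_t b, uint64_t c) { return c - a * b; }

/* Candidate for: mneg x0, x0, x1  (-(a * b), computed modulo 2^64) */
uint64_t mul_neg(uint64_t a, uint64_t b) { return 0 - a * b; }
```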

There are then some long multiply instructions which can return a product twice the size of the inputs (i32 * i32 = i64 or similar):

smull, umull - Multiply long
smaddl, umaddl - Multiply-add long
smsubl, umsubl - Multiply-subtract long
smnegl, umnegl - Multiply-negate long
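
A sketch of the widening shapes, again in C with hypothetical names: both 32-bit inputs are widened to 64 bits before the multiply, so the full product is kept. The comments name the instruction each shape is a candidate for; actual codegen is not guaranteed.

```c
#include <stdint.h>

/* Candidate for: smull x0, w0, w1  (signed i32 * i32 -> i64) */
int64_t mul_long_signed(int32_t a, int32_t b) { return (int64_t)a * (int64_t)b; }

/* Candidate for: umull x0, w0, w1  (unsigned u32 * u32 -> u64) */
uint64_t mul_long_unsigned(uint32_t a, uint32_t b) { return (uint64_t)a * (uint64_t)b; }

/* Candidate for: smaddl x0, w0, w1, x2  (c + widened product) */
int64_t mul_add_long(int32_t a, int32_t b, int64_t c) { return c + (int64_t)a * (int64_t)b; }

/* Candidate for: umsubl x0, w0, w1, x2  (c - widened product) */
uint64_t mul_sub_long(uint32_t a, uint32_t b, uint64_t c) { return c - (uint64_t)a * (uint64_t)b; }
```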

Finally, there are the "multiply high" instructions, which return just the upper bits of a wide multiply:

smulh, umulh - Multiply High
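
To make "upper bits of a wide multiply" concrete, here is a portable C sketch that computes the high 64 bits of an unsigned 64 x 64 product from 32-bit pieces. The function name is hypothetical, and this is only a reference for the value umulh produces; a pattern this large is unlikely to be recognized by a compiler as a single instruction.

```c
#include <stdint.h>

/* Returns the upper 64 bits of the 128-bit product a * b (what umulh computes). */
uint64_t mul_high_unsigned(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum the middle column; the three terms cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```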

There may be other interesting instructions as well, but these are ones that may have broader usage/application and which would likely be good to validate that we are covering.

Author:	tannergooding
Assignees:	-
Labels:	`arch-arm64`, `area-CodeGen-coreclr`, `untriaged`
Milestone:	-

TIHan · 2022-04-14T17:18:10Z

These look like really good optimization opportunities.

JulieLeeMSFT · 2022-04-18T23:41:51Z

Thanks @tannergooding for compiling all these cases. Let me put this in the future item.

a74nh · 2022-11-18T10:29:14Z

@tannergooding and @kunalspathak : is there anyone looking at the remaining instructions? If not, then this could be a good set of items for @SwapnilGaikwad to work through.

While doing this, it's probably worth adding some DisasmCheck tests for them too.

kunalspathak · 2022-11-18T18:05:59Z

@TIHan - are you planning to work on any of these?

kunalspathak · 2022-12-01T17:26:49Z

> this could be a good set of items for @SwapnilGaikwad to work through.

Sounds good to me.

a74nh · 2022-12-19T16:47:33Z

MNEG support: #79550

kunalspathak · 2023-01-23T17:55:48Z

@a74nh - Are there more items that you or @SwapnilGaikwad will be working on?

SwapnilGaikwad · 2023-01-24T17:40:56Z

Hi @kunalspathak, we plan to look at the extended register instruction combinations next (add, adds, sub, subs, cmp and cmn).

kunalspathak · 2023-01-24T18:03:41Z

> Hi @kunalspathak, we plan to look at the extended register instruction combinations next (add, adds, sub, subs, cmp and cmn).

Sounds good.

kunalspathak · 2023-04-24T15:03:02Z

@TIHan - Could you please update the issue description with the instructions you are planning to work on? It will help @SwapnilGaikwad pick some other instructions to optimize.

snickolls-arm · 2025-01-28T14:00:20Z

We should be able to tick off ADD/ADDS/SUB/SUBS (extended register) as they were included in #76273. The JIT is generating these correctly for additions with byte and short.

Looking at the (carry) set, it's not obvious which patterns they might be useful for, as it's rare to need a carry on 64-bit values. They would be useful for Int128 acceleration, but this may be better as its own issue, as it's a wider scope than just the JIT.

runtime/src/libraries/System.Private.CoreLib/src/System/Int128.cs

Line 692 in 1d1bf92

public static Int128 operator +(Int128 left, Int128 right)

a74nh · 2025-01-31T09:10:51Z

> Looking at the (carry) set, it's not obvious which patterns they might be useful for, as it's rare to need a carry on 64-bit values. They would be useful for Int128 acceleration, but this may be better as its own issue, as it's a wider scope than just the JIT.

Agreed. This issue was intended for general codegen opportunities from the IR. Intrinsifying Int128.cs feels like it falls outside the scope.

For this issue, if there are no opportunities to lower general IR to the carry instructions, then we should consider those complete. (We could lower the Int128 + method to an add with carry, but it would require matching 6+ nodes and then a heavy graph rewrite, which isn't really practical.)

tannergooding · 2025-01-31T15:18:54Z

There are several general patterns that can be applicable to generate carry/borrow instructions and while Int128 is a primary case, it also applies to cases like BigInteger or various user defined types.

One common general pattern for an add; add with carry is:

T lower = left._lower + right._lower;
T carry = (lower < left._lower) ? 1 : 0;
T upper = left._upper + right._upper + carry;

and for sub; sub with borrow:

T lower = left._lower - right._lower;
T borrow = (lower > left._lower) ? 1 : 0;
T upper = left._upper - right._upper - borrow;

Note that this uses an unsigned comparison to compute the carry/borrow, but it is the correct algorithm for both signed and unsigned integers. The same general pattern can also apply to overflow checking, and so would be applicable for add; jump if unsigned overflow or similar.
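
The two patterns above can be written out as a self-contained sketch. This is C rather than C#, and the u128 struct, make_u128, add128, and sub128 are hypothetical names for illustration only. The expectation, stated as an assumption, is that an ideal lowering would turn each function into adds + adc or subs + sbc.

```c
#include <stdint.h>

/* Hypothetical 128-bit value built from two 64-bit halves. */
typedef struct { uint64_t lower; uint64_t upper; } u128;

u128 make_u128(uint64_t lower, uint64_t upper) {
    u128 v = { lower, upper };
    return v;
}

/* add; add with carry: the lower sum wrapped if it is smaller than an input. */
u128 add128(u128 left, u128 right) {
    uint64_t lower = left.lower + right.lower;
    uint64_t carry = (lower < left.lower) ? 1 : 0;
    uint64_t upper = left.upper + right.upper + carry;
    return make_u128(lower, upper);
}

/* sub; sub with borrow: the lower difference wrapped if it is larger than the minuend. */
u128 sub128(u128 left, u128 right) {
    uint64_t lower = left.lower - right.lower;
    uint64_t borrow = (lower > left.lower) ? 1 : 0;
    uint64_t upper = left.upper - right.upper - borrow;
    return make_u128(lower, upper);
}
```

For example, adding 1 to a value whose lower half is all ones wraps the lower half to 0 and carries 1 into the upper half, and subtracting 1 from a value with lower half 0 borrows from the upper half.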

The pattern for detecting signed overflow for addition (rather than simply computing if a carry is needed; that is doing add; jump if signed overflow) is a bit more complex, but roughly boils down to:

T result = left + right;

if ((left._upper ^ right._upper) >= 0) // signs of inputs match
{
    if ((result._upper ^ left._upper) < 0) // signs of inputs don't match the output
    {
        // overflow!
    }
}

This can be simplified down to a single compare as: ((result._upper ^ left._upper) & ~(left._upper ^ right._upper)) < 0. This works because XORing the inputs produces a sign bit of 0 for matching signs and 1 for mismatching signs. We invert that, then bitwise AND it with the XOR of the result and left, whose sign bit is 1 only when the result sign differs from the input sign. This gives us "false" for anything with mismatched input signs, or with matching signs for the inputs and the result.

Signed overflow for subtraction is similar, but checks (leftSign != rightSign) && (leftSign != resultSign).
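
A minimal sketch of both signed overflow checks in branch-free form, applied to a single 64-bit value rather than the _upper half of a wider type. It is written in C with hypothetical function names. The wrapping arithmetic goes through unsigned types to avoid undefined behavior, and the conversion back to int64_t is assumed to wrap, as it does on mainstream compilers.

```c
#include <stdint.h>

/* Overflow on add: inputs share a sign and the result's sign differs from it. */
int add_overflows(int64_t left, int64_t right) {
    int64_t result = (int64_t)((uint64_t)left + (uint64_t)right);
    return ((result ^ left) & ~(left ^ right)) < 0;
}

/* Overflow on sub: inputs differ in sign and the result's sign differs from left. */
int sub_overflows(int64_t left, int64_t right) {
    int64_t result = (int64_t)((uint64_t)left - (uint64_t)right);
    return ((left ^ right) & (left ^ result)) < 0;
}
```

On Arm64 the same condition is what the V flag reports after adds or subs, so a recognized pattern could in principle become a flag-setting op followed by a conditional branch or select.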

a74nh · 2025-01-31T15:42:01Z

> There are several general patterns that can be applicable to generate carry/borrow instructions

Agreed with the patterns shown. However, it's still a few nodes that need matching. Do you think it's suitable to do in the lowering phase?

tannergooding · 2025-01-31T16:32:31Z

Starting in lowering is probably a fine approach, but we should consider whether such patterns are likely to be broken by phases like morph or CSE, given their shape and the higher cost the more complex IR appears to have. It might turn out to be better to recognize and transform the pattern earlier instead, for example by using a dedicated node to help ensure it all flows smoothly.

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 14, 2022

tannergooding added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 14, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 18, 2022

JulieLeeMSFT added this to the Future milestone Apr 18, 2022

JulieLeeMSFT mentioned this issue Apr 18, 2022

Improving ARM64 Performance in .NET 7.0 #64820

Closed

32 tasks

tannergooding mentioned this issue Sep 19, 2022

Support a few "shifted register" operations on Arm64 #75823

Merged

tannergooding mentioned this issue Sep 27, 2022

Remove GT_ADDEX and replace with more generalized containment handling #76273

Merged

xoofx mentioned this issue Nov 12, 2022

ARM64: Missing combine eor+lsr and duplicate constant reloads #78263

Closed

SwapnilGaikwad mentioned this issue Dec 12, 2022

Emit mneg for mul+neg on Arm64 #79550

Merged

kunalspathak mentioned this issue Jan 23, 2023

Improving Arm64 Performance in .NET 8.0 #77010

Closed

28 tasks

JulieLeeMSFT mentioned this issue Feb 8, 2023

What's new in .NET 8 Preview 1 dotnet/core#8133

Closed

3 tasks

kunalspathak assigned TIHan Apr 6, 2023

JulieLeeMSFT modified the milestones: Future, 8.0.0 Apr 6, 2023

TIHan mentioned this issue Apr 18, 2023

[JIT] ARM64 - Combine 'neg' and 'cmp' to 'cmn' #84667

Merged

TIHan modified the milestones: 8.0.0, 9.0.0 Jul 17, 2023

kunalspathak mentioned this issue Nov 7, 2023

Improving Arm64 Performance in .NET 9.0 #94464

Closed

13 tasks

TIHan added the Priority:3 Work that is nice to have label May 6, 2024

TIHan modified the milestones: 9.0.0, Future Jul 25, 2024

kunalspathak assigned a74nh and unassigned TIHan Nov 7, 2024

kunalspathak mentioned this issue Nov 8, 2024

Improve Arm64 Performance in .NET 10 #109652

Open

17 tasks

kunalspathak mentioned this issue Jan 15, 2025

arm64: Add bic(s) compact encoding #111452

Merged

snickolls-arm mentioned this issue Jan 24, 2025

Improve compare-and-branch sequences produced by Emitter #111797

Merged

jonathandavies-arm added a commit to jonathandavies-arm/runtime that referenced this issue Jan 27, 2025

arm64: Add support for Bitwise XOR NOT

ea3bb79

* Contributes towards dotnet#68028 Change-Id: Ifad031a92270668bb3970fb1af817fda05851116

kunalspathak mentioned this issue Jan 27, 2025

arm64: Add tests for add(s), and(s), sub(s), cmp, cmn, eor, neg & orr #111796

Merged

This was referenced Jan 28, 2025

Arm64: cmn & neg shifted register instructions are not generated for LSR #111888

Open

Arm64: ands & cmp shifted register instructions are not generated when a shift value greater than 63 is used #111889

Open

jonathandavies-arm added a commit to jonathandavies-arm/runtime that referenced this issue Jan 28, 2025

arm64: Add support for Bitwise XOR NOT

3b4225d

* contributes towards dotnet#68028

jonathandavies-arm mentioned this issue Jan 28, 2025

arm64: Add support for Bitwise OR NOT & XOR NOT #111893

Merged

jonathandavies-arm added a commit to jonathandavies-arm/runtime that referenced this issue Jan 28, 2025

arm64: Add support for bitwise NOT

704ee00

* Contributes to dotnet#68028

jonathandavies-arm mentioned this issue Jan 28, 2025

arm64: Add support for bitwise NOT #111904

Merged

kunalspathak pushed a commit that referenced this issue Jan 29, 2025

arm64: Add support for bitwise NOT (#111904)

3e32e51

* Contributes to #68028

jonathandavies-arm mentioned this issue Feb 6, 2025

arm64: Change EQ/NE node to SETCC if the operand supports the zero flag #112235

Merged

snickolls-arm added a commit to snickolls-arm/runtime that referenced this issue Feb 11, 2025

Add tests for add/sub (extended-register) combinations on ARM64

fe9d8a9

Contributes to dotnet#68028

snickolls-arm mentioned this issue Feb 11, 2025

Add tests for add/sub (extended-register) combinations on ARM64 #112408

Merged

snickolls-arm added a commit to snickolls-arm/runtime that referenced this issue Feb 11, 2025

Emit cmp (extended register) on ARM64 to simplify cast-then-compare…

2a98b71

… expressions Working towards dotnet#68028

snickolls-arm mentioned this issue Feb 11, 2025

Emit cmp (extended register) on ARM64 to simplify cast-then-compare expressions #112411

Open

kunalspathak pushed a commit that referenced this issue Feb 13, 2025

Add tests for add/sub (extended-register) combinations on ARM64 (#112408

d4f570d

) Contributes to #68028
