Review the multi-op instruction usage for Arm64 #68028
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Tagging subscribers to this area: @JulieLeeMSFT
|
These look like really good optimization opportunities. |
Thanks @tannergooding for compiling all these cases. Let me put this in the future item. |
@tannergooding and @kunalspathak : is there anyone looking at the remaining instructions? If not, then this could be a good set of items for @SwapnilGaikwad to work through. While doing this, it's probably worth adding some DisasmCheck tests for them too. |
@TIHan - are you planning to work on any of these? |
Sounds good to me. |
MNEG support: #79550 |
@a74nh - Are there more items that you or @SwapnilGaikwad will be working on? |
Hi @kunalspathak, we plan to look at the extended register instruction combinations next (add, adds, sub, subs, cmp and cmn). |
Sounds good. |
@TIHan - Could you please update the issue description with the instructions you are planning to work on? It will help @SwapnilGaikwad pick other instructions to optimize. |
We should be able to tick off Looking at the
|
Agreed. This issue was intended for general codegen opportunities from the IR. Intrinsifying Int128.cs feels like it falls outside the scope. For this issue, if there are no opportunities to lower general IR to the carry instructions, then we should consider those complete. (We could lower the Int128 + method to an add carry, but it would require matching 6+ nodes, then a heavy graph rewrite, which isn't really practical.) |
There are several general patterns that can be applicable to generate carry/borrow instructions and while One common general pattern for an
and for
Noting that this assumes an unsigned comparison in computing the carry/borrow but is the correct algorithm for both signed and unsigned integers. The same general pattern can also apply to overflow checking and so would be applicable for The pattern for detecting signed overflow for addition (rather than simply computing if a carry is needed; that is doing
This can be simplified down to a single compare as: Signed overflow for subtraction is similar, but doing |
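The carry, borrow, and signed-overflow shapes discussed above can be sketched roughly as follows. This is an illustration in C rather than C#, and it is not necessarily the code from the original comment, whose snippets are missing from this copy of the thread. The `u128` struct and all function names are hypothetical. On Arm64, a compiler that recognizes these shapes may be able to use `adds`/`adc` and `subs`/`sbc` for the two halves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves (illustration only). */
typedef struct {
    uint64_t lower;
    uint64_t upper;
} u128;

/* Add: if the low half wrapped around, the unsigned compare detects the carry. */
static u128 add_u128(u128 a, u128 b) {
    u128 r;
    r.lower = a.lower + b.lower;
    uint64_t carry = (r.lower < a.lower) ? 1u : 0u;
    r.upper = a.upper + b.upper + carry;
    return r;
}

/* Subtract: a borrow is needed when a's low half is smaller than b's. */
static u128 sub_u128(u128 a, u128 b) {
    u128 r;
    r.lower = a.lower - b.lower;
    uint64_t borrow = (a.lower < b.lower) ? 1u : 0u;
    r.upper = a.upper - b.upper - borrow;
    return r;
}

/* Signed-add overflow (one well-known formulation, assumed here):
 * both operands have a sign bit that differs from the result's sign bit. */
static bool add_overflows_i64(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t sum = ua + ub; /* unsigned add wraps, so no undefined behaviour */
    return (((ua ^ sum) & (ub ^ sum)) >> 63) != 0;
}
```

Each helper is only a handful of operations in source form, but in a compiler IR the compare, select, and two adds are separate nodes, which is the matching cost raised earlier in the thread.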
Agreed with the patterns shown. However, it's still a few nodes that need matching. Do you think it's suitable to do in the lowering phase? |
Starting in lowering is probably a fine approach, but we should consider whether such patterns are likely to be broken by things like morph or CSE, given their shape and the higher cost that the more complex IR appears to carry. It might turn out to be better to recognize and transform them earlier instead, such as by using a dedicated node to help ensure it all flows smoothly. |
* arm64: Add support for Bitwise XOR NOT
* contributes towards #68028
* arm64: Add support for Bitwise OR NOT
* Update comments & add tests that take ints
* Add swap tests that take an int & uint
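For reference, the expression shapes behind the XOR NOT and OR NOT work can be sketched as below. This is illustrative C with hypothetical function names, not code from the linked change. `eon` computes `Rn ^ ~Rm` and `orn` computes `Rn | ~Rm`, so a compiler may fold the separate NOT into the bitwise operation. The "swapped" variants put the NOT on the left operand.

```c
#include <stdint.h>

/* a ^ ~b : candidate for a single eon (exclusive OR NOT). */
static uint32_t xor_not(uint32_t a, uint32_t b) { return a ^ ~b; }

/* ~a ^ b : swapped form; same value as a ^ ~b because XOR commutes with NOT. */
static uint32_t xor_not_swapped(uint32_t a, uint32_t b) { return ~a ^ b; }

/* a | ~b : candidate for a single orn (inclusive OR NOT). */
static uint32_t or_not(uint32_t a, uint32_t b) { return a | ~b; }

/* ~a | b : swapped form; still orn, with the register operands exchanged. */
static uint32_t or_not_swapped(uint32_t a, uint32_t b) { return ~a | b; }
```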
… expressions Working towards dotnet#68028
After seeing the `msub` PR (#66621), which folds `mul, sub` into a single `msub`, I went and looked in the Arm64 manual for other interesting "combined operation" instructions.

There is a set of `(shifted register)` instructions which can combine a `shift, op` (the variants ending with `s` also set flags):
* `add`, `adds` - Add
* `sub`, `subs` - Subtract
* `cmp` - Compare #84605
* `cmn` - Compare Negative #84667
* `neg`, `negs` - Negate #84667
* `and`, `ands` - Bitwise AND
* `bic`, `bics` - Bitwise bit clear
* `eon` - Bitwise exclusive OR NOT
* `eor` - Bitwise exclusive OR
* `orr` - Bitwise inclusive OR
* `mvn` - Bitwise NOT
* `orn` - Bitwise inclusive OR NOT
* `tst` - Test Bits
* `cbz` instead of `cmp #0` ([arm64] optimize test for bound checks against a 0 index #42514)

There is a set of `(extended register)` instructions which can combine a `zero-extend, op` or `sign-extend, op`:
* `add`, `adds` - Add
* `sub`, `subs` - Subtract
* `cmp` - Compare
* `cmn` - Compare Negative

There is a set of `(carry)` instructions which can utilize the carry from a previous operation:
* `adc`, `adcs` - Add with carry
* `sbc`, `sbcs` - Subtract with carry
* `ngc`, `ngcs` - Negate with carry

There are the `multiply integer` instructions which can combine an `op, mul`:
* `madd` - Multiply-add
* `msub` - Multiply-subtract
* `mneg` - Multiply-negate

There are then some `long multiply` instructions which can return a product twice the size of the inputs (`i32 * i32 = i64` or similar; effectively good for covering `zero or sign extend, multiply`):
* `smull`, `umull` - Multiply long
* `smaddl`, `umaddl` - Multiply-add long (Optimise long multiply + add/sub/neg on arm64. #91886)
* `smsubl`, `umsubl` - Multiply-subtract long (Optimise long multiply + add/sub/neg on arm64. #91886)
* `smnegl`, `umnegl` - Multiply-negate long (Optimise long multiply + add/sub/neg on arm64. #91886)

Finally, there are the "multiply high" instructions, which can return just the upper bits of a wide multiply:
* `smulh`, `umulh` - Multiply High

There may be other interesting instructions as well, but these are ones that may have broader usage/application and which would likely be good to validate we are covering.
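To make the families above concrete, here is a rough sketch of source-level expression shapes that correspond to several of them. It is written in C for illustration (the issue itself concerns code the .NET JIT generates from its IR), all function names are hypothetical, and whether a given compiler actually selects the named instruction depends on its pattern matching.

```c
#include <stdint.h>

/* shift, op: candidate for add (shifted register), e.g. add x0, x0, x1, lsl #3 */
static uint64_t add_shifted(uint64_t a, uint64_t b) { return a + (b << 3); }

/* sign-extend, op: candidate for add (extended register) with sxtw */
static int64_t add_extended(int64_t a, int32_t b) { return a + (int64_t)b; }

/* mul combined with add/sub/negate: candidates for madd, msub, mneg */
static uint64_t mul_add(uint64_t a, uint64_t b, uint64_t c) { return c + a * b; }
static uint64_t mul_sub(uint64_t a, uint64_t b, uint64_t c) { return c - a * b; }
static uint64_t mul_neg(uint64_t a, uint64_t b) { return 0 - a * b; }

/* extend, multiply (i32 * i32 = i64): candidates for smull and umull */
static int64_t mul_long_signed(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}
static uint64_t mul_long_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Upper 64 bits of a 64x64 product, which umulh returns in one instruction.
 * Portable version built from 32-bit partial products. */
static uint64_t mul_high_unsigned(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;
    /* Sum the middle terms; this cannot exceed 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```

The `mul_high_unsigned` helper shows why the "multiply high" case matters: without `umulh`, the upper half of the product takes four multiplies plus several shifts and adds.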
category:implementation
theme:intrinsics