[mono] Implement AdvSimd #49260
(Insert meaningful description here)
Remove `MonoLLVMModule::intrins_by_id`, which doesn't do anything other than serve as a lookup table for data contained in `intrins_id_to_intrins`. Don't emit table-driven intrinsics when the corresponding intrinsic group isn't fully supported.
… ShiftArithmeticSaturateScalar, ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar. Fix ShiftLeftLogicalSaturate and ShiftLeftLogicalSaturateScalar: decompose each into a promotion of the second argument into a vector followed by an overloaded invocation of `@llvm.aarch64.neon.uqshl` or `@llvm.aarch64.neon.sqshl`.
…teScalar ShiftLeftLogicalSaturateUnsignedScalar: move scalar-op-from-vector-op code into shared functions
MultiplyDoublingSaturateHighScalar, MultiplyDoublingScalarBySelectedScalarSaturateHigh, MultiplyDoublingWideningSaturateScalarBySelectedScalar, MultiplyDoublingWideningScalarBySelectedScalarAndAddSaturate, MultiplyDoublingWideningScalarBySelectedScalarAndSubtractSaturate, MultiplyRoundedDoublingByScalarSaturateHigh, MultiplyRoundedDoublingBySelectedScalarSaturateHigh, MultiplyRoundedDoublingSaturateHighScalar, MultiplyRoundedDoublingScalarBySelectedScalarSaturateHigh - remove unnecessary special cases.
MultiplyDoublingWideningSaturateScalar - add support for the special-case scalar LLVM intrinsic for sqdmull.
…pe when loading a single element
…num) to a separate header
Thanks for this massive change!
@@ -303,6 +310,142 @@ static void create_aot_info_var (MonoLLVMModule *module);
static void set_invariant_load_flag (LLVMValueRef v);
static void set_nonnull_load_flag (LLVMValueRef v);

enum {
	INTRIN_scalar = 1 << 0,
Is there any particular reason we are defining some of these with constant bit shifts, some with decimal literals, and some with hex literals?
They're hints to the reader: the enumeration constants given values by constant bit shifts are meant to be used as bit selectors in a bit set, the enumeration constants given values by decimal literals are meant to be used to bound loop ranges, and the enumeration constants given values by hex literals are meant to be used as logical masks.
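To make the convention in this reply concrete, here is a minimal sketch of the three styles side by side. Every name below (`FLAG_*`, `count_set_flags`, `has_vector_flag`) is hypothetical and chosen for illustration; none of it is taken from the enum in this PR.

```c
#include <assert.h>

enum {
	/* Constant bit shifts: bit selectors, combined into a bit set. */
	FLAG_scalar = 1 << 0,
	FLAG_vector64 = 1 << 1,
	FLAG_vector128 = 1 << 2,
	/* Decimal literal: a count used to bound a loop range. */
	FLAG_count = 3,
	/* Hex literal: a logical mask applied with bitwise AND. */
	FLAG_vector_mask = 0x6,
};

/* Counts how many of the known flag bits are set in `flags`. */
static int
count_set_flags (unsigned flags)
{
	int n = 0;
	for (int i = 0; i < FLAG_count; ++i)
		if (flags & (1u << i))
			n++;
	return n;
}

/* Returns nonzero if any vector-sized flag is set. */
static int
has_vector_flag (unsigned flags)
{
	return (flags & FLAG_vector_mask) != 0;
}
```

The notation of each constant signals its intended use, so a reader can tell at a glance whether a value is meant to be OR-ed, iterated up to, or AND-ed.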
…calar or scalar-in-vector return value in a Vector64 Remove OP_ARM64_ZERO_UPPER, which is unused
… ops undef can apparently pass through intrinsic functions during optimization, so bias towards slightly worse but correct codegen for now
This change adds AdvSimd and AdvSimd.Arm64 support to LLVM-enabled Mono.
Most aarch64 LLVM intrinsic functions are overloaded and have names determined
by an invariant base string prepended to a string representation of one or two
type parameters. Intrinsic functions used by an LLVM module must have a
declaration somewhere in memory when JITting or somewhere in the output bitcode
file when AOTing. Currently Mono maintains a hash table that maps internal
intrinsic IDs to LLVM intrinsic declarations. These IDs have been extended: a
simplified type representation is added to the key's upper bits. This
representation is not especially compact, and currently uses 9 bits to label 18
states, but it's easy to look at in a debugger. (A simple base-18 encoding
could encode three parameters in 13 bits.)
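As a rough illustration of the key layout described above, the sketch below packs a small intrinsic ID and a type tag into one integer key, and checks the bit-count arithmetic for the base-18 alternative. The 16-bit split and all names (`ID_BITS`, `make_key`, `pack_base18`, `bits_needed`) are assumptions made for this example, not the layout or identifiers used in the Mono sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits hold the intrinsic ID, upper bits hold the tag. */
enum { ID_BITS = 16, NUM_TYPE_STATES = 18 };

static uint32_t
make_key (uint32_t intrin_id, uint32_t type_tag)
{
	assert (intrin_id < (1u << ID_BITS));
	return (type_tag << ID_BITS) | intrin_id;
}

static uint32_t
key_id (uint32_t key)
{
	return key & ((1u << ID_BITS) - 1);
}

static uint32_t
key_tag (uint32_t key)
{
	return key >> ID_BITS;
}

/* Base-18 packing of three type parameters, each in [0, 18). */
static uint32_t
pack_base18 (uint32_t a, uint32_t b, uint32_t c)
{
	assert (a < NUM_TYPE_STATES && b < NUM_TYPE_STATES && c < NUM_TYPE_STATES);
	return (a * NUM_TYPE_STATES + b) * NUM_TYPE_STATES + c;
}

/* Smallest number of bits that can represent every value in [0, n). */
static unsigned
bits_needed (uint32_t n)
{
	unsigned bits = 0;
	while ((1u << bits) < n)
		bits++;
	return bits;
}
```

Three base-18 digits span 18 * 18 * 18 = 5832 distinct values, which is why 13 bits (8192 values) suffice while 12 bits (4096 values) do not.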
These overload-tagged IDs can be passed to
`OP_XOP_OVR{_,_SCALAR,_BYSCALAR}X_{X,X_X,X_X_X}`. The return type of the
intrinsic that generates these mini ops is used to derive the overload tag to
find the corresponding LLVM intrinsic function declaration.
`MonoLLVMModule::intrins_by_id` is removed, because LLVM intrinsic lookup keys
are no longer small contiguous integers. It only seemed to serve as a lookup
table for data already contained in a hash table.
The corresponding instructions for some of these .NET-level intrinsics take
immediate parameters. For some of these instructions, the LLVM IR code that
selects these immediate-argument instructions can emit a fallback for
non-constant parameters, either by using an equivalent instruction with a
register operand or by using a longer and less-efficient instruction sequence.
For the rest, a branching code sequence is emitted. Helper functions
(`immediate_unroll_begin` etc.) are added to make this a little less
repetitious.
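The branching fallback can be pictured in plain C as a switch with one arm per legal immediate value, each arm using a compile-time constant. This is only a conceptual sketch: `SHIFT_RIGHT_IMM` is a stand-in macro, `shift_right_dynamic` is a hypothetical name, and returning 0 for an out-of-range amount is an assumption made here. It does not reproduce how `immediate_unroll_begin` and its companions build LLVM IR.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for an instruction whose shift amount must be a constant. */
#define SHIFT_RIGHT_IMM(x, imm) ((uint8_t) ((x) >> (imm)))

/* Dispatches a runtime shift amount to the matching constant-immediate arm. */
static uint8_t
shift_right_dynamic (uint8_t x, unsigned amount)
{
	switch (amount) {
	case 0: return SHIFT_RIGHT_IMM (x, 0);
	case 1: return SHIFT_RIGHT_IMM (x, 1);
	case 2: return SHIFT_RIGHT_IMM (x, 2);
	case 3: return SHIFT_RIGHT_IMM (x, 3);
	case 4: return SHIFT_RIGHT_IMM (x, 4);
	case 5: return SHIFT_RIGHT_IMM (x, 5);
	case 6: return SHIFT_RIGHT_IMM (x, 6);
	case 7: return SHIFT_RIGHT_IMM (x, 7);
	default: return 0; /* assumption: out-of-range amounts yield 0 */
	}
}
```

When the amount is a known constant, an optimizer can fold the switch down to the single matching arm, so the branching cost is only paid for truly dynamic arguments.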
Some operations take an immediate operand denoting a lane to select in a vector
before proceeding with another generic vector or scalar operation. These are
decomposed into a sequence of `OP_ARM64_SELECT_SCALAR` followed by the
non-lane-specific operation. LLVM can still optimize this to the lane-selecting
instruction when possible, and can generate fallback code for non-immediate
lane selection.
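The decomposition can be sketched in scalar C as two steps: pick a lane, then run the generic operation. The `vec4s` type and the function names are hypothetical and only model the semantics; they are not the mini IR ops or the code emitted by this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 4 x int16 vector. */
typedef struct { int16_t lane[4]; } vec4s;

/* Step 1: pick one lane out of a vector (the role of the select-scalar op). */
static int16_t
select_scalar (vec4s v, unsigned lane)
{
	assert (lane < 4);
	return v.lane[lane];
}

/* Step 2: the non-lane-specific operation, here multiply-by-scalar. */
static vec4s
multiply_by_scalar (vec4s v, int16_t s)
{
	vec4s r;
	for (int i = 0; i < 4; ++i)
		r.lane[i] = (int16_t) (v.lane[i] * s);
	return r;
}

/* The lane-selecting operation expressed as the composition of both steps. */
static vec4s
multiply_by_selected_scalar (vec4s a, vec4s b, unsigned lane)
{
	return multiply_by_scalar (a, select_scalar (b, lane));
}
```

Because the lane index is an ordinary argument to the first step, the same composition covers both constant and non-constant lane selection.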
The tables describing the intrinsics supported by the runtime are extended to
support intrinsics with different target instructions for signed, unsigned and
floating point parameters. Whenever possible, .NET-level intrinsics that
correspond to a single LLVM intrinsic function are stored as a single entry in
these tables. Unfortunately many intrinsics need to be translated into a
sequence of LLVM IR operations; for these, new mini IR opcodes are added to
select the LLVM IR builder code that should run.
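One way to picture a table entry that carries separate targets for signed, unsigned and floating point parameters is sketched below. The struct layout, the `ElemKind` enum, `lookup_target`, and the specific name-to-target pairings are illustrative assumptions; the actual tables in the runtime are organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { KIND_SIGNED, KIND_UNSIGNED, KIND_FLOAT, KIND_COUNT } ElemKind;

/* Hypothetical entry: one .NET-level name, one target per element kind. */
typedef struct {
	const char *name;
	const char *target[KIND_COUNT]; /* NULL when that kind is unsupported */
} IntrinEntry;

static const IntrinEntry table[] = {
	{ "Max", { "smax", "umax", "fmax" } },
	{ "AddSaturate", { "sqadd", "uqadd", NULL } },
};

/* Returns the target for `name` and `kind`, or NULL if there is none. */
static const char *
lookup_target (const char *name, ElemKind kind)
{
	for (size_t i = 0; i < sizeof (table) / sizeof (table[0]); ++i)
		if (strcmp (table[i].name, name) == 0)
			return table[i].target[kind];
	return NULL;
}
```

A single row per .NET-level intrinsic keeps the table compact; intrinsics that need a multi-step IR sequence would not fit this shape, which matches the need for dedicated mini IR opcodes described above.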