[LoongArch64] add Intrinsics' API for LoongArch64. #94400
base: main
Note: This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
@tannergooding This PR only covers the API names, so please focus first on the class names and the API names. I will update this PR later to refine some details. Thanks!
As a new architecture, it's riskier to expose public APIs than it is for mature architectures. I'd suggest keeping them internal and focusing on the cross-platform Vector128/256 intrinsics for now.
...m.Private.CoreLib/src/System/Runtime/Intrinsics/LoongArch64/LA64Base.PlatformNotSupported.cs
For API names, you can open an API proposal like #94011. API definitions without a JIT implementation are generally unwanted.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/LoongArch64/LA_LASX.cs
If the API is OK for LoongArch64, I will push the JIT implementation.
Yes, Vector128/256 are independent of the CPU. Right now the architecture-specific API is the most important part for LoongArch64, and I want to confirm it first.
#pragma warning disable IDE0060 // unused parameters
using System.Runtime.CompilerServices;

namespace System.Runtime.Intrinsics.LoongArch64
I've marked this as NO-MERGE since we cannot take it until after an API review has occurred. See https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md
We first need an API proposal, following the standard template. We'll discuss the relevant name changes and other details there, and I can then champion it and take it to API review. Once it is approved, we can implement the API surface.
Until then, LoongArch would be limited to supporting the existing cross-platform API surface. For example, Leading/TrailingZeroCount can be supported by accelerating int.Leading/TrailingZeroCount and the same methods on the other primitive types.
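As a rough illustration of what "supporting the existing cross-platform API surface" means for user code, the sketch below only calls portable APIs (BitOperations and int.LeadingZeroCount, available since .NET 7). On LoongArch64 a JIT could accelerate these by lowering them to CLZ/CTZ instructions, but that lowering is an assumption here, not something stated in the thread. The class name ZeroCountDemo is hypothetical.

```csharp
using System.Numerics;

// Hypothetical demo class: portable zero-count APIs that a JIT may accelerate per platform.
public static class ZeroCountDemo
{
    // Number of leading zero bits in a 32-bit value (returns 32 for 0).
    public static int Lzc(uint value) => BitOperations.LeadingZeroCount(value);

    // Number of trailing zero bits in a 32-bit value (returns 32 for 0).
    public static int Tzc(uint value) => BitOperations.TrailingZeroCount(value);

    // The generic-math entry point on int mentioned in the comment above.
    public static int LzcInt(int value) => int.LeadingZeroCount(value);
}
```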
Thanks!
Reviewing the LoongArch64 API on the basis of a PR may be clearer, so I pushed this PR.
I will create an API proposal for the LoongArch64 API.
/// float64x4_t xvfmin_d_f64 (float64x4_t a, float64x4_t b)
/// LASX: XVFMIN.D Xd.4D, Xj.4D, Xk.4D
/// </summary>
public static Vector256<double> Min(Vector256<double> left, Vector256<double> right) => Min(left, right);
What are the semantics around NaN and -0 handling on LoongArch?
The floating-point operation follows IEEE 754-2008; here it is minNum(x, y).
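To make the answer concrete, here is a scalar model of IEEE 754-2008 minNum for one lane of XVFMIN.D: if exactly one input is NaN, the other input is returned. This differs from .NET's Math.Min, which returns NaN when either input is NaN. IEEE 754-2008 does not fix which zero minNum returns for (+0, -0); this sketch assumes -0 is preferred, which may not match the hardware. Signaling NaNs are not modeled. MinNumDemo is a hypothetical name.

```csharp
using System;

// Hypothetical demo class: scalar model of IEEE 754-2008 minNum (one lane of XVFMIN.D).
public static class MinNumDemo
{
    public static double MinNum(double x, double y)
    {
        // A quiet NaN input is ignored; NaN results only when both inputs are NaN.
        if (double.IsNaN(x)) return y;
        if (double.IsNaN(y)) return x;

        if (x == 0.0 && y == 0.0)
        {
            // minNum leaves the +0/-0 choice open; this sketch assumes -0 wins.
            return double.IsNegative(x) ? x : y;
        }

        return x < y ? x : y;
    }
}
```

A JIT mapping Vector256.Min onto XVFMIN.D would therefore need extra handling to keep .NET's NaN-propagating semantics; whether the proposed LoongArch-specific Min should expose raw minNum instead is exactly the question raised above.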
/// float32x8_t xvfrecip_s_f32 (float32x8_t a)
/// LASX: XVFRECIP.S Xd.8S, Xj.8S
/// </summary>
public static Vector256<float> Reciprocal(Vector256<float> value) => Reciprocal(value);
Is this exact, or is it an estimate that allows more than 0.5 ULP of error, as on several other platforms?
Reciprocal is implemented as the IEEE 754-2008 division 1.0 / x.
Only FRECIPE and FRSQRTE within the LoongArchBase class are estimates; FRECIP and FRSQRT are exact.
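Since the thread states that XVFRECIP.S is exact (a correctly rounded 1.0 / x per lane), its semantics can be expressed with the cross-platform Vector256 operators available since .NET 7. The sketch below is a portable model only; whether a JIT would lower this expression to XVFRECIP.S is an assumption, and ReciprocalDemo is a hypothetical name.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical demo class: portable model of an exact (non-estimate) vector reciprocal.
public static class ReciprocalDemo
{
    // Each lane is the IEEE 754 division 1.0f / x, correctly rounded.
    public static Vector256<float> ExactReciprocal(Vector256<float> value)
        => Vector256.Create(1.0f) / value;
}
```

An estimate instruction such as FRECIPE would not be interchangeable with this expression, which is why the exact/estimate distinction matters for the API shape.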
/// bool xvsetnez_v_u8 (uint8x32_t value)
/// LASX: XVSETNEZ.V cd, Xj.32B
/// </summary>
public static bool HasElementsNotZero(Vector256<byte> value) => HasElementsNotZero(value);
How does this instruction work at the hardware level? Xj.32B is clearly the input register, but I'm not familiar with cd here. Is it a general-purpose register, a flag register, or something else?
I will answer these together later.
cd is a floating-point condition flag register that holds the results of floating-point comparisons. There are 8 such cd condition flag registers.
I didn't expose cd in the API, to keep usage simple.
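Per the answer above, XVSETNEZ.V writes its result to one of the eight cd condition flag registers, and the proposed API hides that by returning a bool. The portable sketch below shows the same semantics with the cross-platform Vector256 comparison operators (.NET 7+): true when any byte lane is non-zero. How a JIT would move the flag into a general-purpose register or fold it into a branch is not covered here and is left as an assumption. NonZeroDemo is a hypothetical name.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical demo class: portable model of the proposed HasElementsNotZero.
public static class NonZeroDemo
{
    // True if at least one of the 32 byte lanes is non-zero.
    public static bool HasElementsNotZero(Vector256<byte> value)
        => value != Vector256<byte>.Zero;
}
```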
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/LoongArch64/LA64Base.cs
Update the API within the LoongArchBase class.
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class LoongArchBase
- Is it OK to rename this file to LoongArchBase.cs, or should it just be named LABase.cs?
- Is it OK to name this class LoongArchBase?
public static int LeadingSignCount(int value) => LeadingSignCount(value);

/// <summary>
/// LA64: CLO.W rd, rj
/// </summary>
public static int LeadingSignCount(uint value) => LeadingSignCount(value);
Is it necessary to add two overloads of this API, one taking an int value and one taking a uint value?
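One way to reason about the question: CLO.W (count leading ones) sees the same 32 bits regardless of whether the C# type is int or uint, so the two overloads differ only in the parameter type. The scalar model below illustrates this by forwarding the signed overload to the unsigned one. It models leading-ones counting as CLO.W is described by its mnemonic; it does not attempt to model a "sign count" in any other sense, and LeadingOnesDemo is a hypothetical name.

```csharp
using System.Numerics;

// Hypothetical demo class: scalar model of CLO.W (count leading one bits of a 32-bit value).
public static class LeadingOnesDemo
{
    // Leading ones of value == leading zeros of its bitwise complement.
    public static int CountLeadingOnes(uint value) => BitOperations.LeadingZeroCount(~value);

    // The int overload sees the same bit pattern, so it simply reinterprets and forwards.
    public static int CountLeadingOnes(int value) => CountLeadingOnes((uint)value);
}
```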
public static long ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: BITREV.W rd, rj
/// </summary>
public static ulong ReverseElementBits(uint value) => ReverseElementBits(value);
Is it necessary to add both int and uint overloads of ReverseElementBits()?
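For reference, a 32-bit bit reversal (the operation BITREV.W is named after) can be modeled in scalar code as below. As with the leading-ones case, the int and uint forms operate on identical bits, so the signed overload just reinterprets the result. This sketch deliberately ignores how LA64 widens the 32-bit result into a 64-bit register; BitReverseDemo is a hypothetical name.

```csharp
// Hypothetical demo class: scalar model of reversing the bit order of a 32-bit value.
public static class BitReverseDemo
{
    public static uint ReverseBits(uint value)
    {
        uint result = 0;
        for (int i = 0; i < 32; i++)
        {
            // Shift the lowest remaining input bit into the result from the right.
            result = (result << 1) | (value & 1u);
            value >>= 1;
        }
        return result;
    }

    // Same 32 bits, reinterpreted as signed.
    public static int ReverseBits(int value) => (int)ReverseBits((uint)value);
}
```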
public static int ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.2W rd, rj
/// </summary>
public static uint ReverseElementBits(uint value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static long ReverseElementBits(long value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static ulong ReverseElementBits(ulong value) => ReverseElementBits(value);
These instructions are similar to Arm64's REV, REV16, REV32 and REV64, but the ArmBase class doesn't expose those. Why not?
Is it necessary to add these for LoongArch64?
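A possible reason no ArmBase equivalent exists is that byte reversal is already reachable through the cross-platform BinaryPrimitives.ReverseEndianness overloads, so a JIT can lower those calls to REVB.2W / REVB.D on LoongArch64 instead of exposing new public APIs. That lowering is an assumption, not something confirmed in the thread; the sketch below only shows the portable semantics. ByteReverseDemo is a hypothetical name.

```csharp
using System.Buffers.Binary;

// Hypothetical demo class: portable byte reversal via the existing cross-platform API.
public static class ByteReverseDemo
{
    // Reverses the 4 bytes of a 32-bit value.
    public static uint ReverseBytes(uint value) => BinaryPrimitives.ReverseEndianness(value);

    // Reverses the 8 bytes of a 64-bit value.
    public static ulong ReverseBytes(ulong value) => BinaryPrimitives.ReverseEndianness(value);
}
```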
19f78ff to 6e3a9e7
Sign-Zero-extend and MultiplyWiden
2a527ef to 1e7203a
…unding. Count leading ones/zeros and elements' bit clear.
Add more ADD operations.
bitwise shift, shuffle, compare and float operations.
add LoadElementReplicateVector, Vector elements' operations and AverageRounded.
008722b to 8739f1b
Hi, @tannergooding
424a8a1 to 411b9f5
411b9f5 to 6c7b380
I can potentially give it a pass today or tomorrow, but it's still blocked until API review can happen. That probably won't be until the new year, as API review typically doesn't happen in December when most people are on holiday/vacation.
OK, thanks. I will push other PRs that are independent of these APIs, such as the SIMD instructions within the emitter (#95456).
Also fix some code formatting.
b260975 to 952a76b
I'm still waiting for response to the question asked on the API proposal:
I'm very sorry for the late response. Although GCC and LLVM have both merged LoongArch's SIMD support, the intrinsics manual is unofficial:
Thanks! This is still on my backlog, but it is lower priority than some other work because the API review hasn't happened yet (and this PR is blocked until it can). I'll try to set some time aside in the next week or two to go through the SIMD ISA guide and compare it to the proposed API surface so that it can get marked
OK, thanks very much.
We have finished SIMD support on runtime 6.0, and the tests pass.
I will push the SIMD support for LoongArch64.
This is the first PR, covering the API names.
The API proposal: [API Proposal]: LoongArch64: add Intrinsics' API for LoongArch64 #94445
@tannergooding
Could you give me some advice?
Thanks