-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Improve Intel hardware intrinsic APIs #17637
Conversation
The new function names are horrible and in majority of cases do not follow .NET API coding guidelines by using acronyms. |
@CarolEidt @tannergooding @eerhardt @AndyAyersMS @mikedn @jkotas |
/// PCMPISTRI xmm, xmm/m128, imm8 | ||
/// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8) | ||
/// </summary> | ||
public static bool CompareImplicitLengthNotCAndNotZ(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the words ImplicitLength
and ExplicitLength
in the method names? Isn't the fact that I used the overload without specifying the lengths enough to disambiguate that I wanted to use implicit lengths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Will try to eliminate these two words.
/// PCMPISTRI xmm, xmm/m128, imm8 | ||
/// int _mm_cmpistrz (__m128i a, __m128i b, const int imm8) | ||
/// </summary> | ||
public static bool CompareImplicitLengthNotCAndNotZ(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does C
and Z
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does C and Z mean?
What does O mean? Similiarly, what does the S suffix mean below?
They all represent the corresponding flags in the FLAGS register of x86/x64 CPUs.
don't use single letter abbreviations for prefixes/suffixes in names.
I agree, but now we have some similar APIs that "use single letter abbreviations", such as Sse41.TestC
, Avx.TestZ
, etc.
Do you have any suggestion to improve the names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume C
== Carry
, Z
== Zero
, O
== Overflow
, and S
== Sign
? Can we spell out the words? Or even better, if we could come up a more descriptive name of what the method does, that would be great.
Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S
and Z
that I can tell is which argument to check for a null character:
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise.
vs.
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
It seems the name should indicate/describe the differences, instead of using SFlag
or ZFlag
, and then forcing the caller to look up what SFlag
means.
/// int _mm_cmpistro (__m128i a, __m128i b, const int imm8) | ||
/// PCMPISTRI xmm, xmm/m128, imm8 | ||
/// </summary> | ||
public static bool CompareImplicitLengthO(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthO(left, right, mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does O
mean? Similiarly, what does the S
suffix mean below? I agree with @4creators that we typically don't use single letter abbreviations for prefixes/suffixes in names.
@@ -544,243 +544,243 @@ public static class Avx2 | |||
/// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale) | |||
/// VPGATHERDD xmm, vm32x, xmm | |||
/// </summary> | |||
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale); | |||
public static unsafe Vector128<int> GatherVector128WithVector128Int32Indices(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128WithVector128Int32Indices(baseAddress, index, scale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I can imagine it being easier for JIT implementation, these names are a bit unwieldy from a user's point of view IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but if we only rely on overload to distinguish these intrinsics, the current named-intrinsic framework of RyuJIT has to be changed. Meanwhile, in the future, other platforms (e.g., .NET Native, Mono, etc) are also hard to port hardware intrinsic.
JIT experts may have some thoughts? @CarolEidt @mikedn @AndyAyersMS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT intrinsic plumbing needs to be able to deal with overloads. It does it today in several cases. For example, CORINFO_INTRINSIC_Abs
can be either Abs(float)
or Abs(double)
. Or there are multiple intrinsic overloads of Vector::.ctor
. If the named intrinsics are meant to be the one true way to identify intrinsics in future, it should be just fixed there.
Meanwhile, in the future, other platforms (e.g., .NET Native, Mono, etc) are also hard to port hardware intrinsic.
I do not think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current hardware intrinsics distinguish different overloads by "method name + base-type + SIMD size" in the IR. This mechanism works very well on most of the Intel hardware intrinsics, but it is not enough for Avx2.GatherVector*
.
If the API shape is more important than the complexity of JIT implementation, I would look into changing the JIT framework or intrinsic IR to address Avx2.GatherVector*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem to me that it will be all that difficult to distinguish these, and given that it is not a common case, we could either choose to simply generate different NI*
intrinsic names in the importer, based on the type of the relevant argument, or we could postpone that analysis until codegen, and look at the base type of the SIMD argument to determine which instruction to generate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I would say that, in general, the API shape is more important than the complexity of the JIT implementation, as long as it is both "pay for play" (i.e. the complexity is only incurred for the case in question) and the complexity is not very high.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will retain the current APIs and address in JIT.
02608d9
to
92e6ade
Compare
Re-improved SSE4.1 intrinsic APIs (e.g., |
@@ -283,7 +283,7 @@ public static class Sse41 | |||
/// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8) | |||
/// INSERTPS xmm, xmm/m32, imm8 | |||
/// </summary> | |||
public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index); | |||
public static Vector128<float> Insert(Vector128<float> value, Vector128<float> data, byte index) => Insert(value, data, index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected the SSE4.1 Insert
API on float
base type.
Fix #18143
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have both overloads or will we specially handle the Sse.SetScalarVector128
and corresponding index
masks to not zero the upper bits of data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does the corresponding Avx API need to be updated as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have both overloads
Probably not, SSE4.1 insertps
has really special semantics that inserts a value selected from the second vector argument into the first vector. But other SSE4.1 "insert" instructions direct insert a scalar value. https://msdn.microsoft.com/en-us/library/bb514071(v=vs.120).aspx
Also, does the corresponding Avx API need to be updated as well?
No, AVX insert instructions are "normal" :)
Addressed the above feedback and fixed SSE4.1 |
/// PCMPISTRI xmm, xmm/m128, imm8 | ||
/// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8) | ||
/// </summary> | ||
public static bool CompareNotCAndNotZFlag(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareNotCAndNotZFlag(left, right, mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case my original reply gets lost in the outdated
section, I've copied it here:
I assume C
== Carry
, Z
== Zero
, O
== Overflow
, and S
== Sign
? Can we spell out the words? Or even better, if we could come up a more descriptive name of what the method does, that would be great.
Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S
and Z
that I can tell is which argument to check for a null character:
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise.
vs.
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
It seems the name should indicate/describe the differences, instead of using SFlag
or ZFlag
, and then forcing the caller to look up what SFlag
means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words?
@eerhardt Right, and I am OK to spell them out if it matches the C# naming convention better.
Or even better, if we could come up a more descriptive name of what the method does, that would be great.
I do not think so. "A more descriptive name" may be too long and not clear. For example, NotCAndNotZFlag
looks even clearer than "absolute value of lb is larger than or equal to MaxSize and the resulting mask is equal to zero". Meanwhile, we already assume that the users of HW intrinsics have been familar/mastering with the C++ counterparts or ISA design. So, I still suggest to name these intrinsics with flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we can come up with better names. It appears there are 5 cases we need to name:
Flag | Description | Possible Names |
---|---|---|
S & Z |
Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise. & Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise. |
CompareLeftContainsNull, CompareReturnLeftContainsNull, CompareReturnPartialLeft, CompareGetIsPartialLeft, CompareIsLeftPartial (Same for Right ) |
O | Compare packed strings with implicit lengths in a and b using the control in imm8, and returns bit 0 of the resulting bit mask. | CompareReturnFirstBit, CompareGetFirstBit |
C | Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if the resulting mask was non-zero, and 0 otherwise. | CompareIsResultNonZero, CompareHasNonZeroResult, CompareNonZeroMask, CompareHasResult |
A | Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if b did not contain a null character and the resulting mask was zero, and 0 otherwise. |
A
is definitely hard to describe succinctly, since it is the combination of two other ones. So coming up with a good name for it will probably be based on the two underlying names.
Note: The IntrinsicsGuide's description for the explicit length S
& Z
overloads (_mm_cmpestrs
& _mm_cmpestrz
) appears to be incorrect.
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
But then the Operation says:
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
dst := (lb <= UpperBound)
Where the Operation appears to match what is in the underlying instruction manuals.
In light of the implicit vs explicit differences, to name S
& Z
, that's why I like something like Is Partial
because it describes at a higher-level what is being checked. ex. File.Exists
instead of File.StatReturnsZero
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eerhardt Thank you so much for the investigation. I read the manual again and I agree that "a more descriptive name" is very necessary because FLAGS is used in a non-standard manner with these SSE4.2 string instructions.
According to the above table you provided, there is a potential name matrix:
original names | new names |
---|---|
CompareCFlag | CompareNonZeroResultMask |
CompareZFlag | CompareRightContainsNull |
CompareSFlag | CompareLeftContainsNull |
CompareOFlag | CompareReturnFirstResultBit |
CompareNotCAndNotZFlag | CompareZeroResultMaskAndRightNotContainsNull |
Does it look good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue I see if the difference between implicit and explicit length overloads. If we called the explicit length overloads ContainsNull
that would be incorrect. It instead returns if the rightLength
is less than the maximum (8 or 16). So we should use a different name for the explicit length overloads.
I'd like to keep them using the same name, if a decent name comes up. But if one isn't available, I'm fine with the implicit and explicit length overloads being named different (RightContainsNull
vs. RightLessThanMax
/RightIsPartial
/etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasResult
/ContainsResult
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could make it more obvious by calling the Polarity
flags something like NegateResult
instead.
Reading if (ContainsMatch(op1, op2, StringComparisonMode.NegateResult))
seems much better (both readable and understandable) than if (CompareZeroResultMask(op1, op2, StringComparisonMode.NegatiePolarity))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do EndOfString, since that applies to both Explicit (where end of string is length determined) and Implicit (where end of string is determined by the null character)?
I think this is a good idea and I also perfer to use the same name for Explicit and Implicit.
How about Left/RightTerminated
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could make it more obvious by calling the Polarity flags something like NegateResult instead.
Good point! We can rename these two polarity flags both:
/// <summary>
/// _SIDD_NEGATIVE_POLARITY
/// </summary>
NegativeResult = 0x10,
/// <summary>
/// _SIDD_MASKED_NEGATIVE_POLARITY
/// </summary>
NegativeUsefulResult = 0x30,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above renamed StringComparisonMode, we seems to have the new name matrix:
original names | new names |
---|---|
CompareCFlag | CompareHasMatch |
CompareZFlag | CompareRightTerminated |
CompareSFlag | CompareLeftTerminated |
CompareOFlag | CompareReturnFirstResultBit |
CompareNotCAndNotZFlag | CompareNoMatchAndRightNotTerminated |
It might be good to split this into two PRs. One fixing the argument types and the other fixing the SSE4.2 API names. Did we also already have the discussion on whether it was better for mask/control parameters to be the same as the base type or if it was better to be the same-sized integer type? CC. @eerhardt for input on the API side of things. |
I am not sure. I put them together just for simplifying the CoreFX/CoreCLR synchronization. |
Right, but it sounds like the Sse4.2 changes are a bit more complex, and should probably have a bit more "in-depth" review. I'm fine with keeping them together, if you prefer, but it might be easier to do two separate changes. |
@tannergooding Thanks for clarifying. I agree that Sse4.2 changes are a bit more complex and it may impact other API naming. For example, if we decide not using "flags" (e.g., I would split these changes if Sse41 blocks other work. |
|
||
/// <summary> | ||
/// __m128i _mm_cmpistrm (__m128i a, __m128i b, const int imm8) | ||
/// PCMPISTRM xmm, xmm/m128, imm8 | ||
/// </summary> | ||
public static Vector128<ushort> CompareImplicitLengthBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); } | ||
public static Vector128<ushort> CompareBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my information - Why did we break out BitMask
vs UnitMask
into different overloads? Why not keep it in the StringComparisonMode
enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch! This is a legacy design from the previous versions, will fix. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my information - Why did we break out BitMask vs UnitMask into different overloads? Why not keep it in the StringComparisonMode enum?
To me, it seems wrong to have it in the StringComparisonMode
enum, since the values overlap.
/// <summary> | ||
/// _SIDD_UNIT_MASK | ||
/// </summary> | ||
UnitMask = 0x40, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are UnitMask
and MostSignificant
(or LeastSignificant
and BitMask
) distinguished? I don't think we should have multiple names for the same value in a single enum - are these used in different contexts? If so, I would think they'd be different enums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these used in different contexts?
Yes.
UnitMask
and BitMask
is only used for CompareMask
(PCMPESTRM
).
LeastSignificant
and MostSignificant
is only used for CompareIndex
(PCMPESTRI
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, I would think they'd be different enums.
So, CompareMask
and CompareIndex
would have an additional imm parameter, which would complicate the JIT implementation (especially non-const fallback).
How about encoding these two flags into the intrinsic function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about encoding these two flags into the intrinsic function name?
I don't think that's necessary - I just don't see why the enums should be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on whether we want to map closely with the underlying hardware instruction, which I thought was a desire.
Section 4.1.5 Output Selection
in https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf
Since it will be a Flags
enum (after this PR), I don't think this is a huge issue. It follows the underlying HW instruction design, and is understandable - use Least|Most Significant with the CompareIndex
method, use Bit|Unit Mask with the CompareMask
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rethinking the open a new issue
comment, I guess the current code has the Unit/BitMask
methods split, so this PR is already changing this API.
I dislike exploding the methods out because:
- It increases the number of methods, and method names in intellisense/docs.
- It doesn't allow for a "default". I either need to pick - Bit or Unit mask - up front. Using an enum allows me to use the API without making the choice. And then if that choice doesn't work for me, I can change the enum value.
I do like the advantage that 3 enums provides in that Bit/UnitMask
and Least/MostSignificant
values don't affect the methods that return bool
, so those methods don't even get those options. And also that CompareIndex doesn't get Bit/UnitMask
option and vice versa. So if we were going to change the current PR, that would be my vote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using an enum allows me to use the API without making the choice.
I'm not sure how valid this is. The choice can make a fairly big difference on how you write your algorithm, so you will generally need to make the choice and be aware of how it impacts your code, right out.
(You will also have to specify one of the two enum values either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how valid this is.
It's completely valid when I'm exploring the API to learn how it works.
(You will also have to specify one of the two enum values either way)
Only in the 3 enums design. As currently proposed I wouldn't need to specify Bit/UnitMask
when calling CompareMask
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As currently proposed I wouldn't need to specify Bit/UnitMask when calling CompareMask.
I'm not convinced that's a good thing 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, these APIs and instructions are already fairly hard to understand. Making parts of it more obvious, even if it results in us exposing double the methods might be a good thing (It looks like we have ~50 methods exposed right now).
At this point, I would propose changing this PR so that it doesn't change the enum names or usage (i.e. only do changes 1 and 3. |
Ok, let me separate SSE4.2 changes into a new PR. |
@@ -182,12 +182,12 @@ public static class Sse41 | |||
/// int _mm_extract_epi8 (__m128i a, const int imm8) | |||
/// PEXTRB reg/m8, xmm, imm8 | |||
/// </summary> | |||
public static sbyte Extract(Vector128<sbyte> value, byte index) => Extract(value, index); | |||
public static int Extract(Vector128<sbyte> value, byte index) => Extract(value, index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also fixes SSE2/SSE4.1/AVX Extract
return type to reflect the underlying instruction behavior https://github.com/dotnet/coreclr/issues/17957
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks @fiigii for your patience and persistence!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Couple minor questions.
e7e8f0c
to
0157aab
Compare
Store(buffer, value); | ||
return buffer[index]; | ||
} | ||
return Unsafe.Add<byte>(ref Unsafe.As<Vector256<byte>, byte>(ref value), index & 0x1F); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed sbyte
and short
overloads of SSE2/SSE4.1/AVX Extract
and simplify Avx.Extract
non-const fallback as @jkotas's suggestion.
I would prepare the CoreFX counterpart if this PR looks good to you guys.
@eerhardt Can we merge this PR? |
Oops - I thought this was already merged. Merging now. |
* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
This will disable the test from being run with smart in our windows arm testing. This corresponds to the tests being deleted in dotnet#17637.
This will disable the test from being run with smart in our windows arm testing. This corresponds to the tests being deleted in #17637.
* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Commit migrated from dotnet/coreclr@ea58e86
) This will disable the test from being run with smart in our windows arm testing. This corresponds to the tests being deleted in dotnet/coreclr#17637. Commit migrated from dotnet/coreclr@b55d2b8
This PR
encodes the result flags in the names of certain SSE4.2 string processing intrinsic. That provides more stable runtime behaviors and simplifies JIT implementation, discussed in https://github.com/dotnet/coreclr/issues/16270Insert
API onfloat
base type, closes https://github.com/dotnet/coreclr/issues/18143.Extract
return type to reflect the underlying instruction behavior, closes https://github.com/dotnet/coreclr/issues/17957@CarolEidt @tannergooding @eerhardt