Improve Intel hardware intrinsic APIs #17637

fiigii · 2018-04-18T09:17:11Z

This PR:

1. Fixes https://github.com/dotnet/coreclr/issues/17058 and temporarily disables the AVX MaskLoad test cases.
2. Encodes the result flags in the names of certain SSE4.2 string processing intrinsics. That provides more stable runtime behavior and simplifies the JIT implementation, as discussed in https://github.com/dotnet/coreclr/issues/16270.
3. Fixes the SSE4.1 Insert API on the float base type, closes https://github.com/dotnet/coreclr/issues/18143.
4. Fixes the SSE2/SSE4.1/AVX Extract return type to reflect the underlying instruction behavior, closes https://github.com/dotnet/coreclr/issues/17957.

4creators · 2018-04-19T14:01:06Z

The new function names are horrible and in the majority of cases do not follow the .NET API coding guidelines because they use acronyms.

fiigii · 2018-04-27T18:03:10Z

@CarolEidt @tannergooding @eerhardt @AndyAyersMS @mikedn @jkotas
Could you please take a look at this PR and give suggestions? The following intrinsic work depends on the API change.

eerhardt · 2018-04-30T15:13:49Z

src/mscorlib/src/System/Runtime/Intrinsics/X86/Sse42.cs

-        ///   PCMPISTRI xmm, xmm/m128, imm8
-        /// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8)
+        /// </summary>
+        public static bool CompareImplicitLengthNotCAndNotZ(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode);


Do we really need the words ImplicitLength and ExplicitLength in the method names? Isn't the fact that I used the overload without specifying the lengths enough to disambiguate that I wanted to use implicit lengths?

Good point! Will try to eliminate these two words.

eerhardt · 2018-04-30T15:14:07Z

src/mscorlib/src/System/Runtime/Intrinsics/X86/Sse42.cs

        ///   PCMPISTRI xmm, xmm/m128, imm8
-        /// int _mm_cmpistrz (__m128i a, __m128i b, const int imm8)
+        /// </summary>
+        public static bool CompareImplicitLengthNotCAndNotZ(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode);


What does C and Z mean?

What does C and Z mean?
What does O mean? Similarly, what does the S suffix mean below?

They all represent the corresponding flags in the FLAGS register of x86/x64 CPUs.

don't use single letter abbreviations for prefixes/suffixes in names.

I agree, but now we have some similar APIs that "use single letter abbreviations", such as Sse41.TestC, Avx.TestZ, etc.
Do you have any suggestion to improve the names?

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words? Or even better, if we could come up with a more descriptive name of what the method does, that would be great.

Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S and Z that I can tell is which argument to check for a null character:

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise.

vs.

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.

It seems the name should indicate/describe the differences, instead of using SFlag or ZFlag, and then forcing the caller to look up what SFlag means.

eerhardt · 2018-04-30T15:15:13Z

src/mscorlib/src/System/Runtime/Intrinsics/X86/Sse42.cs

+        /// int _mm_cmpistro (__m128i a, __m128i b, const int imm8)
+        ///   PCMPISTRI xmm, xmm/m128, imm8
+        /// </summary>
+        public static bool CompareImplicitLengthO(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthO(left, right, mode);


What does O mean? Similarly, what does the S suffix mean below? I agree with @4creators that we typically don't use single letter abbreviations for prefixes/suffixes in names.

eerhardt · 2018-04-30T15:17:26Z

src/mscorlib/src/System/Runtime/Intrinsics/X86/Avx2.cs

@@ -544,243 +544,243 @@ public static class Avx2
        /// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
        ///   VPGATHERDD xmm, vm32x, xmm
        /// </summary>
-        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
+        public static unsafe Vector128<int> GatherVector128WithVector128Int32Indices(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128WithVector128Int32Indices(baseAddress, index, scale);


While I can imagine it being easier for JIT implementation, these names are a bit unwieldy from a user's point of view IMO.

Yes, but if we rely only on overloads to distinguish these intrinsics, the current named-intrinsic framework of RyuJIT has to be changed. Also, it would make the hardware intrinsics harder to port to other platforms (e.g., .NET Native, Mono, etc.) in the future.

JIT experts may have some thoughts? @CarolEidt @mikedn @AndyAyersMS

The JIT intrinsic plumbing needs to be able to deal with overloads. It does it today in several cases. For example, CORINFO_INTRINSIC_Abs can be either Abs(float) or Abs(double). Or there are multiple intrinsic overloads of Vector::.ctor. If the named intrinsics are meant to be the one true way to identify intrinsics in the future, it should just be fixed there.

Also, it would make the hardware intrinsics harder to port to other platforms (e.g., .NET Native, Mono, etc.) in the future.

I do not think so.

The current hardware intrinsics distinguish different overloads by "method name + base-type + SIMD size" in the IR. This mechanism works very well on most of the Intel hardware intrinsics, but it is not enough for Avx2.GatherVector*.

If the API shape is more important than the complexity of JIT implementation, I would look into changing the JIT framework or intrinsic IR to address Avx2.GatherVector*.

It doesn't seem to me that it will be all that difficult to distinguish these, and given that it is not a common case, we could either choose to simply generate different NI* intrinsic names in the importer, based on the type of the relevant argument, or we could postpone that analysis until codegen, and look at the base type of the SIMD argument to determine which instruction to generate.

And I would say that, in general, the API shape is more important than the complexity of the JIT implementation, as long as it is both "pay for play" (i.e. the complexity is only incurred for the case in question) and the complexity is not very high.

Thanks, will retain the current APIs and address in JIT.
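
The overload problem discussed in this thread can be sketched roughly as follows. This is an illustrative model only, not RyuJIT code: `GatherLookupSketch`, `IntrinsicKey`, and `SelectGatherInstruction` are hypothetical names. The one thing taken from the thread is that intrinsics are identified by method name, base type, and SIMD size. Two `GatherVector128` overloads over `int*` that differ only in the index vector's element type yield the same key, so the index type has to be examined separately, as suggested above.

```csharp
using System;

public static class GatherLookupSketch
{
    // Hypothetical stand-in for the "method name + base-type + SIMD size" identity.
    public static (string Name, Type BaseType, int SimdSize) IntrinsicKey(
        string name, Type baseType, int simdSize) => (name, baseType, simdSize);

    // Resolve the ambiguity for 32-bit integer elements by inspecting the
    // element type of the index vector argument.
    public static string SelectGatherInstruction(Type indexElementType)
    {
        if (indexElementType == typeof(int)) return "VPGATHERDD";  // 32-bit indices
        if (indexElementType == typeof(long)) return "VPGATHERQD"; // 64-bit indices
        throw new NotSupportedException("Unexpected index element type.");
    }
}
```

Both overloads map to `IntrinsicKey("GatherVector128", typeof(int), 16)`, which is why the key alone cannot pick the instruction.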

fiigii · 2018-05-02T17:39:44Z

Reworked the SSE4.2 intrinsic API names (e.g., CompareImplicitLengthC -> CompareCFlag) based on the suggestions above.

@CarolEidt @eerhardt @tannergooding

fiigii · 2018-06-11T18:22:11Z

src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Sse41.cs

@@ -283,7 +283,7 @@ public static class Sse41
        /// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8)
        ///   INSERTPS xmm, xmm/m32, imm8
        /// </summary>
-        public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index);
+        public static Vector128<float> Insert(Vector128<float> value, Vector128<float> data, byte index) => Insert(value, data, index);


Corrected the SSE4.1 Insert API on float base type.

Fix #18143

Should we have both overloads or will we specially handle the Sse.SetScalarVector128 and corresponding index masks to not zero the upper bits of data?

Also, does the corresponding Avx API need to be updated as well?

Should we have both overloads

Probably not. SSE4.1 insertps has rather special semantics: it inserts a value selected from the second vector argument into the first vector. The other SSE4.1 "insert" instructions directly insert a scalar value. https://msdn.microsoft.com/en-us/library/bb514071(v=vs.120).aspx

Also, does the corresponding Avx API need to be updated as well?

No, AVX insert instructions are "normal" :)
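
A minimal scalar sketch of the insertps behavior described above, for the register form of the instruction. Four-element arrays stand in for `Vector128<float>`, and `InsertPsSketch` is a hypothetical name used only for illustration. The imm8 layout (source select in bits 7:6, destination select in bits 5:4, zero mask in bits 3:0) follows the instruction's documented encoding.

```csharp
using System;

public static class InsertPsSketch
{
    // Scalar model of INSERTPS xmm, xmm, imm8 on four-element arrays.
    public static float[] Insert(float[] value, float[] data, byte imm8)
    {
        int sourceIndex = (imm8 >> 6) & 0x3; // bits 7:6 select the element of data
        int destIndex = (imm8 >> 4) & 0x3;   // bits 5:4 select the slot in value
        int zeroMask = imm8 & 0xF;           // bits 3:0 zero out result elements

        float[] result = (float[])value.Clone();
        result[destIndex] = data[sourceIndex];
        for (int i = 0; i < 4; i++)
        {
            if ((zeroMask & (1 << i)) != 0) result[i] = 0f;
        }
        return result;
    }
}
```

With this shape, the second argument has to be a vector because the instruction chooses which of its elements to insert, which a scalar `float` parameter cannot express.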

fiigii · 2018-06-11T18:29:36Z

Addressed the above feedback and fixed SSE4.1 Insert API (temporarily disabled its tests).
@CarolEidt @tannergooding @eerhardt Could you please take a look when you get a chance? This change is blocking some HW intrinsic work.

eerhardt · 2018-06-11T19:01:34Z

src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Sse42.cs

-        ///   PCMPISTRI xmm, xmm/m128, imm8
-        /// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8)
+        /// </summary>
+        public static bool CompareNotCAndNotZFlag(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareNotCAndNotZFlag(left, right, mode);


In case my original reply gets lost in the outdated section, I've copied it here:

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words? Or even better, if we could come up with a more descriptive name of what the method does, that would be great.

Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S and Z that I can tell is which argument to check for a null character:

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise.

vs.

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.

It seems the name should indicate/describe the differences, instead of using SFlag or ZFlag, and then forcing the caller to look up what SFlag means.

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words?

@eerhardt Right, and I am OK to spell them out if it matches the C# naming convention better.

Or even better, if we could come up with a more descriptive name of what the method does, that would be great.

I do not think so. "A more descriptive name" may be too long and still unclear. For example, NotCAndNotZFlag looks even clearer than "absolute value of lb is larger than or equal to MaxSize and the resulting mask is equal to zero". Also, we already assume that users of HW intrinsics are familiar with the C++ counterparts or the ISA design. So I still suggest naming these intrinsics after the flags.

I still think we can come up with better names. It appears there are 5 cases we need to name:

Flag	Description	Possible Names
S & Z	Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise. & Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.	CompareLeftContainsNull, CompareReturnLeftContainsNull, CompareReturnPartialLeft, CompareGetIsPartialLeft, CompareIsLeftPartial (Same for Right)
O	Compare packed strings with implicit lengths in a and b using the control in imm8, and returns bit 0 of the resulting bit mask.	CompareReturnFirstBit, CompareGetFirstBit
C	Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if the resulting mask was non-zero, and 0 otherwise.	CompareIsResultNonZero, CompareHasNonZeroResult, CompareNonZeroMask, CompareHasResult
A	Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if b did not contain a null character and the resulting mask was zero, and 0 otherwise.

A is definitely hard to describe succinctly, since it is the combination of two other ones. So coming up with a good name for it will probably be based on the two underlying names.
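
The five implicit-length cases can be sketched as plain predicates. This is only a model of the descriptions quoted above, not the proposed API: `ImplicitLengthFlagsSketch` and its members are hypothetical, byte arrays stand in for the vector operands, and the final comparison mask (`intRes2`) is taken as an input rather than computed.

```csharp
using System;

public static class ImplicitLengthFlagsSketch
{
    private static bool ContainsNull(byte[] operand) => Array.IndexOf(operand, (byte)0) >= 0;

    // S: any character in a (left) was null.
    public static bool S(byte[] left) => ContainsNull(left);

    // Z: any character in b (right) was null.
    public static bool Z(byte[] right) => ContainsNull(right);

    // O: bit 0 of the resulting mask.
    public static bool O(int intRes2) => (intRes2 & 1) != 0;

    // C: the resulting mask was non-zero.
    public static bool C(int intRes2) => intRes2 != 0;

    // A: b did not contain a null character and the resulting mask was zero.
    public static bool A(int intRes2, byte[] right) => !C(intRes2) && !Z(right);
}
```

Written this way, A is visibly the conjunction of the negated C and Z cases, which is why its name tends to be built from theirs.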

Note: The IntrinsicsGuide's description for the explicit length S & Z overloads (_mm_cmpestrs & _mm_cmpestrz) appears to be incorrect.
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
But then the Operation says:

size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
dst := (lb <= UpperBound)

Where the Operation appears to match what is in the underlying instruction manuals.

Given the implicit vs. explicit differences, that's why I like something like Is Partial for naming S & Z: it describes at a higher level what is being checked, e.g., File.Exists instead of File.StatReturnsZero.
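
For the explicit-length overloads, the quoted Operation can be sketched as below. `ExplicitLengthSketch.IsPartial` is a hypothetical helper, and using the absolute value of the length is an assumption based on the "absolute value of lb" wording earlier in this thread.

```csharp
using System;

public static class ExplicitLengthSketch
{
    // Mirrors the quoted Operation: dst := (lb <= UpperBound).
    public static bool IsPartial(int length, bool wordElements)
    {
        int size = wordElements ? 16 : 8;        // imm8[0]: 16-bit or 8-bit characters
        int upperBound = (128 / size) - 1;       // 7 or 15
        long magnitude = Math.Abs((long)length); // assumption: the check uses |length|
        return magnitude <= upperBound;
    }
}
```

No null check appears anywhere in this version, which is the reason a name like ContainsNull fits the implicit-length overloads but not the explicit-length ones.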

@eerhardt Thank you so much for the investigation. I read the manual again and I agree that "a more descriptive name" is very necessary because FLAGS is used in a non-standard manner with these SSE4.2 string instructions.
According to the above table you provided, there is a potential name matrix:

original names	new names
CompareCFlag	CompareNonZeroResultMask
CompareZFlag	CompareRightContainsNull
CompareSFlag	CompareLeftContainsNull
CompareOFlag	CompareReturnFirstResultBit
CompareNotCAndNotZFlag	CompareZeroResultMaskAndRightNotContainsNull

Does it look good to you?

cc @CarolEidt @tannergooding

The only issue I see is the difference between the implicit and explicit length overloads. If we called the explicit length overloads ContainsNull, that would be incorrect. Instead, it returns whether the rightLength is less than the maximum (8 or 16). So we should use a different name for the explicit length overloads.

I'd like to keep them using the same name, if a decent name comes up. But if one isn't available, I'm fine with the implicit and explicit length overloads being named different (RightContainsNull vs. RightLessThanMax/RightIsPartial/etc.)

HasResult/ContainsResult?

I think we could make it more obvious by calling the Polarity flags something like NegateResult instead.

Reading if (ContainsMatch(op1, op2, StringComparisonMode.NegateResult)) seems much better (both readable and understandable) than if (CompareZeroResultMask(op1, op2, StringComparisonMode.NegativePolarity))

We could do EndOfString, since that applies to both Explicit (where end of string is length determined) and Implicit (where end of string is determined by the null character)?

I think this is a good idea and I also prefer to use the same name for Explicit and Implicit.
How about Left/RightTerminated?

I think we could make it more obvious by calling the Polarity flags something like NegateResult instead.

Good point! We can rename both polarity flags:

        /// <summary>
        /// _SIDD_NEGATIVE_POLARITY
        /// </summary>
        NegativeResult = 0x10,
        /// <summary>
        /// _SIDD_MASKED_NEGATIVE_POLARITY
        /// </summary>
        NegativeUsefulResult = 0x30,

With the above renamed StringComparisonMode, we seem to have the new name matrix:

original names	new names
CompareCFlag	CompareHasMatch
CompareZFlag	CompareRightTerminated
CompareSFlag	CompareLeftTerminated
CompareOFlag	CompareReturnFirstResultBit
CompareNotCAndNotZFlag	CompareNoMatchAndRightNotTerminated

tannergooding · 2018-06-11T19:37:11Z

It might be good to split this into two PRs. One fixing the argument types and the other fixing the SSE4.2 API names.

Did we also already have the discussion on whether it was better for mask/control parameters to be the same as the base type or if it was better to be the same-sized integer type? CC. @eerhardt for input on the API side of things.

fiigii · 2018-06-11T20:24:44Z

It might be good to split this into two PRs.

I am not sure. I put them together just for simplifying the CoreFX/CoreCLR synchronization.

tannergooding · 2018-06-11T20:28:58Z

I put them together just for simplifying the CoreFX/CoreCLR synchronization.

Right, but it sounds like the Sse4.2 changes are a bit more complex, and should probably have a bit more "in-depth" review.

I'm fine with keeping them together, if you prefer, but it might be easier to do two separate changes.

fiigii · 2018-06-11T20:38:14Z

but it sounds like the Sse4.2 changes are a bit more complex, and should probably have a bit more "in-depth" review.

@tannergooding Thanks for clarifying. I agree that the Sse4.2 changes are a bit more complex and may impact other API naming. For example, if we decide not to use "flags" (e.g., C or Carry, Z or Zero, etc.) as @eerhardt suggested, TestC and TestZ probably also need to be changed to match this convention.

I would split these changes if Sse41 blocks other work.

eerhardt · 2018-06-12T16:04:32Z

src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Sse42.PlatformNotSupported.cs


        /// <summary>
        /// __m128i _mm_cmpistrm (__m128i a, __m128i b, const int imm8)
        ///   PCMPISTRM xmm, xmm/m128, imm8
        /// </summary>
-        public static Vector128<ushort> CompareImplicitLengthBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); }
+        public static Vector128<ushort> CompareBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); }


For my information - Why did we break out BitMask vs UnitMask into different overloads? Why not keep it in the StringComparisonMode enum?

Ah, good catch! This is a legacy design from the previous versions, will fix. Thanks!

CarolEidt

For my information - Why did we break out BitMask vs UnitMask into different overloads? Why not keep it in the StringComparisonMode enum?

To me, it seems wrong to have it in the StringComparisonMode enum, since the values overlap.

CarolEidt · 2018-06-12T22:48:14Z

src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/Enums.cs

+        /// <summary>
+        /// _SIDD_UNIT_MASK
+        /// </summary>
+        UnitMask = 0x40,


How are UnitMask and MostSignificant (or LeastSignificant and BitMask) distinguished? I don't think we should have multiple names for the same value in a single enum - are these used in different contexts? If so, I would think they'd be different enums.

are these used in different contexts?

Yes.
UnitMask and BitMask are only used for CompareMask (PCMPESTRM).
LeastSignificant and MostSignificant are only used for CompareIndex (PCMPESTRI).

If so, I would think they'd be different enums.

So, CompareMask and CompareIndex would have an additional imm parameter, which would complicate the JIT implementation (especially non-const fallback).
How about encoding these two flags into the intrinsic function name?

How about encoding these two flags into the intrinsic function name?

I don't think that's necessary - I just don't see why the enums should be the same.

I think it depends on whether we want to map closely with the underlying hardware instruction, which I thought was a desire.

Section 4.1.5 Output Selection in https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf

Since it will be a Flags enum (after this PR), I don't think this is a huge issue. It follows the underlying HW instruction design, and is understandable - use Least|Most Significant with the CompareIndex method, use Bit|Unit Mask with the CompareMask method.

Rethinking the "open a new issue" comment, I guess the current code has the Unit/BitMask methods split, so this PR is already changing this API.

I dislike exploding the methods out because:

It increases the number of methods, and method names in intellisense/docs.

It doesn't allow for a "default". I either need to pick - Bit or Unit mask - up front. Using an enum allows me to use the API without making the choice. And then if that choice doesn't work for me, I can change the enum value.

I do like the advantage that 3 enums provides in that Bit/UnitMask and Least/MostSignificant values don't affect the methods that return bool, so those methods don't even get those options. And also that CompareIndex doesn't get Bit/UnitMask option and vice versa. So if we were going to change the current PR, that would be my vote.

Using an enum allows me to use the API without making the choice.

I'm not sure how valid this is. The choice can make a fairly big difference on how you write your algorithm, so you will generally need to make the choice and be aware of how it impacts your code, right out.

(You will also have to specify one of the two enum values either way)

I'm not sure how valid this is.

It's completely valid when I'm exploring the API to learn how it works.

(You will also have to specify one of the two enum values either way)

Only in the 3 enums design. As currently proposed I wouldn't need to specify Bit/UnitMask when calling CompareMask.

As currently proposed I wouldn't need to specify Bit/UnitMask when calling CompareMask.

I'm not convinced that's a good thing 😄

That is, these APIs and instructions are already fairly hard to understand. Making parts of it more obvious, even if it results in us exposing double the methods might be a good thing (It looks like we have ~50 methods exposed right now).
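
The two enum shapes debated in this thread can be compared in a small sketch. All type and member names below are hypothetical stand-ins rather than the shipped API. `UnitMask = 0x40` comes from the diff above, and the other values follow the corresponding `_SIDD_*` constants.

```csharp
using System;

// Single-enum shape: two pairs of names share the same values.
[Flags]
public enum CombinedModeSketch : byte
{
    LeastSignificant = 0x00,
    BitMask = 0x00,
    MostSignificant = 0x40,
    UnitMask = 0x40,
}

// Split shape: each Compare method would accept only the option it understands.
public enum IndexSelectionSketch : byte { LeastSignificant = 0x00, MostSignificant = 0x40 }
public enum MaskSelectionSketch : byte { BitMask = 0x00, UnitMask = 0x40 }

public static class Imm8Sketch
{
    // Both shapes end up OR-ing the selection into the same imm8 bit.
    public static byte ForIndex(byte comparisonBits, IndexSelectionSketch selection)
        => (byte)(comparisonBits | (byte)selection);

    public static byte ForMask(byte comparisonBits, MaskSelectionSketch selection)
        => (byte)(comparisonBits | (byte)selection);
}
```

In the combined shape the compiler cannot stop a caller from passing `UnitMask` to an index-returning method, because it is the same value as `MostSignificant`. The split shape turns that mistake into a type error at the cost of extra enum types.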

CarolEidt · 2018-06-13T19:38:30Z

At this point, I would propose changing this PR so that it doesn't change the enum names or usage (i.e., only do changes 1 and 3).
There should be an API discussion in corefx before settling on the names and usage of the various result and comparison modes.

fiigii · 2018-06-13T20:25:23Z

At this point, I would propose changing this PR so that it doesn't change the enum names or usage (i.e., only do changes 1 and 3).

Ok, let me separate SSE4.2 changes into a new PR.

fiigii · 2018-06-13T20:36:44Z

src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Sse41.cs

@@ -182,12 +182,12 @@ public static class Sse41
        /// int _mm_extract_epi8 (__m128i a, const int imm8)
        ///   PEXTRB reg/m8, xmm, imm8
        /// </summary>
-        public static sbyte Extract(Vector128<sbyte> value, byte index) => Extract(value, index);
+        public static int Extract(Vector128<sbyte> value, byte index) => Extract(value, index);


Also fixes SSE2/SSE4.1/AVX Extract return type to reflect the underlying instruction behavior https://github.com/dotnet/coreclr/issues/17957

cc @CarolEidt @tannergooding
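
A one-line sketch of why an `sbyte` return type does not reflect the instruction: PEXTRB writes the selected byte into a 32-bit register with the upper bits zeroed, so a lane holding -1 comes back as 255. `ExtractSketch.ExtractByteLane` is a hypothetical helper that models only this zero-extension.

```csharp
public static class ExtractSketch
{
    // Models PEXTRB writing a zero-extended byte into a 32-bit register.
    public static int ExtractByteLane(sbyte lane) => (byte)lane;
}
```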

CarolEidt

LGTM - thanks @fiigii for your patience and persistence!

tannergooding

LGTM. Couple minor questions.

fiigii · 2018-06-14T19:02:57Z

src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs

-                Store(buffer, value);
-                return buffer[index];   
-            }
+            return Unsafe.Add<byte>(ref Unsafe.As<Vector256<byte>, byte>(ref value), index & 0x1F);


Removed the sbyte and short overloads of SSE2/SSE4.1/AVX Extract and simplified the Avx.Extract non-const fallback, per @jkotas's suggestion.

I would prepare the CoreFX counterpart if this PR looks good to you guys.
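
The simplified fallback can be exercised in isolation with a sketch like the following. The `Extract` body has the same shape as the line in the diff above. `AvxExtractFallbackSketch` and its `Load` helper are hypothetical additions for illustration, and `Load` assumes the array holds at least 32 bytes.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class AvxExtractFallbackSketch
{
    // Reinterpret the vector as bytes and index into it; masking with 0x1F
    // keeps the index within the 32 lanes of a Vector256<byte>.
    public static byte Extract(Vector256<byte> value, byte index)
        => Unsafe.Add(ref Unsafe.As<Vector256<byte>, byte>(ref value), index & 0x1F);

    // Hypothetical helper: loads the first 32 bytes of an array into a vector.
    public static Vector256<byte> Load(byte[] bytes)
        => Unsafe.ReadUnaligned<Vector256<byte>>(ref bytes[0]);
}
```

Because of the mask, an out-of-range index such as 37 wraps around to lane 5 instead of reading past the vector.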

fiigii · 2018-06-18T17:38:27Z

@eerhardt Can we merge this PR?

eerhardt · 2018-06-18T20:16:41Z

Oops - I thought this was already merged. Merging now.


dotnet-bot added the 2 - In Progress label Apr 18, 2018

RussKeldorph added the area-CodeGen label Apr 19, 2018

eerhardt reviewed Apr 30, 2018

fiigii force-pushed the improveapis branch 2 times, most recently from 02608d9 to 92e6ade, May 2, 2018 08:25

fiigii force-pushed the improveapis branch from 92e6ade to 9b73d33, June 11, 2018 18:21

fiigii commented Jun 11, 2018

eerhardt reviewed Jun 11, 2018

fiigii force-pushed the improveapis branch from 9b73d33 to bc509d5, June 11, 2018 20:23

eerhardt reviewed Jun 12, 2018

fiigii force-pushed the improveapis branch from bc509d5 to 762fcfd, June 12, 2018 16:44

fiigii mentioned this pull request Jun 12, 2018

Move x86 HW intrinsics files to shared #18427

Merged

fiigii force-pushed the improveapis branch from 762fcfd to 9b3769c, June 12, 2018 22:41

CarolEidt reviewed Jun 12, 2018

fiigii force-pushed the improveapis branch from 9b3769c to 9f41e05, June 13, 2018 20:32

fiigii commented Jun 13, 2018

CarolEidt approved these changes Jun 13, 2018

tannergooding approved these changes Jun 14, 2018

Improve Intel hardware intrinsic APIs

0157aab

fiigii force-pushed the improveapis branch 2 times, most recently from e7e8f0c to 0157aab, June 14, 2018 18:26

Simplify Avx.Extract non-const fallback

5d8c06a

fiigii commented Jun 14, 2018

fiigii mentioned this pull request Jun 15, 2018

Improve Intel hardware intrinsic APIs dotnet/corefx#30410

Closed

eerhardt merged commit ea58e86 into dotnet:master Jun 18, 2018

jkotas pushed a commit to dotnet/corert that referenced this pull request Jun 18, 2018

Improve Intel hardware intrinsic APIs (dotnet/coreclr#17637)

1329682

* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

fiigii mentioned this pull request Jun 19, 2018

Fix struct promotion check for SIMD field #18548

Merged

fiigii deleted the improveapis branch June 19, 2018 20:24

CarolEidt mentioned this pull request Jun 20, 2018

An UnusedValue still requires a target reg #18561

Merged

jashook pushed a commit to jashook/coreclr that referenced this pull request Jun 20, 2018

Remove MaskLoad tests from arm and arm64 lst files

d43e96f

This will disable the test from being run with smart in our windows arm testing. This corresponds to the tests being deleted in dotnet#17637.

jashook mentioned this pull request Jun 20, 2018

Remove MaskLoad tests from arm and arm64 lst files #18569

Merged

jashook pushed a commit that referenced this pull request Jun 20, 2018

Remove MaskLoad tests from arm and arm64 lst files (#18569)

b55d2b8

This will disable the test from being run with smart in our windows arm testing. This corresponds to the tests being deleted in #17637.

This was referenced Jun 30, 2018

Some test fixes for the x86 HWIntrinsics #18734

Merged

Fixing up the Sse41.Insert float HWIntrinsics #18735

Merged

fiigii mentioned this pull request Sep 20, 2018

[No Merge]Adding SSE4.2 STTNI intrinsic APIs #19958

Closed

fiigii mentioned this pull request Nov 26, 2018

Adding missing overloads of the Sse2 and Sse41 Extract methods #21197

Closed

lewurm mentioned this pull request Feb 1, 2019

[2018-08] Bump corert mono/mono#12721

Merged

CarolEidt mentioned this pull request Dec 16, 2019

Improve APIs for Intel string handling intrinsics dotnet/runtime#957

Open

fiigii mentioned this pull request Jan 31, 2020

Post-2.1 plan of Intel hardware intrinsic dotnet/runtime#10260

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Improve Intel hardware intrinsic APIs (dotnet/coreclr#17637)

d163df6

* Improve Intel hardware intrinsic APIs * Simplify Avx.Extract non-const fallback Commit migrated from dotnet/coreclr@ea58e86

