Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve Intel hardware intrinsic APIs #17637

Merged
merged 2 commits into from
Jun 18, 2018
Merged

Conversation

fiigii
Copy link

@fiigii fiigii commented Apr 18, 2018

This PR

  1. fixes https://github.com/dotnet/coreclr/issues/17058 and temporarily disables AVX MaskLoad test cases.
  2. encodes the result flags in the names of certain SSE4.2 string processing intrinsic. That provides more stable runtime behaviors and simplifies JIT implementation, discussed in https://github.com/dotnet/coreclr/issues/16270
  3. Fixes SSE4.1 Insert API on float base type, closes https://github.com/dotnet/coreclr/issues/18143.
  4. Fixes SSE2/SSE4.1/AVX Extract return type to reflect the underlying instruction behavior, closes https://github.com/dotnet/coreclr/issues/17957

@CarolEidt @tannergooding @eerhardt

@4creators
Copy link

4creators commented Apr 19, 2018

The new function names are horrible and in majority of cases do not follow .NET API coding guidelines by using acronyms.

@fiigii
Copy link
Author

fiigii commented Apr 27, 2018

@CarolEidt @tannergooding @eerhardt @AndyAyersMS @mikedn @jkotas
Could you please take a look at this PR and give suggestions? The following intrinsic work depends on the API change.

/// PCMPISTRI xmm, xmm/m128, imm8
/// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8)
/// </summary>
public static bool CompareImplicitLengthNotCAndNotZ(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the words ImplicitLength and ExplicitLength in the method names? Isn't the fact that I used the overload without specifying the lengths enough to disambiguate that I wanted to use implicit lengths?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Will try to eliminate these two words.

/// PCMPISTRI xmm, xmm/m128, imm8
/// int _mm_cmpistrz (__m128i a, __m128i b, const int imm8)
/// </summary>
public static bool CompareImplicitLengthNotCAndNotZ(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthNotCAndNotZ(left, right, mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does C and Z mean?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does C and Z mean?
What does O mean? Similiarly, what does the S suffix mean below?

They all represent the corresponding flags in the FLAGS register of x86/x64 CPUs.

don't use single letter abbreviations for prefixes/suffixes in names.

I agree, but now we have some similar APIs that "use single letter abbreviations", such as Sse41.TestC, Avx.TestZ, etc.
Do you have any suggestion to improve the names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words? Or even better, if we could come up a more descriptive name of what the method does, that would be great.

Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S and Z that I can tell is which argument to check for a null character:

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise. 

vs.

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise. 

It seems the name should indicate/describe the differences, instead of using SFlag or ZFlag, and then forcing the caller to look up what SFlag means.

/// int _mm_cmpistro (__m128i a, __m128i b, const int imm8)
/// PCMPISTRI xmm, xmm/m128, imm8
/// </summary>
public static bool CompareImplicitLengthO(Vector128<byte> left, Vector128<byte> right, StringComparisonMode mode) => CompareImplicitLengthO(left, right, mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does O mean? Similiarly, what does the S suffix mean below? I agree with @4creators that we typically don't use single letter abbreviations for prefixes/suffixes in names.

@@ -544,243 +544,243 @@ public static class Avx2
/// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
/// VPGATHERDD xmm, vm32x, xmm
/// </summary>
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
public static unsafe Vector128<int> GatherVector128WithVector128Int32Indices(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128WithVector128Int32Indices(baseAddress, index, scale);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can imagine it being easier for JIT implementation, these names are a bit unwieldy from a user's point of view IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but if we only rely on overload to distinguish these intrinsics, the current named-intrinsic framework of RyuJIT has to be changed. Meanwhile, in the future, other platforms (e.g., .NET Native, Mono, etc) are also hard to port hardware intrinsic.

JIT experts may have some thoughts? @CarolEidt @mikedn @AndyAyersMS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT intrinsic plumbing needs to be able to deal with overloads. It does it today in several cases. For example, CORINFO_INTRINSIC_Abs can be either Abs(float) or Abs(double). Or there are multiple intrinsic overloads of Vector::.ctor. If the named intrinsics are meant to be the one true way to identify intrinsics in future, it should be just fixed there.

Meanwhile, in the future, other platforms (e.g., .NET Native, Mono, etc) are also hard to port hardware intrinsic.

I do not think so.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current hardware intrinsics distinguish different overloads by "method name + base-type + SIMD size" in the IR. This mechanism works very well on most of the Intel hardware intrinsics, but it is not enough for Avx2.GatherVector*.

If the API shape is more important than the complexity of JIT implementation, I would look into changing the JIT framework or intrinsic IR to address Avx2.GatherVector*.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to me that it will be all that difficult to distinguish these, and given that it is not a common case, we could either choose to simply generate different NI* intrinsic names in the importer, based on the type of the relevant argument, or we could postpone that analysis until codegen, and look at the base type of the SIMD argument to determine which instruction to generate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I would say that, in general, the API shape is more important than the complexity of the JIT implementation, as long as it is both "pay for play" (i.e. the complexity is only incurred for the case in question) and the complexity is not very high.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will retain the current APIs and address in JIT.

@fiigii fiigii force-pushed the improveapis branch 2 times, most recently from 02608d9 to 92e6ade Compare May 2, 2018 08:25
@fiigii
Copy link
Author

fiigii commented May 2, 2018

Re-improved SSE4.1 intrinsic APIs (e.g., CompareImplicitLengthC -> CompareCFlag) based on above suggestions.

@CarolEidt @eerhardt @tannergooding

@@ -283,7 +283,7 @@ public static class Sse41
/// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8)
/// INSERTPS xmm, xmm/m32, imm8
/// </summary>
public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index);
public static Vector128<float> Insert(Vector128<float> value, Vector128<float> data, byte index) => Insert(value, data, index);
Copy link
Author

@fiigii fiigii Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected the SSE4.1 Insert API on float base type.

Fix #18143

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have both overloads or will we specially handle the Sse.SetScalarVector128 and corresponding index masks to not zero the upper bits of data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does the corresponding Avx API need to be updated as well?

Copy link
Author

@fiigii fiigii Jun 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have both overloads

Probably not, SSE4.1 insertps has really special semantics that inserts a value selected from the second vector argument into the first vector. But other SSE4.1 "insert" instructions direct insert a scalar value. https://msdn.microsoft.com/en-us/library/bb514071(v=vs.120).aspx

Also, does the corresponding Avx API need to be updated as well?

No, AVX insert instructions are "normal" :)

@fiigii
Copy link
Author

fiigii commented Jun 11, 2018

Addressed the above feedback and fixed SSE4.1 Insert API (temporarily disabled its tests).
@CarolEidt @tannergooding @eerhardt Could you please take a look when you get a chance? This change is blocking some HW intrinsic work.

/// PCMPISTRI xmm, xmm/m128, imm8
/// int _mm_cmpistrs (__m128i a, __m128i b, const int imm8)
/// </summary>
public static bool CompareNotCAndNotZFlag(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) => CompareNotCAndNotZFlag(left, right, mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case my original reply gets lost in the outdated section, I've copied it here:

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words? Or even better, if we could come up a more descriptive name of what the method does, that would be great.

Reading https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=_mm_cmpestr&expand=814,813,814,813, the only difference between S and Z that I can tell is which argument to check for a null character:

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise. 

vs.

Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise. 

It seems the name should indicate/describe the differences, instead of using SFlag or ZFlag, and then forcing the caller to look up what SFlag means.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume C == Carry, Z == Zero, O == Overflow, and S == Sign? Can we spell out the words?

@eerhardt Right, and I am OK to spell them out if it matches the C# naming convention better.

Or even better, if we could come up a more descriptive name of what the method does, that would be great.

I do not think so. "A more descriptive name" may be too long and not clear. For example, NotCAndNotZFlag looks even clearer than "absolute value of lb is larger than or equal to MaxSize and the resulting mask is equal to zero". Meanwhile, we already assume that the users of HW intrinsics have been familar/mastering with the C++ counterparts or ISA design. So, I still suggest to name these intrinsics with flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we can come up with better names. It appears there are 5 cases we need to name:

Flag Description Possible Names
S
&
Z
Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in a was null, and 0 otherwise.
&
Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
CompareLeftContainsNull, CompareReturnLeftContainsNull, CompareReturnPartialLeft, CompareGetIsPartialLeft, CompareIsLeftPartial
(Same for Right)
O Compare packed strings with implicit lengths in a and b using the control in imm8, and returns bit 0 of the resulting bit mask. CompareReturnFirstBit, CompareGetFirstBit
C Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if the resulting mask was non-zero, and 0 otherwise. CompareIsResultNonZero, CompareHasNonZeroResult, CompareNonZeroMask, CompareHasResult
A Compare packed strings with implicit lengths in a and b using the control in imm8, and returns 1 if b did not contain a null character and the resulting mask was zero, and 0 otherwise.

A is definitely hard to describe succinctly, since it is the combination of two other ones. So coming up with a good name for it will probably be based on the two underlying names.


Note: The IntrinsicsGuide's description for the explicit length S & Z overloads (_mm_cmpestrs & _mm_cmpestrz) appears to be incorrect.
Compare packed strings in a and b with lengths la and lb using the control in imm8, and returns 1 if any character in b was null, and 0 otherwise.
But then the Operation says:

size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1

dst := (lb <= UpperBound)

Where the Operation appears to match what is in the underlying instruction manuals.


In light of the implicit vs explicit differences, to name S & Z, that's why I like something like Is Partial because it describes at a higher-level what is being checked. ex. File.Exists instead of File.StatReturnsZero.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt Thank you so much for the investigation. I read the manual again and I agree that "a more descriptive name" is very necessary because FLAGS is used in a non-standard manner with these SSE4.2 string instructions.
According to the above table you provided, there is a potential name matrix:

original names new names
CompareCFlag CompareNonZeroResultMask
CompareZFlag CompareRightContainsNull
CompareSFlag CompareLeftContainsNull
CompareOFlag CompareReturnFirstResultBit
CompareNotCAndNotZFlag CompareZeroResultMaskAndRightNotContainsNull

Does it look good to you?

cc @CarolEidt @tannergooding

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue I see if the difference between implicit and explicit length overloads. If we called the explicit length overloads ContainsNull that would be incorrect. It instead returns if the rightLength is less than the maximum (8 or 16). So we should use a different name for the explicit length overloads.

I'd like to keep them using the same name, if a decent name comes up. But if one isn't available, I'm fine with the implicit and explicit length overloads being named different (RightContainsNull vs. RightLessThanMax/RightIsPartial/etc.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasResult/ContainsResult?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could make it more obvious by calling the Polarity flags something like NegateResult instead.

Reading if (ContainsMatch(op1, op2, StringComparisonMode.NegateResult)) seems much better (both readable and understandable) than if (CompareZeroResultMask(op1, op2, StringComparisonMode.NegatiePolarity))

Copy link
Author

@fiigii fiigii Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do EndOfString, since that applies to both Explicit (where end of string is length determined) and Implicit (where end of string is determined by the null character)?

I think this is a good idea and I also perfer to use the same name for Explicit and Implicit.
How about Left/RightTerminated?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could make it more obvious by calling the Polarity flags something like NegateResult instead.

Good point! We can rename these two polarity flags both:

        /// <summary>
        /// _SIDD_NEGATIVE_POLARITY
        /// </summary>
        NegativeResult = 0x10,

        /// <summary>
        /// _SIDD_MASKED_NEGATIVE_POLARITY
        /// </summary>
        NegativeUsefulResult = 0x30,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above renamed StringComparisonMode, we seems to have the new name matrix:

original names new names
CompareCFlag CompareHasMatch
CompareZFlag CompareRightTerminated
CompareSFlag CompareLeftTerminated
CompareOFlag CompareReturnFirstResultBit
CompareNotCAndNotZFlag CompareNoMatchAndRightNotTerminated

@tannergooding
Copy link
Member

It might be good to split this into two PRs. One fixing the argument types and the other fixing the SSE4.2 API names.

Did we also already have the discussion on whether it was better for mask/control parameters to be the same as the base type or if it was better to be the same-sized integer type? CC. @eerhardt for input on the API side of things.

@fiigii
Copy link
Author

fiigii commented Jun 11, 2018

It might be good to split this into two PRs.

I am not sure. I put them together just for simplifying the CoreFX/CoreCLR synchronization.

@tannergooding
Copy link
Member

I put them together just for simplifying the CoreFX/CoreCLR synchronization.

Right, but it sounds like the Sse4.2 changes are a bit more complex, and should probably have a bit more "in-depth" review.

I'm fine with keeping them together, if you prefer, but it might be easier to do two separate changes.

@fiigii
Copy link
Author

fiigii commented Jun 11, 2018

but it sounds like the Sse4.2 changes are a bit more complex, and should probably have a bit more "in-depth" review.

@tannergooding Thanks for clarifying. I agree that Sse4.2 changes are a bit more complex and it may impact other API naming. For example, if we decide not using "flags" (e.g., C or Carry, Z or Zero, etc.) as @eerhardt suggested, TestC and TestZ probably also need to be changed to match this convention.

I would split these changes if Sse41 blocks other work.


/// <summary>
/// __m128i _mm_cmpistrm (__m128i a, __m128i b, const int imm8)
/// PCMPISTRM xmm, xmm/m128, imm8
/// </summary>
public static Vector128<ushort> CompareImplicitLengthBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); }
public static Vector128<ushort> CompareBitMask(Vector128<sbyte> left, Vector128<sbyte> right, StringComparisonMode mode) { throw new PlatformNotSupportedException(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my information - Why did we break out BitMask vs UnitMask into different overloads? Why not keep it in the StringComparisonMode enum?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch! This is a legacy design from the previous versions, will fix. Thanks!

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my information - Why did we break out BitMask vs UnitMask into different overloads? Why not keep it in the StringComparisonMode enum?

To me, it seems wrong to have it in the StringComparisonMode enum, since the values overlap.

/// <summary>
/// _SIDD_UNIT_MASK
/// </summary>
UnitMask = 0x40,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are UnitMask and MostSignificant (or LeastSignificant and BitMask) distinguished? I don't think we should have multiple names for the same value in a single enum - are these used in different contexts? If so, I would think they'd be different enums.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these used in different contexts?

Yes.
UnitMask and BitMask is only used for CompareMask (PCMPESTRM).
LeastSignificant and MostSignificant is only used for CompareIndex (PCMPESTRI).

Copy link
Author

@fiigii fiigii Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, I would think they'd be different enums.

So, CompareMask and CompareIndex would have an additional imm parameter, which would complicate the JIT implementation (especially non-const fallback).
How about encoding these two flags into the intrinsic function name?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about encoding these two flags into the intrinsic function name?

I don't think that's necessary - I just don't see why the enums should be the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on whether we want to map closely with the underlying hardware instruction, which I thought was a desire.

Section 4.1.5 Output Selection in https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf

image

Since it will be a Flags enum (after this PR), I don't think this is a huge issue. It follows the underlying HW instruction design, and is understandable - use Least|Most Significant with the CompareIndex method, use Bit|Unit Mask with the CompareMask method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rethinking the open a new issue comment, I guess the current code has the Unit/BitMask methods split, so this PR is already changing this API.

I dislike exploding the methods out because:

  1. It increases the number of methods, and method names in intellisense/docs.
  2. It doesn't allow for a "default". I either need to pick - Bit or Unit mask - up front. Using an enum allows me to use the API without making the choice. And then if that choice doesn't work for me, I can change the enum value.

I do like the advantage that 3 enums provides in that Bit/UnitMask and Least/MostSignificant values don't affect the methods that return bool, so those methods don't even get those options. And also that CompareIndex doesn't get Bit/UnitMask option and vice versa. So if we were going to change the current PR, that would be my vote.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an enum allows me to use the API without making the choice.

I'm not sure how valid this is. The choice can make a fairly big difference on how you write your algorithm, so you will generally need to make the choice and be aware of how it impacts your code, right out.

(You will also have to specify one of the two enum values either way)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how valid this is.

It's completely valid when I'm exploring the API to learn how it works.

(You will also have to specify one of the two enum values either way)

Only in the 3 enums design. As currently proposed I wouldn't need to specify Bit/UnitMask when calling CompareMask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As currently proposed I wouldn't need to specify Bit/UnitMask when calling CompareMask.

I'm not convinced that's a good thing 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, these APIs and instructions are already fairly hard to understand. Making parts of it more obvious, even if it results in us exposing double the methods might be a good thing (It looks like we have ~50 methods exposed right now).

@CarolEidt
Copy link

At this point, I would propose changing this PR so that it doesn't change the enum names or usage (i.e. only do changes 1 and 3.
There should be an API discussion in corefx before settling in the names and usage of the various result and comparison modes.

@fiigii
Copy link
Author

fiigii commented Jun 13, 2018

At this point, I would propose changing this PR so that it doesn't change the enum names or usage (i.e. only do changes 1 and 3.

Ok, let me separate SSE4.2 changes into a new PR.

@@ -182,12 +182,12 @@ public static class Sse41
/// int _mm_extract_epi8 (__m128i a, const int imm8)
/// PEXTRB reg/m8, xmm, imm8
/// </summary>
public static sbyte Extract(Vector128<sbyte> value, byte index) => Extract(value, index);
public static int Extract(Vector128<sbyte> value, byte index) => Extract(value, index);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fixes SSE2/SSE4.1/AVX Extract return type to reflect the underlying instruction behavior https://github.com/dotnet/coreclr/issues/17957

cc @CarolEidt @tannergooding

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks @fiigii for your patience and persistence!

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple minor questions.

@fiigii fiigii force-pushed the improveapis branch 2 times, most recently from e7e8f0c to 0157aab Compare June 14, 2018 18:26
Store(buffer, value);
return buffer[index];
}
return Unsafe.Add<byte>(ref Unsafe.As<Vector256<byte>, byte>(ref value), index & 0x1F);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed sbyte and short overloads of SSE2/SSE4.1/AVX Extract and simplify Avx.Extract non-const fallback as @jkotas's suggestion.

I would prepare the CoreFX counterpart if this PR looks good to you guys.

@fiigii
Copy link
Author

fiigii commented Jun 18, 2018

@eerhardt Can we merge this PR?

@eerhardt
Copy link
Member

Oops - I thought this was already merged. Merging now.

@eerhardt eerhardt merged commit ea58e86 into dotnet:master Jun 18, 2018
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jun 18, 2018
* Improve Intel hardware intrinsic APIs

* Simplify Avx.Extract non-const fallback

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Jun 18, 2018
* Improve Intel hardware intrinsic APIs

* Simplify Avx.Extract non-const fallback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corefx that referenced this pull request Jun 18, 2018
* Improve Intel hardware intrinsic APIs

* Simplify Avx.Extract non-const fallback

Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corert that referenced this pull request Jun 18, 2018
* Improve Intel hardware intrinsic APIs

* Simplify Avx.Extract non-const fallback

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
@fiigii fiigii deleted the improveapis branch June 19, 2018 20:24
jashook pushed a commit to jashook/coreclr that referenced this pull request Jun 20, 2018
This will disable the test from being run with smart in our windows arm
testing. This corresponds to the tests being deleted in dotnet#17637.
jashook pushed a commit that referenced this pull request Jun 20, 2018
This will disable the test from being run with smart in our windows arm
testing. This corresponds to the tests being deleted in #17637.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Improve Intel hardware intrinsic APIs

* Simplify Avx.Extract non-const fallback


Commit migrated from dotnet/coreclr@ea58e86
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
)

This will disable the test from being run with smart in our windows arm
testing. This corresponds to the tests being deleted in dotnet/coreclr#17637.

Commit migrated from dotnet/coreclr@b55d2b8
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
10 participants