Some test fixes for the x86 HWIntrinsics #18734

tannergooding · 2018-06-30T19:39:58Z

This adds back the Avx.MaskLoad tests (temporarily removed due to the API changes made in Improve Intel hardware intrinsic APIs #17637)
This removes unnecessary try/catch blocks from the ExtractScalar and InsertScalar tests
- The only time PNSE should be thrown is in the UnsupportedScenario test, which is the only test run when IsSupported is false. The UnsupportedScenario itself catches the PNSE and sets the test to pass if the exception was caught
This fixes the Avx.Extract methods to throw PNSE when IsSupported is false
- This is the source of the current Avx test failures for COMPlus_FeatureSIMD=0 and COMPlus_EnableAVX=0

tannergooding · 2018-06-30T19:40:12Z

tannergooding · 2018-06-30T19:43:18Z

I am working on fixing up the Sse41.Insert float tests that were removed with due to the API changes made in #17637 as well.

Edit: These are now handled by #18735

…sertScalar HWIntrinsic test templates

…d is false a

tannergooding · 2018-07-01T03:08:09Z

@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

tannergooding · 2018-07-01T03:21:20Z

src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs

@@ -241,6 +241,10 @@ public static class Avx
        /// </summary>
        public static byte Extract(Vector256<byte> value, byte index)
        {
+            if (!IsSupported)


I don't believe we want any methods to work when IsSupported is false, even helpers.

We should also investigate if this is really better than (the below is what all three major native compilers do):

vpextrb ; When index is into the lower 128-bits

or

vextractf128 ; When index is into the upper 128-bits vpextrb

We should also investigate if this is really better than

Actually, RyuJIT also generates the above code as well as native compilers. The managed implementation here is just for the non-constant fallback. So, I believe this is really better than calling into one (lower 128-bits) or two (upper 128-bits) large jump-tables.

Hmmm. I'm not so sure here. I always forget that the jit implementation exists (when I see the managed implementation here) and this is one of the smaller jump tables that we could generate (considering we only need to handle 32 cases max)

I will say that, as a min bar, we should add a comment indicating that this is the software fallback.

However, I think that having this "different" software fallback goes against the principle of ensuring the helper indirect invocations execute the same instructions that the hardware accelerated path uses.

As it is now, we need to ensure that the software behavior is semantically equivalent to the hardware instructions that would have been executed (and it has already had to have been fixed a number of times due to minor issues).

However, I think that having this "different" software fallback goes against the principle of ensuring the helper indirect invocations execute the same instructions that the hardware accelerated path uses.

I agree; I think there's a lot of value in the uniformity of the current non-constant approach.

https://github.com/dotnet/coreclr/issues/18743

Hmm, some native compilers (i.e., clang) also optimize the non-const Extract/Insert to the memory operations. It would be better to get some data when we unify the implementation.

@fiigii, I think the point is that the value of uniformity far outweighs any slight perf increase (if one exists) for the non-constant case (especially considering that the non-constant case already takes a perf hit from the call indirection/etc and is expected to be an edge case anyways).

For the constant case, all three compilers definitely emit the SIMD instructions.

Right, I just meant the non-const situations.

tannergooding · 2018-07-01T03:26:58Z

src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs

@@ -523,6 +539,11 @@ public static Vector256<uint> Insert(Vector256<uint> value, uint data, byte inde
        /// </summary>
        public static Vector256<long> Insert(Vector256<long> value, long data, byte index)
        {
+            if (IntPtr.Size != 8)


We shouldn't have some long/ulong helpers work on 32-bit and others not (disabled here since these were the only 64-bit helpers that were "working").

We should have a discussion on whether we:

Always support 64-bit overloads (when IsSupported is true)

This will require emulation on 32-bit

There is one instruction that I know of (movq) that deals with scalars and works on 32-bit

The vector intrinsics don't directly deal with 64-bit registers so Vector128<long/ulong> tends to work

Sometimes supports 64-bit overloads (depending on the underlying instruction and if it is a helper or not)

If we go this route, we should probably come up with a partitioning scheme for telling users these aren't supported on 32-bit

Never support 64-bit overloads on 32-bit OS

CC. @eerhardt

We shouldn't have some long/ulong helpers work on 32-bit and others not

Agree. I think we should disable these two helpers for 32-bit platforms, native compilers also do it.

So with this change, all long/ulong operations will throw PNSE when running in a 32-bit process?

@eerhardt, not quite.

There are two basic types of operations:

Those that take/return only Vector128<long> or Vector128<ulong>

These are fine in both 32-bit and 64-bit mode, as they operation on 128-bit registers (which exist in both 32-bit and 64-bit modes), on 64-bit memory (Scalar Vector128<T>) , or on 128/256-bit memory (Packed Vector128<T> and Packed Vector256<T>)

Those that take/return long or ulong directly, which falls into two-subcategories

Those that require a 64-bit register and cannot work on 32-bit

Sse2.StoreNonTemporal and Sse2.ConvertToInt64(Vector128<double>) are examples that don't work, as they require at least one 64-bit register

Those that can operate directly on 64-bit memory, and don't require a register

These sometimes work on 32-bit machines

ConvertToInt64(Vector128<long>) is an example that, as it uses a special encoding of the movq instruction

ConvertScalarToVector128Double(Vector128<double>, long) is an example of one that supports taking just a 64-bit memory address, but is still not encodable on a 32-bit machine

It seems to me that we shouldn't do anything tantamount to emulation when executing on a 32-bit machine. That said, I'm wondering if there are suggestions as to how these limitations/exclusions are reflected in the usage model?

Arm32/64 is differentiating by the Arm and Arm.Arm64 namespaces (however, this is also partially due to these being entirely different architectures, rather than hierarchical, like x86/x64).

Exposing a new class would probably work, but would also break the make Sse2 inherit from Sse proposal since we don't have multiple inheritance or interfaces that can declare static methods.

Duplicating all the members under a separate x64 namespace would also work, but that is a lot of duplicated metadata for relatively few methods.

I had also thought of just updating the reference assembly to have a x86/x64 specific version; but we are exposing all architectures from a single package so the cross-architecture story is nicer (and so you don't need #ifdef)

Relying on an analyzer/documentation to help notify users of these incompatibilities seems like it might the easiest way to do this...

https://github.com/dotnet/coreclr/issues/18744

tannergooding · 2018-07-01T03:33:18Z

src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs

+            {
+                throw new PlatformNotSupportedException();
+            }
+


Same as with Extract, we should determine if this is "better" than what all three native compilers do, which is:

vpinsrb ; When index is into the lower 128-bits vinsertf128

or

vextractf128 vpinsrb ; When index is into the upper 128-bits vinsertf128

Same as the comment to Extract, the The managed implementation here is just the non-constant fallback.

fiigii

LGTM, thanks.

eerhardt · 2018-07-02T15:33:04Z

            buffer[index] = data;

I know this isn't part of this change, but do we want to do a range check on index here? It would seem we could get into a bad situation where we are corrupting the stack if someone passed in index > 4.

Refers to: src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs:552 in c3945e5. [](commit_id = c3945e5, deletion_comment = False)

eerhardt · 2018-07-02T15:34:03Z

            buffer[index] = data;

Oh, nevermind. As soon as I hit Enter I read the index &= 0x3 line.

In reply to: 401844633 [](ancestors = 401844633)

Refers to: src/System.Private.CoreLib/shared/System/Runtime/Intrinsics/X86/Avx.cs:552 in c3945e5. [](commit_id = c3945e5, deletion_comment = False)

CarolEidt

LGTM, though I only looked at the test changes in the x86/Shared directory - I assume the rest are all auto-generated.

tannergooding · 2018-07-02T16:58:32Z

Merging to unblock other PRs.

I've logged issues for the two pending discussion points:

tannergooding mentioned this pull request Jun 30, 2018

Fold away Sse.StaticCast and Avx.StaticCast in the importer #18519

Merged

Removing unnecessary try/catch blocks from the ExtractScalar and In…

0008bb2

…sertScalar HWIntrinsic test templates

tannergooding mentioned this pull request Jun 30, 2018

Fixing up the Sse41.Insert float HWIntrinsics #18735

Merged

tannergooding added 2 commits June 30, 2018 20:07

Updating the Avx.Extract/Insert methods to throw PNSE when IsSupporte…

d85c8f6

…d is false a

Adding back the tests for Avx.MaskLoad

c3945e5

tannergooding commented Jul 1, 2018

View reviewed changes

fiigii approved these changes Jul 1, 2018

View reviewed changes

CarolEidt approved these changes Jul 2, 2018

View reviewed changes

tannergooding merged commit e524fbe into dotnet:master Jul 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some test fixes for the x86 HWIntrinsics #18734

Some test fixes for the x86 HWIntrinsics #18734

tannergooding commented Jun 30, 2018 •

edited

Loading

tannergooding commented Jun 30, 2018

tannergooding commented Jun 30, 2018 •

edited

Loading

tannergooding commented Jul 1, 2018

tannergooding Jul 1, 2018 •

edited

Loading

fiigii Jul 1, 2018

tannergooding Jul 1, 2018

tannergooding Jul 1, 2018

CarolEidt Jul 2, 2018

tannergooding Jul 2, 2018

fiigii Jul 2, 2018

tannergooding Jul 2, 2018 •

edited

Loading

fiigii Jul 2, 2018

tannergooding Jul 1, 2018

fiigii Jul 1, 2018

eerhardt Jul 2, 2018

tannergooding Jul 2, 2018

CarolEidt Jul 2, 2018

tannergooding Jul 2, 2018 •

edited

Loading

tannergooding Jul 2, 2018

tannergooding Jul 2, 2018

tannergooding Jul 1, 2018 •

edited

Loading

fiigii Jul 1, 2018

fiigii left a comment

eerhardt commented Jul 2, 2018

eerhardt commented Jul 2, 2018

CarolEidt left a comment

tannergooding commented Jul 2, 2018

Some test fixes for the x86 HWIntrinsics #18734

Some test fixes for the x86 HWIntrinsics #18734

Conversation

tannergooding commented Jun 30, 2018 • edited Loading

tannergooding commented Jun 30, 2018

tannergooding commented Jun 30, 2018 • edited Loading

tannergooding commented Jul 1, 2018

tannergooding Jul 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding Jul 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding Jul 2, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding Jul 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fiigii left a comment

Choose a reason for hiding this comment

eerhardt commented Jul 2, 2018

eerhardt commented Jul 2, 2018

CarolEidt left a comment

Choose a reason for hiding this comment

tannergooding commented Jul 2, 2018

tannergooding commented Jun 30, 2018 •

edited

Loading

tannergooding commented Jun 30, 2018 •

edited

Loading

tannergooding Jul 1, 2018 •

edited

Loading

tannergooding Jul 2, 2018 •

edited

Loading

tannergooding Jul 2, 2018 •

edited

Loading

tannergooding Jul 1, 2018 •

edited

Loading