Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Implement more AVX/AVX2 intrinsics #16955

Merged
merged 2 commits into from
Mar 21, 2018
Merged

Implement more AVX/AVX2 intrinsics #16955

merged 2 commits into from
Mar 21, 2018

Conversation

fiigii
Copy link

@fiigii fiigii commented Mar 15, 2018

This PR implements
AVX

  • BroadcastVector128ToVector256
  • MoveMask
  • MaskLoad
  • Permute2x128
  • PermuteVar

AVX2

  • BroadcastVector128ToVector256
  • ConvertToVector256Int16
  • ConvertToVector256UInt16
  • ConvertToVector256Int32
  • ConvertToVector256UInt32
  • ConvertToVector256Int64
  • ConvertToVector256UInt64
  • Max
  • Min
  • MoveMask
  • Permute2x128

But Avx.ZeroUpper and Avx.ZeroAll cannot be tested in C# code because variables of Vector128/256<T> may be saved/restored in/from the stack.

@CarolEidt @tannergooding PTAL

@fiigii
Copy link
Author

fiigii commented Mar 15, 2018

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii
Copy link
Author

fiigii commented Mar 15, 2018

@CarolEidt @tannergooding Does this look good to you?

@CarolEidt
Copy link

But Avx.ZeroUpper and Avx.ZeroAll cannot be tested in C# code because variables of Vector128/256 may be saved/restored in/from the stack.

I'm not sure what you're saying - are we not preserving the values?

@fiigii
Copy link
Author

fiigii commented Mar 16, 2018

Avx.ZeroUpper and Avx.ZeroAll clears the YMM registers values, but as I detected that sometimes (e.g., debug build of C# programs) vector variables could be saved on the stack, so that we cannot test these two intrinsics like:

Vector256<int> v = Avx.LoadVector128(addr1);
// at this point, `v` can be saved on the stack
Avx.ZeroAll();
// at this point, `v` can be restored from the stack
Avx.Store(addr2, v);

// check the values in `addr2` aginst `0`

@CarolEidt
Copy link

Avx.ZeroUpper and Avx.ZeroAll clears the YMM registers values

Right, I get it now. It makes me wonder whether these are inappropriate intrinsics to offer to developers. I suppose we could interpret the semantic to mean that all local variables of vector types must be zero'd (or their upper half zero'd), but I would say that is neither desirable nor worth the effort to implement, and if that's not the semantic then I don't see how a developer could use it. What are you thoughts? @eerhardt @tannergooding @fiigii

@tannergooding
Copy link
Member

Avx.ZeroUpper and Avx.ZeroAll have a few different implications and probably require more "ingrained" JIT support and knowledge

Avx.ZeroUpper is used to prevent SSE<->AVX transition penalties (for when switching between the legacy and VEX encodings). The JIT already tries to emit this in "logical" places itself

Avx.ZeroAll zeros every XMM/YMM register. I imagine this has implications in the register allocator and in the callee/caller register saving logic

@CarolEidt, I would think we should pull these for now and investigate them later, when we can add proper support

@fiigii
Copy link
Author

fiigii commented Mar 16, 2018

I would like to retain them now. vzeroupper and vzeroall are both used to avoid AVX/SSE transition penalties. The current JIT compiler can automatically insert vzeroupper for current hardware features.
However,

  1. other CIL platforms (e.g., .NET Native, Mono) may support SIMD hardware intrinsics in the future, and the zeroupper-insertion feature may not be guaranteed.
  2. the future hardware products may change the conditions of triggering AVX/SSE transition penalties, so the current zeroupper-insertion may be not enough.

So, it would be better to give users the chance...

@tannergooding
Copy link
Member

@fiigii, I agree that they would be useful eventually, but I think they can be excluded from the initial preview.

I think at a minimum, they will require more work for the register allocator (perhaps @CarolEidt can confirm).

@CarolEidt
Copy link

I think it's best to pull them for now, as I think it will take some time to ensure that we 1) decide what the right behavior should be, and 2) ensure that behavior is reasonable and is implemented correctly in the JIT.

@4creators
Copy link

I would like to retain them now. vzeroupper and vzeroall are both used to avoid AVX/SSE transition penalties.

IMO it is very important to have them available, however, delaying their availability as @CarolEidt suggested will not harm experimental release. However, IMO good docs explaining to developers where and how this penalties will arise should accompany experimental release.

@tannergooding
Copy link
Member

IMO good docs explaining to developers where and how this penalties will arise should accompany experimental release.

I don't think this is a priority. This really only arises when you are mixing code that uses the legacy encoding and code that uses the VEX encoding.

The JIT itself is exclusively emitting the VEX-encoding, when supported, so this should really only arise during interop calls.

@fiigii
Copy link
Author

fiigii commented Mar 16, 2018

@CarolEidt @tannergooding Thanks for the suggestions, I will remove the implementation of zeroupper and zeroall for 2.1.

@fiigii fiigii force-pushed the moreavx branch 2 times, most recently from 0364e36 to 8f0ccdb Compare March 16, 2018 18:08
@fiigii
Copy link
Author

fiigii commented Mar 16, 2018

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii
Copy link
Author

fiigii commented Mar 17, 2018

Test failures are caused by https://github.com/dotnet/coreclr/issues/17008, not related to this change.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fiigii
Copy link
Author

fiigii commented Mar 19, 2018

Will update the test cases to match #16957

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

This PR and #17030 implement all the AVX intrinsics, except

  1. ZeroAll and ZeroUpper because of the test issue that discussed above.
  2. MaskStore because of the incorrect API design https://github.com/dotnet/coreclr/issues/17058

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

test Ubuntu x64 Checked jitincompletehwintrinsic
test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
test Ubuntu x64 Checked jitx86hwintrinsicnosimd
test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

noavx2 test failure is not related. Can we merge this PR?

@CarolEidt
Copy link

noavx2 test failure is not related

How did you determine this? Is there a bug open for this? That leg has passed relatively recently.

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

test Windows_NT x64 Checked jitx86hwintrinsicnoavx2

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

How did you determine this?

I did not change emitter or other general stuff in this PR, and Dev11_dev11_20929 passed on my local machine with EnableAVX2=0.
Let's test again.

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

test Tizen armel Cross Checked Innerloop Build and Test

@fiigii
Copy link
Author

fiigii commented Mar 20, 2018

Okay, now all the tests passed.

@CarolEidt CarolEidt merged commit 0ebc097 into dotnet:master Mar 21, 2018
@fiigii fiigii deleted the moreavx branch March 21, 2018 00:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants