Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current state of the SIMD/HWIntrinsic configuration knobs and proposed cleanup #11701

Open
tannergooding opened this issue Dec 21, 2018 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI design-discussion Ongoing discussion about design without consensus
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Dec 21, 2018

We should finish fleshing out the design around the various CLR Configuration Knobs that allow users to control the various System.Numerics.Vectors and System.Runtime.Intrinsics support.

Prior to HWIntrinsics

This section discusses the state of the world in netcoreapp1.0 through netcoreapp2.0. Desktop had similar, but slightly different behavior that I will attempt to call out where relevant.

COMPlus_EnableSSE3_4 and COMPlus_EnableAVX

The former (COMPlus_EnableSSE3_4) appears to be .NETCore only but defaults to 1 and is used in combination with a CPUID check that the VM does. It controls whether the S.N.Vectors intrinsics generate SSE3 through SSE4.2 instructions. The flag is completely ignored if COMPlus_EnableAVX and the corresponding VM check are 1.

The latter (COMPlus_EnableAVX) is available for both Desktop and .NETCore, it defaults to 1 ans is used in combination with a CPUID check that the VM does. It controls:

  • Whether the S.N.Vectors intrinsics generate AVX/AVX2 instructions
  • The size of Vector<T> (setting it to zero, forces Vector<T> to be sized to 16)
  • Whether the compiler emits VEX encoded instructions

COMPlus_FeatureSIMD

This flag is available for both Desktop and .NETCore, it defaults to 1 and controls various bits of code related to Vector<T>, the S.N.Vectors compiler support, and the TYP_SIMD* support.

Setting this to 0 causes Vector<T> to be sized at 16, none of the S.N.Vectors code to be treated as intrinsic, and prevents the various types from being resolved as TYP_SIMD* (which also generally prevents these types from appearing).

Current State

This section discusses the current state of the world for netcoreapp3.0.

COMPlus_Enable<ISA>

We have a number of Enable<ISA> flags, including: the pre-existing SSE3_4 and AVX flags as well as the SSE, SSE2, SSE3, SSSE3, SSE41, SSE42, AVX2, AES, PCLMULQDQ, POPCNT, FMA, LZCNT, BMI1, and BMI2` flags. All of these are used in combination with a corresponding CPUID check that the VM does.

These flags impact the compiler support for a given ISA and any ISAs that are "descendants" of that ISA (e.g. SSE=0 would also disable SSE2 which would disable any ISAs dependent on SSE2, etc). The flags are currently used primarily for the HWIntrinsics feature as that is the only thing that will cause the various instructions to be generated. In the future, these flags might be applicable more generally depending on other optimizations the JIT could consider. An exception to this is SSE and SSE2 which are considered "baseline support" by RyuJIT. These ISAs only impact the corresponding HWIntrinsic ISAs and do not actually impact compiler support for generating these instructions.

The pre-existing SSE3_4 flag is now treated as equivalent to SSE3. It impacts the SSE3 ISA and any child ISAs (including AVX). It otherwise functions identically and continues impacting the codgen support for S.N.Vectors.

The pre-existing AVX flag continues impacting the size of Vector<T> and whether the compiler emits VEX encoded instructions. However, for the size of Vector<T> it now does so indirectly (in that disabling AVX also disables AVX2), as the check was shifted onto AVX2.

COMPlus_FeatureSIMD and COMPlus_EnableHWIntrinsic

The COMPlus_FeatureSIMD flag continues functioning as it did before.

The COMPlus_EnableHWIntrinsic flag controls whether the System.Runtime.Intrinsic methods are treated as intrinsic, and therefore, whether they throw a PNSE or generate actual code when the given ISA is supported by the CPU/Compiler. There is currently a bug that setting EnableHWIntrinsic=0 will also disable compiler support for all the various ISAs listed above. This also means that it currently impacts the size of Vector<T> and whether the compiler will emit VEX encoded instructions.

Proposal for Cleanup

In this section, I will attempt to describe where we want to be with the various flags.

New Flag: EnableVEX

Currently we control the VEX support for the compiler by checking the EnableAVX flag (and corresponding CPUID check done by the VM). However, there are two ISAs that require the VEX encoding but not for AVX to also be enabled, these are BMI1 and BMI2. While we should never encounter a CPU that has BMI1/BMI2 but that does not also support AVX, AVX requires an additional check that the OS supports saving/restoring the 256-bit YMM registers. This support is not guaranteed and, at least on Windows, can be toggled by the user. Due to this, we need the check to be updated so that the BMI1/BMI2 ISAs (and any future ISAs with similar requirements) can still use the VEX encoding. Additionally, the VEX encoding is generally more efficient (it removes the RMW requirement from most of the instructions and supports unaligned memory addresses) and it may be desirable to still emit the VEX encoded instructions for SSE through SSE42 when the user has set EnableAVX=0.

The proposal is then to expose a new COMPlus_EnableVEX flag that is used to control the VEX encoding. Setting it to 0 would disable any ISA that requires the VEX encoding (AVX, AVX2, FMA, BMI1, and BMI2, as well as any future ISAs). Its default value (1) would allow the compiler to emit the VEX encoding for SSE through SSE42 when the CPU/OS support AVX but when the user has set EnableAVX=0. It would also allow other ISAs not in the AVX hierarchy (BMI1 and BMI2) to be emitted even when the OS does not support the saving/restoring the 256-bit YMM registers.

An alternative would be to not expose a new flag and instead just update the emitter to know that it can use the VEX encoding for the BMI1/BMI2 ISAs. The only difference from the above would be that SSE through SSE42 would not use the VEX encoding when AVX=0 (and when the OS supports saving/restoring the 256-bit YMM registers). This might be a more accurate state since the VEX encoded forms of the SSE through SSE42 instructions are technically part of the AVX instruction set.

New Flag: VectorTSize

Currently we control the sizeof Vector<T> by defaulting it to 16 and changing it to 32 if AVX2 is supported. However, this is not very extensible (what do we do when/if AVX-512 becomes supported and the size can be 64) and it is very much tied to x86 (you wouldn't want this to impact ARM if we add SVE support). It also means that if you need a smaller Vector<T>, you must also disable the general compiler support for the AVX2 ISA (at a minimum). This also impacts the HWIntrinsics feature.

The proposal is then to expose a new COMPlus_VectorTSize flag that is used to control the sizeof Vector<T>. The value would default to 0 which would mean to follow the normal logic we have today (size to 16 by default and change to 32 if AVX2 is supported). We would then come up with an additional scheme such that other values allow the user to explicitly size Vector<T> (to a supported size).

My current thinking is that any unsupported value is treated as 0 (default). Otherwise, the supported values are the exact sizes (16 or 32, in the future 64 if AVX-512 becomes supported, etc). Another option would be that the value is treated as the nearest size that is less than the given size. As an example, if the user gives 31, it would be sized 16. If the user gave 64 and we only support 32 and 16, it would be 32. If the user gave 100 and we support 128, 64, 32, and 16; they would get 64.

The flag would continue being used in conjunction with the Enable<ISA> checks for a given platform, as you can't size Vector<T> to 32 if AVX is not supported (for example).

COMPlus_Enable<ISA>

These flags are currently in a fairly good state, some considerations might be:

  • Should we be exposing SSE and SSE2 or should they be folded back into the EnableHWIntrinsic flag (given that they are considered "baseline" for CoreCLR).
  • Can we get rid of SSE3_4, since this is now covered by the individual SSE3, SSSE3, SSE41, and SSE42 flags and since it is treated as equivalent to SSE3 (which will also disable the child ISAs).

COMPlus_FeatureSIMD and COMPlus_EnableHWIntrinsic

COMPlus_FeatureSIMD should have its scope reduced so that it only impacts the S.N.Vectors codegen. The TYP_SIMD* support should be split out into its own feature that FEATURE_SIMD and FEATURE_HW_INTRINSICS can sit ontop of.

COMPlus_EnableHWIntrinsic should be fixed so that it only impacts the S.R.Intrinsics codegen. It should have no impact on the various ISAs the compiler lists as supported.

category:implementation
theme:vector-codegen
skill-level:intermediate
cost:medium
impact:small

@tannergooding
Copy link
Member Author

CC. @CarolEidt, @fiigii

I tried to capture the previous state of the world, the current state of the world, and what the various discussions we've had look to point to for the future state of the world in relation to the various CLR configuration knobs we have around FEATURE_SIMD and FEATURE_HW_INTRINSICS.

Feel free to let me know if there is anything here that you feel was captured incorrectly or could be clarified.

@tannergooding
Copy link
Member Author

I think we should also determine what of this falls into the 3.0 basket and what falls into post 3.0. I believe the only thing that we should put a hard requirement on fixing is the state of the EnableHWIntrinsic flag (it currently impacts all ISAs the compiler lists as supported).

@tannergooding
Copy link
Member Author

This is related to https://github.com/dotnet/coreclr/issues/19221, which was the initial proposal around EnableVEX; but captures more and gives more context on the various states we've exposed.

@BruceForstall
Copy link
Member

@tannergooding @CarolEidt @fiigii Do you expect changes to be made here for 3.0?

@fiigii
Copy link
Contributor

fiigii commented Mar 14, 2019

IMO, the current knobs are enough for 3.0. And some new features mentioned in this issue (like VectorTSize) may depend on the work (1) decoupling hardware intrinsic from FEATURE_SIMD and (2) implementing S.N.Vector<T> operations in hardware intrinsic. So, I suggest putting the issue to "future".

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@TIHan TIHan removed the JitUntriaged CLR JIT issues needing additional triage label Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI design-discussion Ongoing discussion about design without consensus
Projects
None yet
Development

No branches or pull requests

5 participants