Allow the user to control the MaxVectorTBitWidth #85551

tannergooding · 2023-04-29T00:30:17Z

This will also allow Vector<T> to eventually be larger than 256 bits via explicit opt-in, and for the user to explicitly opt in to a smaller size, if desired, without requiring ISA disablement.

To achieve this we've done three primary things:

We now track as part of CG2/NAOT compilation whether a type is Vector<T> or transitively has a Vector<T> field. Such structs are marked as requiring a type layout check on load
We now explicitly track the Vector<T> size as an instruction set flag. Only one Vector<T> size is allowed to be specified at a time and we have a set of asserts validating that is the case
The user can specify the maximum desired size for Vector<T> via an environment variable (JIT) or command line switch (CG2/NAOT). This plays into the selected Vector<T> size
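
As a rough illustration of the first two points, here is a minimal C++ sketch. `TypeDesc`, `VectorTSize`, and both helper functions are hypothetical names invented for this example; they are not the runtime's actual data structures, and the real CG2/NAOT type system is considerably richer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified type description (not the real CG2/NAOT type system).
struct TypeDesc {
    bool isVectorT = false;               // true only for Vector<T> itself
    std::vector<const TypeDesc*> fields;  // value-type fields embedded in this type
};

// Returns true if the type is Vector<T> or transitively embeds a Vector<T> field.
// The layout of such a type depends on the selected Vector<T> size, so it would
// need a layout check on load. Struct embedding is acyclic, so recursion ends.
bool RequiresVectorTLayoutCheck(const TypeDesc& type) {
    if (type.isVectorT) return true;
    for (const TypeDesc* field : type.fields) {
        assert(field != nullptr);
        if (RequiresVectorTLayoutCheck(*field)) return true;
    }
    return false;
}

// Hypothetical flag bits standing in for the Vector<T> size instruction-set flags.
enum VectorTSize : std::uint32_t {
    VectorT128 = 1u << 0,
    VectorT256 = 1u << 1,
    VectorT512 = 1u << 2,
};

// True when exactly one Vector<T> size flag is set, mirroring the kind of
// invariant the asserts mentioned above are meant to validate.
bool HasExactlyOneVectorTSize(std::uint32_t flags) {
    const std::uint32_t sizeBits = flags & (VectorT128 | VectorT256 | VectorT512);
    return sizeBits != 0 && (sizeBits & (sizeBits - 1)) == 0;
}
```

A struct that wraps another struct containing a Vector<T> field would report `true` from `RequiresVectorTLayoutCheck`, while a struct with no such field would not.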

ghost · 2023-04-29T00:30:30Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #85543 and allows more fine-grained control without requiring an ISA to be disabled.

This will also allow Vector<T> to eventually be larger than 256-bits via explicit opt-in.

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

tannergooding · 2023-04-29T00:33:29Z

This doesn't yet cover a corresponding switch for NAOT/R2R.

tannergooding · 2023-04-29T00:37:51Z

src/coreclr/vm/codeman.cpp

+            // Some architectures can experience frequency throttling when
+            // executing 512-bit width instructions. To account for this we set the
+            // default preferred vector width to 256-bits in some scenarios. Users
+            // can override this with `DOTNET_PreferredVectorBitWidth=512`.


LLVM/GCC actually extend this prefer-vector-width=256 behavior all the way up to the latest generation.

However, the general throttling issue has been fixed since Ice Lake and there are only a few instructions with "false dependencies" that can still cause some slowdown if used incorrectly. Since we don't have a general purpose auto-vectorizer and really just use this to control simple memory operations and comparisons, we should be fine limiting it to just the below.
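
The defaulting policy described here might look roughly like the following. This is a sketch under assumptions: `PreferredVectorBitWidth` and the `mayThrottleOn512` input are hypothetical, the set of accepted override values is assumed, and the actual conditions used in `codeman.cpp` are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Picks a preferred vector width (in bits) on hardware that supports 512-bit
// vectors. `userOverride` of 0 means "not set"; any other value is assumed to
// be one of 128, 256, or 512 (an assumption made for this sketch).
std::uint32_t PreferredVectorBitWidth(bool mayThrottleOn512, std::uint32_t userOverride) {
    assert(userOverride == 0 || userOverride == 128 ||
           userOverride == 256 || userOverride == 512);

    // An explicit user setting always wins over the default.
    if (userOverride != 0) return userOverride;

    // Otherwise fall back to 256 bits only where throttling is a concern.
    return mayThrottleOn512 ? 256u : 512u;
}
```

With no override, a throttling-prone CPU would get 256 and a newer one 512; passing 512 explicitly would select 512 in both cases.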

src/coreclr/vm/codeman.cpp

src/coreclr/inc/corjitflags.h

anthonycanino · 2023-05-01T16:17:08Z

Thanks for this Tanner.

src/coreclr/vm/codeman.cpp

BruceForstall · 2023-05-01T17:18:00Z

@dotnet/jit-contrib

tannergooding · 2023-05-03T18:37:02Z

Resolved merge conflicts

sebastienros · 2023-05-03T18:39:48Z

/benchmark fortunes_ef aspnet-citrine-win runtime

pr-benchmarks · 2023-05-03T18:41:01Z

Benchmark started for fortunes_ef on aspnet-citrine-win with runtime. Logs: link

tannergooding · 2023-05-05T14:27:55Z

Ping @dotnet/jit-contrib, @jkotas for review/feedback

BruceForstall

This generally looks good to me. I had a couple suggestions on namings that (unfortunately) would be somewhat pervasive. You can decide whether to take them or not.

Presumably, where before using arm64 altjit on x64 we would set DOTNET_SIMD16ByteOnly=1, now we would set DOTNET_MaxVectorTBitWidth=128?

src/coreclr/inc/corinfo.h

src/coreclr/jit/ee_il_dll.cpp

src/coreclr/vm/codeman.cpp

jkotas · 2023-05-06T01:53:26Z

Have you given any thought to how this should work with AOT compilers?

It would be nice to be able to pass the Vector<T> size to the AOT compiler
We need a check to validate that the Vector<T> size chosen at AOT time matches the size chosen at runtime

tannergooding · 2023-05-06T02:17:49Z

> Have you given any thought to how this should work with AOT compilers?
>
> It would be nice to be able to pass the Vector<T> size to the AOT compiler
>
> We need a check to validate that the Vector<T> size chosen at AOT time matches the size chosen at runtime

Right. My thought was that, much as you can pass --instruction-set avx,avx2,bmi1,bmi2, you should also be able to pass --preferred-vector-bitwidth 256 and --max-vector-t-bitwidth 128 (or similar names).

This would function much as the environment variables do and just help instruct how codegen should occur. It would choose the minimum of the value passed in by the user and the largest actually supported given the target ISAs. So, if you said --preferred-vector-bitwidth 256 but didn't also specify --instruction-set avx2, then you would still end up with 128-bit codegen.
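
The selection rule described above can be sketched as follows. The `TargetIsas` struct and both function names are hypothetical and cover only a reduced x64 ISA set; this is an illustration of the "minimum of requested and supported" rule, not the compiler's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, reduced view of the ISAs passed via --instruction-set.
struct TargetIsas {
    bool avx2 = false;
    bool avx512f = false;
};

// Largest vector width (in bits) the target ISAs support.
// With neither flag set we assume the SSE2 baseline, i.e. 128 bits.
std::uint32_t LargestSupportedBitWidth(const TargetIsas& isas) {
    if (isas.avx512f) return 512;
    if (isas.avx2) return 256;
    return 128;
}

// Effective width: the minimum of what the user asked for and what the
// target ISAs can actually provide.
std::uint32_t EffectiveBitWidth(std::uint32_t requested, const TargetIsas& isas) {
    assert(requested >= 128);
    return std::min(requested, LargestSupportedBitWidth(isas));
}
```

Requesting 256 bits without enabling AVX2 yields 128 here, matching the example in the comment; requesting 512 with only AVX2 enabled yields 256.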

It then functions essentially identically to how the JIT functions in this PR, just in R2R/NAOT instead. What is a bit "unclear" is how exactly Vector<T> would work if the user opted for 128-bit in R2R and then ended up with 256-bit at runtime. This is something we could be doing already, but we don't as the handling to throw away just the functions which mismatch doesn't exist today. That may mean we need to differentiate between "user specified" and "system default" as we don't want the entire R2R image being thrown away by default just because R2R targets an SSE2 baseline.

jkotas · 2023-05-06T06:52:32Z

> What is a bit "unclear" is how exactly Vector<T> would work if the user opted for 128-bit in R2R and then ended up with 256-bit at runtime.

It should work same way as other instruction sets specified at AOT time. The system can treat Vector128/256/512 as a "virtual" vector instruction set. It is defined like that in https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt#L92-L94.

> This is something we could be doing already, but we don't as the handling to throw away just the functions which mismatch doesn't exist today.

We do have infrastructure to throw away just functions which mismatch (look for needPerMethodInstructionSetFixup). It probably needs work to hook it up to the Vector128/256/512 size configuration.

tannergooding · 2023-05-06T16:38:43Z

> It should work same way as other instruction sets specified at AOT time.

So you're indicating rather than DOTNET_MaxVectorTBitWidth we should instead provide InstructionSet_VectorT128, InstructionSet_VectorT256, and InstructionSet_VectorT512, with the last one being supported but off by default and requiring explicit opt-in, such as via DOTNET_VectorT512=1?

> We do have infrastructure to throw away just functions which mismatch (look for needPerMethodInstructionSetFixup). It probably needs work to hook it up to the Vector128/256/512 size configuration.

My understanding is that this doesn't quite work as expected and has a number of larger work items pending. One example is the general issue that if you pre-compile for --instruction-set avx2 and you run on hardware without AVX2, then the entire R2R image is thrown away, not just the methods that actually used the VEX encoding.

I expect that trying to plug the vector size information into this same thing may likewise be problematic today.

From #61471 (comment):

The current expectation is that the --instruction-set argument to crossgen2 should make an image where all of the compiled code will be dropped if the application is run on a machine which does not support the specified instruction set. The end result will be significantly degraded startup time on machines without the specific instruction sets. Due to engineering concerns in the current implementation of the JIT/crossgen2 compiler, we're not currently able to enable AVX (or SSE4.2) support selectively on a subset of methods compiled into the application.
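
To make the difference between the two rejection strategies concrete, here is a small sketch. All types and function names (`IsaMask`, `PrecompiledMethod`, `PrecompiledImage`, and the two `UsableMethods...` helpers) are hypothetical and exist only for illustration; they do not correspond to the runtime's R2R loader or to `needPerMethodInstructionSetFixup`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using IsaMask = std::uint32_t;  // hypothetical bitmask, one bit per instruction set

struct PrecompiledMethod {
    IsaMask usedIsas;  // ISAs this method's code actually depends on
};

struct PrecompiledImage {
    IsaMask compiledForIsas;  // everything passed via --instruction-set
    std::vector<PrecompiledMethod> methods;
};

// Whole-image behavior as described in the quote: if the machine lacks any ISA
// the image was compiled for, all precompiled code is dropped.
std::size_t UsableMethodsWholeImage(const PrecompiledImage& image, IsaMask machineIsas) {
    const bool allSupported = (image.compiledForIsas & ~machineIsas) == 0;
    return allSupported ? image.methods.size() : 0;
}

// Per-method alternative: drop only the methods whose own dependencies are
// missing on the machine.
std::size_t UsableMethodsPerMethod(const PrecompiledImage& image, IsaMask machineIsas) {
    std::size_t usable = 0;
    for (const PrecompiledMethod& method : image.methods) {
        // A method can only depend on ISAs the image was compiled for.
        assert((method.usedIsas & ~image.compiledForIsas) == 0);
        if ((method.usedIsas & ~machineIsas) == 0) ++usable;
    }
    return usable;
}
```

For an image compiled with AVX2 enabled where only one of three methods uses AVX2, a machine without AVX2 keeps no methods under the whole-image rule and two under the per-method rule.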

tannergooding · 2023-05-19T21:09:19Z

I've converted this to a draft while I work through the last of the issues. I believe I've nearly got everything handled now.

tannergooding · 2023-06-05T12:43:47Z

This should be ready for review again. It's been trimmed down to just the Vector<T> sizing support and we now have (from other recent PRs) relevant tests validating baseline vs avx vs avx2 vs avx512f for JIT, CG2, and NAOT.

davidwrighton

Thank you for the updated checkin comment, and for splitting up this work into the set of PRs you made. It now looks good.

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 29, 2023

ghost assigned tannergooding Apr 29, 2023

tannergooding mentioned this pull request Apr 29, 2023

Regression in JsonPlatform due to AVX-512 changes #85543

Closed

tannergooding commented Apr 29, 2023

BruceForstall reviewed Apr 29, 2023

src/coreclr/vm/codeman.cpp (outdated, resolved)

tannergooding added the avx512 label (Related to the AVX-512 architecture) Apr 29, 2023

BruceForstall reviewed Apr 29, 2023

src/coreclr/inc/corjitflags.h (outdated, resolved)

build-analysis bot mentioned this pull request Apr 29, 2023

Tracking issue for CI build timeouts #76454

Closed

tannergooding changed the title ~~All the user to control the MaxVectorTBitWidth and PreferredVectorBitWidth~~ Allow the user to control the MaxVectorTBitWidth and PreferredVectorBitWidth Apr 29, 2023

tannergooding marked this pull request as ready for review April 29, 2023 18:51

tannergooding requested a review from MichalStrehovsky as a code owner April 29, 2023 18:51

runfoapp bot mentioned this pull request May 1, 2023

Infra improvements for Helix #68176

Closed

tannergooding commented May 1, 2023

src/coreclr/vm/codeman.cpp (outdated, resolved)

EgorBo mentioned this pull request May 2, 2023

Vector{128|256}<T> == Vector{128|256}<T> could emit PXOR + PTEST combo on x86 #85638

Closed

runfoapp bot mentioned this pull request May 3, 2023

Long Running Test: Interop/MonoAPI/MonoMono/PInvokeDetach/PInvokeDetach.sh #73040

Closed

build-analysis bot mentioned this pull request May 3, 2023

System.Net.Quic.Tests.QuicStreamTests.WriteCanceled_NextWriteThrows test failure #76831

Closed

BruceForstall approved these changes May 5, 2023

src/coreclr/inc/corinfo.h (outdated, resolved)

src/coreclr/inc/corinfo.h (outdated, resolved)

src/coreclr/jit/ee_il_dll.cpp (outdated, resolved)

src/coreclr/vm/codeman.cpp (outdated, resolved)

jkotas reviewed May 6, 2023

src/coreclr/vm/codeman.cpp (outdated, resolved)

tannergooding marked this pull request as draft May 19, 2023 21:08

tannergooding added 3 commits May 20, 2023 10:01

Merge remote-tracking branch 'dotnet/main' into prefer-vector-width

b37c597

Don't allow avxvnni to be "optimistic" since that brings in avx2

9eeefd7

Ensure we handle HWIntrinsics being disabled

079e9b0

tannergooding force-pushed the prefer-vector-width branch from c733164 to 079e9b0 Compare May 21, 2023 15:49

Ensure that the Vector<T> size ISAs are covered by FromInstructionSet

76c33aa

build-analysis bot mentioned this pull request May 21, 2023

Timeout in System.Net.Quic.Functional.Tests #86019

Closed

Merge remote-tracking branch 'dotnet/main' into prefer-vector-width

7e60826

build-analysis bot mentioned this pull request May 23, 2023

Assert failure in GC/API/NoGCRegion/Callback_Svr test #86612

Closed

Merge remote-tracking branch 'dotnet/main' into prefer-vector-width

17e0e01

build-analysis bot mentioned this pull request Jun 2, 2023

Unable to load Analyzer assembly .../Microsoft.CodeAnalysis.Analyzers.dll : Not a valid assembly #85082

Closed

tannergooding added 2 commits June 4, 2023 08:42

Ensure that getMaxVectorByteLength being 0 is handled

3b84fb0

Ensure NAOT startup can correctly check for the VectorT size bits

69e496a

tannergooding force-pushed the prefer-vector-width branch from fb22192 to 69e496a Compare June 4, 2023 16:38

Have BlkOpKindUnroll account for SIMD being disabled

b0deccd

tannergooding force-pushed the prefer-vector-width branch from 2f9fde4 to b0deccd Compare June 4, 2023 21:37

Ensure InstructionSet_VectorT128 is set in the fallback path for PAL_GetJitCpuCapabilityFlags

b7b26d7

tannergooding marked this pull request as ready for review June 5, 2023 12:42

davidwrighton approved these changes Jun 5, 2023

tannergooding merged commit af1262c into dotnet:main Jun 5, 2023

tannergooding deleted the prefer-vector-width branch June 5, 2023 21:58

EgorBo mentioned this pull request Jun 7, 2023

[Performance] Startup regression #87235

Closed

This was referenced Jun 8, 2023

Ensure that Vector<T> is tracked as "optimistic" for crossgen2 #87240

Merged

Don't return 0 from getMaxVectorByteLength when intrinsics are disabled #87420

Merged

BruceForstall mentioned this pull request Jun 12, 2023

Failure in checked/release asm diffs pipeline #87432

Closed

kunalspathak mentioned this pull request Jun 20, 2023

Assertion failed '((regArgMaskLive & RBM_FLTARG_REGS) == 0) && "Homing of float argument registers with circular dependencies not implemented."' #87515

Closed

ghost locked as resolved and limited conversation to collaborators Jul 6, 2023
