Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the user to control the MaxVectorTBitWidth #85551

Merged
merged 24 commits into from
Jun 5, 2023

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Apr 29, 2023

This will also allow Vector<T> to eventually be larger than 256-bits via explicit opt-in and for the user to explicitly opt-in to a smaller size, if desired, without requiring ISA disablement

To achieve this we've done three primary things:

  1. We now track as part of CG2/NAOT compilation whether a type is Vector<T> or transitively has a Vector<T> field. Such structs are marked as requiring a type layout check on load
  2. We now explicitly track the Vector<T> size as an instruction set flag. Only one Vector<T> size is allowed to be specified at a time and we have a set of asserts validating that is the case
  3. The user can specify the maximum desired size for Vector<T> via an environment variable (JIT) or command line switch (CG2/NAOT). This plays into the selected Vector<T> size

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 29, 2023
@ghost ghost assigned tannergooding Apr 29, 2023
@ghost
Copy link

ghost commented Apr 29, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This resolves #85543 and allows a bit more fine grained control without requiring an ISA to be disabled.

This will also allow Vector<T> to eventually be larger than 256-bits via explicit opt-in.

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

This doesn't yet cover a corresponding switch for NAOT/R2R.

Comment on lines 1974 to 1977
// Some architectures can experience frequency throttling when executing
// executing 512-bit width instructions. To account for this we set the
// default preferred vector width to 256-bits in some scenarios. Users
// can override this with `DOTNET_PreferredVectorWith=512`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM/GCC actually extend this prefer-vector-width=256 behavior all the way up to the latest generation.

However, the general throttling issue has been fixed since Ice Lake and there are only a few instructions with "false dependencies" that can still cause some slowdown if used incorrectly. Since we don't have a general purpose auto-vectorizer and really just use this to control simple memory operations and comparisons, we should be fine limiting it to just the below.

@tannergooding tannergooding added the avx512 Related to the AVX-512 architecture label Apr 29, 2023
@tannergooding tannergooding changed the title All the user to control the MaxVectorTBitWidth and PreferredVectorBitWidth Allow the user to control the MaxVectorTBitWidth and PreferredVectorBitWidth Apr 29, 2023
@tannergooding tannergooding marked this pull request as ready for review April 29, 2023 18:51
@runfoapp runfoapp bot mentioned this pull request May 1, 2023
@anthonycanino
Copy link
Contributor

Thanks for this Tanner.

@BruceForstall
Copy link
Member

@dotnet/jit-contrib

@tannergooding
Copy link
Member Author

Resolved merge conflicts

@sebastienros
Copy link
Member

/benchmark fortunes_ef aspnet-citrine-win runtime

@pr-benchmarks
Copy link

pr-benchmarks bot commented May 3, 2023

Benchmark started for fortunes_ef on aspnet-citrine-win with runtime. Logs: link

@tannergooding
Copy link
Member Author

Ping @dotnet/jit-contrib, @jkotas for review/feedback

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me. I had a couple suggestions on namings that (unfortunately) would be somewhat pervasive. You can decide whether to take them or not.

Presumably, where before using arm64 altjit on x64 we would set DOTNET_SIMD16ByteOnly=1, now we would set DOTNET_MaxVectorTBitWidth=128?

src/coreclr/inc/corinfo.h Outdated Show resolved Hide resolved
src/coreclr/inc/corinfo.h Outdated Show resolved Hide resolved
src/coreclr/jit/ee_il_dll.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.cpp Outdated Show resolved Hide resolved
@jkotas
Copy link
Member

jkotas commented May 6, 2023

Have you given any thought to how this should work with AOT compilers?

  • It would be nice to be able to pass the Vector<T> size to the AOT compiler
  • We need check to validate that the Vector<T> size chosen at AOT time matches the size chosen at runtime

@tannergooding
Copy link
Member Author

Have you given any thought to how this should work with AOT compilers?

  • It would be nice to be able to pass the Vector size to the AOT compiler
  • We need check to validate that the Vector size chosen at AOT time matches the size chosen at runtime

Right. My thought was that much as you can pass --instruction-set avx,avx2,bmi1,bmi2; you should also be able to pass --preferred-vector-bitwidth 256 and --max-vector-t-bitwidth 128 (or similar names).

This would function much as the environment variables do and just help instruct how codegen should occur. It would choose the minimum of the value passed in by the user and the largest actually supported given the target ISAs. So, if you said --preferred-vector-bitwidth 256 but didn't also specify --instruction-set avx2, then you would still end up with 128-bit codegen.

It then functions essentially identically to how the JIT functions in this PR, just in R2R/NAOT instead. What is a bit "unclear" is how exactly Vector<T> would work if the user opted for 128-bit in R2R and t hen ended up with 256-bit at runtime. This is something we could be doing already, but we don't as the handling to throw away just the functions which mismatch doesn't exist today. That may mean we need to differentiate between "user specified" and "system default" as we don't want the entire R2R image being thrown away by default just because R2R targets an SSE2 baseline.

@jkotas
Copy link
Member

jkotas commented May 6, 2023

What is a bit "unclear" is how exactly Vector would work if the user opted for 128-bit in R2R and then ended up with 256-bit at runtime.

It should work same way as other instruction sets specified at AOT time. The system can treat Vector128/256/512 as a "virtual" vector instruction set. It is defined like that in https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/ThunkGenerator/InstructionSetDesc.txt#L92-L94.

his is something we could be doing already, but we don't as the handling to throw away just the functions which mismatch doesn't exist today.

We do have infrastructure to throw away just functions which mismatch (look for needPerMethodInstructionSetFixup). It probably needs work to hook it up to the Vector128/256/512 size configuration.

@tannergooding
Copy link
Member Author

It should work same way as other instruction sets specified at AOT time.

So you're indicating rather than DOTNET_MaxVectorTBitWidth we should instead provide InstructionSet_VectorT128, InstructionSet_VectorT256, and InstructionSet_VectorT512, with the last one being supported but off by default and requiring explicit opt-in, such as via DOTNET_VectorT512=1?

We do have infrastructure to throw away just functions which mismatch (look for needPerMethodInstructionSetFixup). It probably needs work to hook it up to the Vector128/256/512 size configuration.

My understanding is that this doesn't quite work as expected and has a number of larger work items pending. One example the general issue that if you pre-compile for --instruction-set avx2 and you run on hardware without AVX2, then the entire R2R image is thrown away, not just the methods that actually used the VEX encoding.

I expect that trying to plug the vector size information into this same thing may likewise be problematic today.

From #61471 (comment):

The current expectation is that the --instruction-set argument to crossgen2 should make an image where all of the compiled code will be dropped if the application is run on a machine which does not support the specified instruction set. The end result will be significantly degraded startup time on machines without the specific instruction sets. Due to engineering concerns in the current implementation of the JIT/crossgen2 compiler, we're not currently able to enable AVX (or SSE4.2) support selectively on a subset of methods compiled into the application.

@tannergooding tannergooding marked this pull request as draft May 19, 2023 21:08
@tannergooding
Copy link
Member Author

I've converted this to a draft while I work through the last of the issues. I believe I've nearly got everything handled now.

@tannergooding tannergooding force-pushed the prefer-vector-width branch from c733164 to 079e9b0 Compare May 21, 2023 15:49
@tannergooding tannergooding force-pushed the prefer-vector-width branch from fb22192 to 69e496a Compare June 4, 2023 16:38
@tannergooding tannergooding force-pushed the prefer-vector-width branch from 2f9fde4 to b0deccd Compare June 4, 2023 21:37
@tannergooding tannergooding marked this pull request as ready for review June 5, 2023 12:42
@tannergooding
Copy link
Member Author

This should be ready for review again. It's been trimmed down to just the Vector<T> sizing support and we now have (from other recent PRs) relevant tests validating baseline vs avx vs avx2 vs avx512f for JIT, CG2, and NAOT.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updated checkin comment, and for splitting up this work into the set of PRs you made. It now looks good.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants