Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Analyzer] Suggest more optimal SIMD patterns where applicable #82488

Open
tannergooding opened this issue Feb 22, 2023 · 6 comments
Open

[Analyzer] Suggest more optimal SIMD patterns where applicable #82488

tannergooding opened this issue Feb 22, 2023 · 6 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Feb 22, 2023

Writing SIMD code can be complex and the "optimal" pattern can vary per platform/architecture. This can get more complicated when newer ISAs are also available for use.

As such, we should provide a code fixer that identifies such patterns and suggests simple transformations to alternative methods that are likely to be better performing.

Category: Performance
Severity = suggestion

Example

A simple example is given below. The exact scenarios supported/recognized will likely grow over time based on input from SIMD experts:

- return value * 2;
+ return value + value;

-or-

- return Sse.Shuffle(value, value, 0b11_11_10_10);
+ return Sse.UnpackHigh(value, value);

Some of these are patterns that the JIT could recognize and optimize implicitly, but that can be equally complex. Recognizing and suggesting the fixes in C# is often simpler/faster.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 22, 2023
@tannergooding tannergooding added area-System.Runtime.Intrinsics code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer api-ready-for-review API is ready for review, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Feb 22, 2023
@ghost
Copy link

ghost commented Feb 22, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Writing SIMD code can be complex and the "optimal" pattern can vary per platform/architecture. This can get more complicated when newer ISAs are also available for use.

As such, we should provide a code fixer that identifies such patterns and suggests simple transformations to alternative methods that are likely to be better performing.

Category: Performance
Severity = suggestion

Example

A simple example is given below. The exact scenarios supported/recognized will likely grow over time based on input from SIMD experts:

- return value * 2;
+ return value + value;

-or-

- return Sse.Shuffle(left, right, 0b11_11_10_10);
+ return Sse.UnpackHigh(left, right);

Some of these are patterns that the JIT could recognize and optimize implicitly, but that can be equally complex. Recognizing and suggesting the fixes in C# is often simpler/faster.

Author: tannergooding
Assignees: -
Labels:

area-System.Runtime.Intrinsics, code-analyzer, code-fixer, api-ready-for-review

Milestone: -

@terrajobst
Copy link
Member

terrajobst commented Feb 23, 2023

Video

Category: Performance

For code that replaces calls into intrinsics this seems fine, such as:

- Sse.Shuffle(value, value, 0b11_11_10_10);
+ Sse.UnpackHigh(value, value);

For more general code that doesn't involve calls to intrinsics we should consider recognizing those in the JIT instead, such as:

- value * 2;
+ value + value;

@JimBobSquarePants
Copy link

Analyzers like this really cannot come soon enough. Even a cheat sheet would be useful just now.

@huoyaoyuan
Copy link
Member

The recommended patterns is likely to grow in the future.
Should these patterns use a same diagnostic ID, or separated ones? Does them require a category or range then?
Is there any available cheat sheet, or should us brainstorm for it?

@tannergooding
Copy link
Member Author

Should these patterns use a same diagnostic ID, or separated ones?

I think it really depends on the operation in question and how likely it is for users to want to separate a given pattern out from the rest.

Is there any available cheat sheet, or should us brainstorm for it?

I do think #82486 is the more prominent/important one to handle first, while its targeted around maintainability/portability, it also has a big impact on JIT codegen and factors into perf as well.

The analyzer represented by this issue (82488) is then more about recommending alternative patterns that the underlying runtime may not support yet or which may be available on newer ISAs that the user hasn't considered yet that should be smaller or more performant. In some cases, the two analyzers may overlap; for example Sse.Shuffle(value, value, 0b11_11_10_10) could become Sse.UnpackHigh(value, value) or could become Vector128.Shuffle(value, Vector128.Create(2, 2, 3, 3)). The underlying runtime may support many of these cases itself, especially when using the xplat APIs, but given there is RyuJIT, NAOT, MonoJIT, MonoInterpreter, MonoLLVM, and potential future platforms like Unity; it will be impossible to definitively know where the SIMD support exists and what optimizations are available. Given that SIMD code is often "perf critical" giving users recommendations to be more explicit themselves can often be beneficial.

It will ultimately come down to some level of brainstorming, but there's plenty of patterns to be handled many of which are called out in the respective architecture optimization manuals or which other compilers like MSVC/Clang or even RyuJIT itself is already handling. Many of them may even be simplifications that are recommended so that we can keep the amount of runtime pattern recognition simpler. If the analyzer exists to tell users to call Vector128.RotateLeft over doing (x << y) | (x >>> (32 - y)) then the JIT doesn't need to spend cycles handling that pattern itself and can spend its time handling more meaningful cases instead.

@tannergooding
Copy link
Member Author

#103557 is an example of a place where this analyzer applies. Simply in removing redundant AsT() casts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer
Projects
None yet
Development

No branches or pull requests

5 participants