Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA1857: hundred or thousands of build warnings from System.Runtime.Intrinsics.X86 on updating from .NET 7.0 to 8.0 #95289

Open
twest820 opened this issue Nov 27, 2023 · 9 comments
Labels
area-System.Numerics needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@twest820
Copy link

Description

Mostly I'm hitting

byte mask = (byte)Avx.MoveMask(Avx.Compare<some flavor>(arg1, arg2));
Avx.Blend(arg3, arg4, mask) // CA1857 in .NET 8.0, no warning in .NET 7.0

which is odd as there's 1) there's no a priori way of knowing the mask value, so mask cannot be declared const, 2) mask cannot be marked [ConstExpected], and 3) mask is a value type consumed directly by blendps and similar instructions, so there doesn't appear to be an optimization for the compiler to take in response to the mask being constant.

There's also at least one similar case trivially addressed at MSIL gen time by loop unrolling (depending on data marshalling requirements for serialization, actual use cases may have 50+ Extract()s, meaning manual loop unrolling is unrealistic, IMO ).

for (byte element = 0; element < 4; ++element) // or < 8 if using 256 bit SIMD
{
    Avx.Extract(arg1, element); // CA1857 even though element's values are well known
}

Configuration

Visual Studio 17.8.1.

Regression?

The Blend() disassembly I've been getting looks fine, so I'm inclined to say yes. Perhaps there are corner cases where poor codegen can occur, though.

Data

This class may be a conveniently simple example for reference.

Analysis

One possibility is [ConstExpected] was added to response to const int imm8 in the intrinsics' C signatures. From what I know, this doesn't appear to be a correct mapping between C/C++ notions of const, C# notions of const, the implications of ConstExpectedAttribute, and how (E)VEX SIMD instructions access ymm (or zmm or mask) registers. So possibly the CA1857s here artifacts of the tool used for #80192.

If it is a correct warning, it's unclear to me how SIMD intrinsic code for x64 processors can be modified in .NET 8.0 to clear it. Googling gives me plenty of hits on SIMD [ConstExpected] being added as an 8.0 feature but nothing on code migration paths. Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.

@twest820 twest820 added the tenet-performance Performance related issue label Nov 27, 2023
@vcsjones vcsjones added area-System.Numerics and removed tenet-performance Performance related issue labels Nov 27, 2023
@ghost
Copy link

ghost commented Nov 27, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Mostly I'm hitting

byte mask = (byte)Avx.MoveMask(Avx.Compare<some flavor>(arg1, arg2));
Avx.Blend(arg3, arg4, mask) // CA1857 in .NET 8.0, no warning in .NET 7.0

which is odd as there's 1) there's no a priori way of knowing the mask value, so mask cannot be declared const, 2) mask cannot be marked [ConstExpected], and 3) mask is a value type consumed directly by blendps and similar instructions, so there doesn't appear to be an optimization for the compiler to take in response to the mask being constant.

There's also at least one similar case trivially addressed at MSIL gen time by loop unrolling (depending on data marshalling requirements for serialization, actual use cases may have 50+ Extract()s, meaning manual loop unrolling is unrealistic, IMO ).

for (byte element = 0; element < 4; ++element) // or < 8 if using 256 bit SIMD
{
    Avx.Extract(arg1, element); // CA1857 even though element's values are well known
}

Configuration

Visual Studio 17.8.1.

Regression?

The Blend() disassembly I've been getting looks fine, so I'm inclined to say yes. Perhaps there are corner cases where poor codegen can occur, though.

Data

This class may be a conveniently simple example for reference.

Analysis

One possibility is [ConstExpected] was added to response to const int imm8 in the intrinsics' C signatures. From what I know, this doesn't appear to be a correct mapping between C/C++ notions of const, C# notions of const, the implications of ConstExpectedAttribute, and how (E)VEX SIMD instructions access ymm (or zmm or mask) registers. So possibly the CA1857s here artifacts of the tool used for #80192.

If it is a correct warning, it's unclear to me how SIMD intrinsic code for x64 processors can be modified in .NET 8.0 to clear it. Googling gives me plenty of hits on SIMD [ConstExpected] being added as an 8.0 feature but nothing on code migration paths. Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.

Author: twest820
Assignees: -
Labels:

area-System.Numerics

Milestone: -

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 27, 2023
@tannergooding
Copy link
Member

  1. there's no a priori way of knowing the mask value, so mask cannot be declared const, 2) mask cannot be marked [ConstExpected]

The warning is because you're using Avx.Blend which emits VBLENDPS (or VBLENDPD). The encoding of that is VBLENDPS xmm1, xmm2/m128, imm8 and so the actual hardware requires you to use a constant. The runtime is currently working around this limitation by emitting a call to a fallback function that contains a 256 case switch table encoding each of the possible encodable constants.

  1. mask is a value type consumed directly by blendps and similar instructions

It is not. APIs like Blend require a constant and it is not expected for users to pass in the result of movemask or similar. In some C/C++ compilers, this is straight up unsupported and won't even compile.

There is an alternative API, BlendVariable, which doesn't require a constant. It emit VBLENDVPS (or VBLENDVPD) of which the encoding is VBLENDVPS xmm1, xmm2, xmm3/m128 (or BLENDVPS xmm1, xmm2/m128, <XMM0> on pre AVX hardware).

The expected usage of this is:

Vector128<float> mask = Avx.CompareEqual(arg1, arg2));
Avx.BlendVariable(arg3, arg4, mask);

Doing this should give you a measurable performance increase and behave more as expected.

Class-level CA1857 suppressions are therefore tempting. I'd prefer a more graceful approach if one's available, though.

You should not suppress these. They are warning you of what is functionally "incorrect" code and where you should be using some other API or alternative mechanism in its place.

There are some places where the JIT will try to recognize and optimize particular patterns when the user doesn't pass in a constant, but these are not guaranteed and may not always be supported by other runtimes. There are likewise many places the JIT doesn't try to handle today and where the developer doing the "right thing" themselves is the best option.

@tannergooding tannergooding added question Answer questions and provide assistance, not an issue with source code or documentation. needs-author-action An issue or pull request that requires more info or actions from the author. and removed untriaged New issue has not been triaged by the area owner labels Nov 27, 2023
@ghost
Copy link

ghost commented Nov 27, 2023

This issue has been marked needs-author-action and may be missing some important information.

@tannergooding
Copy link
Member

There are notable approved, but not yet implemented, analyzers to help direct users away from problematic patterns and towards more correct alternatives: #82488 and #82486

Cases like users calling Blend when they should have called BlendVariable would be covered by such an analyzer.

Until the analyzer can be implemented and supported, I'm happy to help direct how different [ConstExpected] warnings should be handled. Either by pointing to an equivalent API or an alternative sequence that will be better performing if there is no "variable" alternative.

@twest820
Copy link
Author

D'oh, thanks Tanner! That all sounds great; sorry I missed the extra v.

Do you have a suggestion for the Extract() case? Since Permute() and Shuffle() are both also [ConstExpected] (and thus routes through Avx.MoveScalar() also trigger CA1857) the best I'm seeing at the moment is approaches like

    static class AvxExtensions
    {
        public static float Extract(Vector128<float> value, byte element)
        {
            return element switch
            {
                0 => Avx.Extract(value, 0),
                1 => Avx.Extract(value, 1),
                2 => Avx.Extract(value, 2),
                3 => Avx.Extract(value, 3),
                _ => throw new ArgumentOutOfRangeException(nameof(element))
            };
        }
    }

This clears the CA1857 but, since it feels like trying to second guess JIT despite the disclaimers above, I'm unsure if it's a better bet codegen-wise than

        public static float Extract(Vector128<float> value, byte element)
        {
            return Avx.Extract(value, element); // suppress CA1857 in suppression file
        }

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Nov 28, 2023
@tannergooding
Copy link
Member

For cases like extract you should probably be using value[index] or value.GetElement(index). Those do not prescribe a particular instruction being used and will do whatever is deemed "most efficient" for the underlying CPU.

In general, using the "cross platform" equivalent API (like x + y instead of Sse.Add(x, y)) is preferred where possible.

@twest820
Copy link
Author

twest820 commented Dec 1, 2023

From a general "write portable code once" standpoint, I agree. Most of the SIMD code I work with goes like class FooAvx256 or Foo.BarAvx256(), though, meaning typically it's not portable. Or, alternatively, a GetElement() call's within CPU dispatched code paths where's probably simpler overall to stay consistent with the non-portable code layout—it's rare I write an inner loop body implementable with a shared ARM-x64 subset.

Another reason I have calls like Avx.Extract() rather than Vector128<T>.[] or Vector128<T>.GetElement() is it's a way of getting VEX instructions into debug and release builds without the overhead of doing publishing and risk of relying on <PublishReadyToRunCrossgen2ExtraArgs>-instruction-set:avx,avx2,fma</PublishReadyToRunCrossgen2ExtraArgs>, which is a build toolchain feature that took years to deliver and still seems undocumented outside of GitHub issues. It's also nice to elide the checking the CLR has to do in order to provide robust [] or GetElement() implementations—not a big deal but, if deployed as a general pattern, leaning on application specific context and higher level Debug.Assert()s does instead yield helpful reduction in runtimes.

I've had a chance in the past couple days to flip several SIMD-bearing repos to .NET 8.0 and am not seeing interestingly different variations from what's already been discussed. I think this could therefore be closed since it looks like all the extract options are awkward somewhere.

@tannergooding
Copy link
Member

-instruction-set:avx,avx2,fma

Notably x86-x64-v3 is probably "better" as it targets the entire set of ISAs required by the spec'd v3 x64 machine.

still seems undocumented outside of GitHub issues

It's a fairly esoteric scenario that largely only impacts startup performance. It's typical that regular R2R performance is fine and any hot functions will be rejitted such that you get stable throughput within the first few seconds regardless.

You have a similar option, IlcInstructionSet, that works for AOT scenarios.

I think this could therefore be closed since it looks like all the extract options are awkward somewhere.

Many hardware instructions can be fairly awkward to use or come with limitations. That's why we expose the xplat APIs to help cover core patterns where the important thing is to do that as fast as possible and not with very specific instructions.

@twest820
Copy link
Author

twest820 commented Dec 4, 2023

It's a fairly esoteric scenario that largely only impacts startup performance.

Mmm, when I've looked at telling crossgen2 not to worry about legacy CPUs there've been meaningful runtime effects. Even with most of the runtime in compute kernels explicitly coded to System.Runtime.Intrinsics.X86.Avx. Nothing huge but worth it you're going to kick off a two week run such. Since PublishReadyToRunCrossgen2ExtraArgs seems to be .NET's only sort of documented equivalent of C++'s /arch:AVX2 IMO it's a mainline build case rather than anything esoteric.

Thanks for mentioning x86-x64-v3. I can't get any search hits on it, though, so it's unclear if it's something which can be used externally at this point. Microsoft also seems to be pushing AOT for ASP.NET, which isn't among our use cases, and it's not an option in VS 17.8.2's publish settings. While IlcInstructionSet has recently gotten low-level GitHub documentation it isn't in the official documentation and also lacks any obvious Visual Studio support. Maybe if one of those gets enough attention to become a feature?

What does work great is your System.Runtime.Intrinsics implementation—thank you for all the time saved in coding C#-C++/CLI-C++ stacks—and then doing our own CPU dispatch. There's some overhead there, sure, but it's been lower cost than trying to coax this aspect of .NET builds into happiness.

@stephentoub stephentoub added this to the Future milestone Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Numerics needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

4 participants