Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect r2r codegen for hardware instrinsics #33090

Merged

Conversation

davidwrighton
Copy link
Member

Use of Vector dot product and Vector3/4 constructors may generate code that is not compatible with an SSE2 capable CPU.

This fix adjusts the set of rules for filtering the intrinsics when compiling System.Private.CoreLib, and makes the behavior in crossgen2 match that of crossgen1.

Fixes #32175

@@ -1554,6 +1554,37 @@ private uint FilterNamedIntrinsicMethodAttribs(uint attribs, MethodDesc method)
fTreatAsRegularMethodCall = (methodName == "Round");
}
}
else if (namespaceName == "System.Numerics")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be more of a question for Tanner, but would it make sense to introduce some mechanism on JitInterface to express this? It already felt a bit ugly when this was just Round and it seems like we missed another place - feels like this could be a problem again in the future when RyuJIT finds a new place to use these ISA extensions.

What if instead of the EE side specifying what ISA extensions are allowed before compiling a method, RyuJIT would ask EE side whenever it would like to use it instead and include a reason (HW intrinsic class normally guarded by an IsSupported check, or an opportunistic optimization of an existing API)?

This kind of contract would potentially make it easier to introduce multi-versioning later (right now the crossgen2 driver side has no way of knowing whether multiple versions even make sense).

Not sure if it's a good idea - just trying to find a way to avoid this fragility because I don't think anyone is really happy about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue only exists today for S.P.Corelib, because we have methods which we might optimize internally and therefore cannot easily express the different code paths to the JIT (functionally it is an IsSupported check, but not one codified in the IR trees, which I believe is the root problem).

All of these checks are under Compiler::compSupports(ISA) (HWIntrinsics and other opts) or Compiler::getSIMDSupportLevel() (SIMDIntrinsics) calls, so it might be possible to tweak that code path to just return false for any non-baseline code path if we are in R2R or Crossgen mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to think about a more complex instruction compatibility story, and in fact notification from RyuJit across the jit interface when a more interesting instruction is used, but I want to reserve that sort of change for future work. This is fixing a targetted problem. Once this is fixed, I can go after a more ambitious work experiment in that direction. As part of the instruction-set feature for crossgen2 I intend to build support for noting which functions are taking advantage of what instruction sets, and possibly enable/disable R2R function use on a per method basis, but that's going to require a fair bit more research.

Copy link
Member

@tannergooding tannergooding Mar 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, since this should be everwhere we need to consider...

All usages of getSIMDSupportLevel:

src/coreclr/src/jit/compiler.h:7648:    SIMDLevel getSIMDSupportLevel()
src/coreclr/src/jit/compiler.h:8065:        if (getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/compiler.h:8071:            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/compiler.h:8104:        if (getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/compiler.h:8110:            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/compiler.h:8137:            assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/lowerxarch.cpp:2479:            else if ((comp->getSIMDSupportLevel() == SIMD_AVX2_Supported) &&
src/coreclr/src/jit/lowerxarch.cpp:2503:            if ((comp->getSIMDSupportLevel() >= SIMD_SSE4_Supported) && op2->IsIntegralConstVector(0))
src/coreclr/src/jit/lsraxarch.cpp:1970:            assert(compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/lsraxarch.cpp:1989:                compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2049:                if ((compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported) ||
src/coreclr/src/jit/lsraxarch.cpp:2059:                assert(simdTree->gtSIMDBaseType == TYP_INT && compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/lsraxarch.cpp:2065:                if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2102:                        (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported))
src/coreclr/src/jit/lsraxarch.cpp:2155:            if (compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2194:            if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2214:                if ((compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported) || (simdTree->gtSIMDBaseType == TYP_ULONG))
src/coreclr/src/jit/lsraxarch.cpp:2226:            if ((compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported) && (simdTree->gtSIMDBaseType != TYP_DOUBLE))
src/coreclr/src/jit/simd.cpp:1431:    if ((getSIMDSupportLevel() == SIMD_SSE2_Supported) && ((baseType == TYP_LONG) || baseType == TYP_UBYTE))
src/coreclr/src/jit/simd.cpp:1529:        if ((getSIMDSupportLevel() == SIMD_SSE2_Supported) && baseType == TYP_LONG)
src/coreclr/src/jit/simd.cpp:1678:    if (getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/simd.cpp:1688:        assert(getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/simd.cpp:1791:        assert(getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/simd.cpp:1925:        (getSIMDSupportLevel() >= SIMD_SSE4_Supported &&
src/coreclr/src/jit/simd.cpp:1933:        assert(getSIMDSupportLevel() == SIMD_SSE2_Supported);
src/coreclr/src/jit/simd.cpp:3087:            if (!varTypeIsFloating(baseType) && !(baseType == TYP_INT && getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:53:    assert(compiler->getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:59:            if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:207:            else if ((baseType == TYP_INT) && (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:245:            else if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:287:            else if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:313:            if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:356:                     (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:415:            else if ((baseType == TYP_LONG) && (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:515:                    if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:760:    SIMDLevel level      = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:816:            if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1115:        if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1128:        if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1214:    SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1495:    if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1530:    SIMDLevel level    = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1605:    SIMDLevel level    = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1757:    SIMDLevel level      = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1881:        if (op1Reg != targetReg && compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported &&
src/coreclr/src/jit/simdcodegenxarch.cpp:1936:    SIMDLevel level      = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:2044:                assert((compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported) && op2->IsIntegralConstVector(0));
src/coreclr/src/jit/simdcodegenxarch.cpp:2157:    SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:2174:        if ((compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported) || (simdEvalType == TYP_SIMD32))
src/coreclr/src/jit/simdcodegenxarch.cpp:2555:        assert(compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:2616:                assert(compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:2750:    if (compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:2793:    noway_assert(compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported);

All usages of compSupports:

src/coreclr/src/jit/codegenarm64.cpp:2726:    if (compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/codegenarm64.cpp:2863:    if (compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/codegenxarch.cpp:7250:    assert(compiler->compSupports(InstructionSet_SSE41));
src/coreclr/src/jit/compiler.cpp:2197:    opts.compSupportsISA = 0;
src/coreclr/src/jit/compiler.cpp:2293:    if (jitFlags.IsSet(JitFlags::JIT_FLAG_USE_BMI1) && JitConfig.EnableBMI1() && compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.cpp:2303:    if (jitFlags.IsSet(JitFlags::JIT_FLAG_USE_BMI2) && JitConfig.EnableBMI2() && compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.h:3666:    bool compSupportsHWIntrinsic(InstructionSet isa);
src/coreclr/src/jit/compiler.h:7651:        if (compSupports(InstructionSet_AVX2))
src/coreclr/src/jit/compiler.h:7656:        if (compSupports(InstructionSet_SSE42))
src/coreclr/src/jit/compiler.h:8131:        if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.h:8288:    bool compSupports(InstructionSet isa) const
src/coreclr/src/jit/compiler.h:8291:        return (opts.compSupportsISA & (1ULL << isa)) != 0;
src/coreclr/src/jit/compiler.h:8300:        return compSupports(InstructionSet_AVX);
src/coreclr/src/jit/compiler.h:8396:        uint64_t compSupportsISA;
src/coreclr/src/jit/compiler.h:8399:            compSupportsISA |= 1ULL << isa;
src/coreclr/src/jit/gentree.cpp:16968:                if (compSupports(InstructionSet_SSE))
src/coreclr/src/jit/gentree.cpp:16977:                if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsic.cpp:262:    bool isIsaSupported = comp->compSupports(isa) && comp->compSupportsHWIntrinsic(isa);
src/coreclr/src/jit/hwintrinsic.cpp:530:// compSupportsHWIntrinsic: check whether a given instruction set is supported
src/coreclr/src/jit/hwintrinsic.cpp:537:bool Compiler::compSupportsHWIntrinsic(InstructionSet isa)
src/coreclr/src/jit/hwintrinsiccodegenxarch.cpp:1138:    assert(compiler->compSupports(InstructionSet_SSE));
src/coreclr/src/jit/hwintrinsicxarch.cpp:498:            if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:638:            if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:682:            if (compSupports(InstructionSet_SSE2) || (compSupports(InstructionSet_SSE) && (baseType == TYP_FLOAT)))
src/coreclr/src/jit/hwintrinsicxarch.cpp:694:            if (compSupports(InstructionSet_SSE) && varTypeIsFloating(baseType))
src/coreclr/src/jit/hwintrinsicxarch.cpp:708:            if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:720:            if (compSupports(InstructionSet_SSE))
src/coreclr/src/jit/hwintrinsicxarch.cpp:741:            if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:753:            if (compSupports(InstructionSet_AVX) && varTypeIsFloating(baseType))
src/coreclr/src/jit/hwintrinsicxarch.cpp:765:            if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:774:            if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:786:            if (!compSupports(InstructionSet_SSE2) || !varTypeIsArithmetic(baseType) || !indexOp->OperIsConst())
src/coreclr/src/jit/hwintrinsicxarch.cpp:802:                    if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:810:                    if (!compSupports(InstructionSet_SSE41_X64))
src/coreclr/src/jit/hwintrinsicxarch.cpp:846:                assert(compSupports(InstructionSet_AVX));
src/coreclr/src/jit/hwintrinsicxarch.cpp:877:                    if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:998:            if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1010:            if (!compSupports(InstructionSet_SSE2) || !varTypeIsArithmetic(baseType) || !indexOp->OperIsConst())
src/coreclr/src/jit/hwintrinsicxarch.cpp:1026:                    if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1034:                    if (!compSupports(InstructionSet_SSE41_X64))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1066:                assert(compSupports(InstructionSet_AVX));
src/coreclr/src/jit/hwintrinsicxarch.cpp:1124:                    if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/importer.cpp:4135:                if (compSupports(InstructionSet_FMA))
src/coreclr/src/jit/importer.cpp:19733:            return compSupports(InstructionSet_SSE41);
src/coreclr/src/jit/lowerarmarch.cpp:81:                return comp->compSupports(InstructionSet_Atomics) ? false
src/coreclr/src/jit/lsraarm64.cpp:393:            if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:415:                if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:435:            if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:455:            if (!compiler->compSupports(InstructionSet_Atomics))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compSupports, it looks like we are correctly handling all the cases (Math.Round and various Vector128/256 helper methods, such as WithElement).
For getSIMDSupportLevel around SIMD_SSE4_Supported, it looks like there might be a couple cases (such as Equality/Inequality, Min/Max) and a couple asserts that might need further investigation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, of the getSIMDSupportLevel cases, they all look to be scenarios where we aren't utilizing such code in S.P.Corelib directly, so it isn't currently a problem.

For example, multiplication for Vector<int> is only supported on SSE4.1 and above: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simdcodegenxarch.cpp#L209
or Vector<T> equality/inequality against constant 0 will generate `ptest: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simdcodegenxarch.cpp#L2045

But, the S.N.Vector methods aren't recursive and so this only presents itself (like in the DotProduct case), when some other code in S.P.Corelib is calling one of these methods.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM for the known problems.

I did a grep for all usages of getSIMDSupportLevel and compSupports to try and find other places we might need to consider. Most of them are AVX/AVX2 however, so it looks to be fine: https://github.com/dotnet/runtime/pull/33090/files#r387336876

@davidwrighton
Copy link
Member Author

I spoke to @AntonLapounov offline and got signoff.

@davidwrighton davidwrighton merged commit e66e9fa into dotnet:master Mar 4, 2020
// This list needs to match the list of intrinsics we can generate detection code for
// in HardwareIntrinsicHelpers.EmitIsSupportedIL.
#else
// For ReadyToRun, this list needs to match up with the behavior of FilterNamedIntrinsicMethodAttribs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to assert that this is always true?

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
@davidwrighton davidwrighton deleted the fix_incorrect_r2r_codegen branch April 20, 2021 17:39
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crossgen2 codegen for with AVX instructions differs from crossgen1 generated one
5 participants