-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect r2r codegen for hardware instrinsics #33090
Fix incorrect r2r codegen for hardware instrinsics #33090
Conversation
@@ -1554,6 +1554,37 @@ private uint FilterNamedIntrinsicMethodAttribs(uint attribs, MethodDesc method) | |||
fTreatAsRegularMethodCall = (methodName == "Round"); | |||
} | |||
} | |||
else if (namespaceName == "System.Numerics") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be more of a question for Tanner, but would it make sense to introduce some mechanism on JitInterface to express this? It already felt a bit ugly when this was just Round and it seems like we missed another place - feels like this could be a problem again in the future when RyuJIT finds a new place to use these ISA extensions.
What if instead of the EE side specifying what ISA extensions are allowed before compiling a method, RyuJIT would ask EE side whenever it would like to use it instead and include a reason (HW intrinsic class normally guarded by an IsSupported
check, or an opportunistic optimization of an existing API)?
This kind of contract would potentially make it easier to introduce multi-versioning later (right now the crossgen2 driver side has no way of knowing whether multiple versions even make sense).
Not sure if it's a good idea - just trying to find a way to avoid this fragility because I don't think anyone is really happy about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue only exists today for S.P.Corelib, because we have methods which we might optimize internally and therefore cannot easily express the different code paths to the JIT (functionally it is an IsSupported
check, but not one codified in the IR trees, which I believe is the root problem).
All of these checks are under Compiler::compSupports(ISA)
(HWIntrinsics and other opts) or Compiler::getSIMDSupportLevel()
(SIMDIntrinsics) calls, so it might be possible to tweak that code path to just return false
for any non-baseline code path if we are in R2R or Crossgen mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to think about a more complex instruction compatibility story, and in fact notification from RyuJit across the jit interface when a more interesting instruction is used, but I want to reserve that sort of change for future work. This is fixing a targetted problem. Once this is fixed, I can go after a more ambitious work experiment in that direction. As part of the instruction-set feature for crossgen2 I intend to build support for noting which functions are taking advantage of what instruction sets, and possibly enable/disable R2R function use on a per method basis, but that's going to require a fair bit more research.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, since this should be everwhere we need to consider...
All usages of getSIMDSupportLevel
:
src/coreclr/src/jit/compiler.h:7648: SIMDLevel getSIMDSupportLevel()
src/coreclr/src/jit/compiler.h:8065: if (getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/compiler.h:8071: assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/compiler.h:8104: if (getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/compiler.h:8110: assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/compiler.h:8137: assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/lowerxarch.cpp:2479: else if ((comp->getSIMDSupportLevel() == SIMD_AVX2_Supported) &&
src/coreclr/src/jit/lowerxarch.cpp:2503: if ((comp->getSIMDSupportLevel() >= SIMD_SSE4_Supported) && op2->IsIntegralConstVector(0))
src/coreclr/src/jit/lsraxarch.cpp:1970: assert(compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/lsraxarch.cpp:1989: compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2049: if ((compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported) ||
src/coreclr/src/jit/lsraxarch.cpp:2059: assert(simdTree->gtSIMDBaseType == TYP_INT && compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/lsraxarch.cpp:2065: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2102: (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported))
src/coreclr/src/jit/lsraxarch.cpp:2155: if (compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2194: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/lsraxarch.cpp:2214: if ((compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported) || (simdTree->gtSIMDBaseType == TYP_ULONG))
src/coreclr/src/jit/lsraxarch.cpp:2226: if ((compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported) && (simdTree->gtSIMDBaseType != TYP_DOUBLE))
src/coreclr/src/jit/simd.cpp:1431: if ((getSIMDSupportLevel() == SIMD_SSE2_Supported) && ((baseType == TYP_LONG) || baseType == TYP_UBYTE))
src/coreclr/src/jit/simd.cpp:1529: if ((getSIMDSupportLevel() == SIMD_SSE2_Supported) && baseType == TYP_LONG)
src/coreclr/src/jit/simd.cpp:1678: if (getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/simd.cpp:1688: assert(getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/simd.cpp:1791: assert(getSIMDSupportLevel() >= SIMD_SSE4_Supported);
src/coreclr/src/jit/simd.cpp:1925: (getSIMDSupportLevel() >= SIMD_SSE4_Supported &&
src/coreclr/src/jit/simd.cpp:1933: assert(getSIMDSupportLevel() == SIMD_SSE2_Supported);
src/coreclr/src/jit/simd.cpp:3087: if (!varTypeIsFloating(baseType) && !(baseType == TYP_INT && getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:53: assert(compiler->getSIMDSupportLevel() >= SIMD_SSE2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:59: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:207: else if ((baseType == TYP_INT) && (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:245: else if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:287: else if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:313: if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:356: (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:415: else if ((baseType == TYP_LONG) && (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported))
src/coreclr/src/jit/simdcodegenxarch.cpp:515: if (compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:760: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:816: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1115: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1128: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1214: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1495: if (compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:1530: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1605: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1757: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:1881: if (op1Reg != targetReg && compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported &&
src/coreclr/src/jit/simdcodegenxarch.cpp:1936: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:2044: assert((compiler->getSIMDSupportLevel() >= SIMD_SSE4_Supported) && op2->IsIntegralConstVector(0));
src/coreclr/src/jit/simdcodegenxarch.cpp:2157: SIMDLevel level = compiler->getSIMDSupportLevel();
src/coreclr/src/jit/simdcodegenxarch.cpp:2174: if ((compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported) || (simdEvalType == TYP_SIMD32))
src/coreclr/src/jit/simdcodegenxarch.cpp:2555: assert(compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:2616: assert(compiler->getSIMDSupportLevel() == SIMD_AVX2_Supported);
src/coreclr/src/jit/simdcodegenxarch.cpp:2750: if (compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported)
src/coreclr/src/jit/simdcodegenxarch.cpp:2793: noway_assert(compiler->getSIMDSupportLevel() == SIMD_SSE2_Supported);
All usages of compSupports
:
src/coreclr/src/jit/codegenarm64.cpp:2726: if (compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/codegenarm64.cpp:2863: if (compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/codegenxarch.cpp:7250: assert(compiler->compSupports(InstructionSet_SSE41));
src/coreclr/src/jit/compiler.cpp:2197: opts.compSupportsISA = 0;
src/coreclr/src/jit/compiler.cpp:2293: if (jitFlags.IsSet(JitFlags::JIT_FLAG_USE_BMI1) && JitConfig.EnableBMI1() && compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.cpp:2303: if (jitFlags.IsSet(JitFlags::JIT_FLAG_USE_BMI2) && JitConfig.EnableBMI2() && compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.h:3666: bool compSupportsHWIntrinsic(InstructionSet isa);
src/coreclr/src/jit/compiler.h:7651: if (compSupports(InstructionSet_AVX2))
src/coreclr/src/jit/compiler.h:7656: if (compSupports(InstructionSet_SSE42))
src/coreclr/src/jit/compiler.h:8131: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/compiler.h:8288: bool compSupports(InstructionSet isa) const
src/coreclr/src/jit/compiler.h:8291: return (opts.compSupportsISA & (1ULL << isa)) != 0;
src/coreclr/src/jit/compiler.h:8300: return compSupports(InstructionSet_AVX);
src/coreclr/src/jit/compiler.h:8396: uint64_t compSupportsISA;
src/coreclr/src/jit/compiler.h:8399: compSupportsISA |= 1ULL << isa;
src/coreclr/src/jit/gentree.cpp:16968: if (compSupports(InstructionSet_SSE))
src/coreclr/src/jit/gentree.cpp:16977: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsic.cpp:262: bool isIsaSupported = comp->compSupports(isa) && comp->compSupportsHWIntrinsic(isa);
src/coreclr/src/jit/hwintrinsic.cpp:530:// compSupportsHWIntrinsic: check whether a given instruction set is supported
src/coreclr/src/jit/hwintrinsic.cpp:537:bool Compiler::compSupportsHWIntrinsic(InstructionSet isa)
src/coreclr/src/jit/hwintrinsiccodegenxarch.cpp:1138: assert(compiler->compSupports(InstructionSet_SSE));
src/coreclr/src/jit/hwintrinsicxarch.cpp:498: if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:638: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:682: if (compSupports(InstructionSet_SSE2) || (compSupports(InstructionSet_SSE) && (baseType == TYP_FLOAT)))
src/coreclr/src/jit/hwintrinsicxarch.cpp:694: if (compSupports(InstructionSet_SSE) && varTypeIsFloating(baseType))
src/coreclr/src/jit/hwintrinsicxarch.cpp:708: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:720: if (compSupports(InstructionSet_SSE))
src/coreclr/src/jit/hwintrinsicxarch.cpp:741: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:753: if (compSupports(InstructionSet_AVX) && varTypeIsFloating(baseType))
src/coreclr/src/jit/hwintrinsicxarch.cpp:765: if (compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:774: if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:786: if (!compSupports(InstructionSet_SSE2) || !varTypeIsArithmetic(baseType) || !indexOp->OperIsConst())
src/coreclr/src/jit/hwintrinsicxarch.cpp:802: if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:810: if (!compSupports(InstructionSet_SSE41_X64))
src/coreclr/src/jit/hwintrinsicxarch.cpp:846: assert(compSupports(InstructionSet_AVX));
src/coreclr/src/jit/hwintrinsicxarch.cpp:877: if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:998: if (!compSupports(InstructionSet_AVX))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1010: if (!compSupports(InstructionSet_SSE2) || !varTypeIsArithmetic(baseType) || !indexOp->OperIsConst())
src/coreclr/src/jit/hwintrinsicxarch.cpp:1026: if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1034: if (!compSupports(InstructionSet_SSE41_X64))
src/coreclr/src/jit/hwintrinsicxarch.cpp:1066: assert(compSupports(InstructionSet_AVX));
src/coreclr/src/jit/hwintrinsicxarch.cpp:1124: if (!compSupports(InstructionSet_SSE41))
src/coreclr/src/jit/importer.cpp:4135: if (compSupports(InstructionSet_FMA))
src/coreclr/src/jit/importer.cpp:19733: return compSupports(InstructionSet_SSE41);
src/coreclr/src/jit/lowerarmarch.cpp:81: return comp->compSupports(InstructionSet_Atomics) ? false
src/coreclr/src/jit/lsraarm64.cpp:393: if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:415: if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:435: if (!compiler->compSupports(InstructionSet_Atomics))
src/coreclr/src/jit/lsraarm64.cpp:455: if (!compiler->compSupports(InstructionSet_Atomics))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compSupports
, it looks like we are correctly handling all the cases (Math.Round
and various Vector128/256
helper methods, such as WithElement
).
For getSIMDSupportLevel
around SIMD_SSE4_Supported
, it looks like there might be a couple cases (such as Equality/Inequality, Min/Max) and a couple asserts that might need further investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, of the getSIMDSupportLevel
cases, they all look to be scenarios where we aren't utilizing such code in S.P.Corelib directly, so it isn't currently a problem.
For example, multiplication for Vector<int>
is only supported on SSE4.1 and above: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simdcodegenxarch.cpp#L209
or Vector<T>
equality/inequality against constant 0
will generate `ptest: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/simdcodegenxarch.cpp#L2045
But, the S.N.Vector methods aren't recursive and so this only presents itself (like in the DotProduct
case), when some other code in S.P.Corelib is calling one of these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM for the known problems.
I did a grep for all usages of getSIMDSupportLevel
and compSupports
to try and find other places we might need to consider. Most of them are AVX/AVX2 however, so it looks to be fine: https://github.com/dotnet/runtime/pull/33090/files#r387336876
I spoke to @AntonLapounov offline and got signoff. |
// This list needs to match the list of intrinsics we can generate detection code for | ||
// in HardwareIntrinsicHelpers.EmitIsSupportedIL. | ||
#else | ||
// For ReadyToRun, this list needs to match up with the behavior of FilterNamedIntrinsicMethodAttribs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to assert that this is always true?
Use of Vector dot product and Vector3/4 constructors may generate code that is not compatible with an SSE2 capable CPU.
This fix adjusts the set of rules for filtering the intrinsics when compiling System.Private.CoreLib, and makes the behavior in crossgen2 match that of crossgen1.
Fixes #32175