-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Implement "IsSupported" for all ISA classes of Intel hardware intrinsics #14020
Conversation
@@ -6,7 +6,7 @@ namespace System.Runtime.CompilerServices | |||
{ | |||
// Calls to methods or references to fields marked with this attribute may be replaced at | |||
// some call sites with jit intrinsic expansions. | |||
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor | AttributeTargets.Field, Inherited = false)] | |||
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor | AttributeTargets.Field | AttributeTargets.Property, Inherited = false)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not make sense on property. It has to be on the method implementing the property, like:
public static bool IsSupported
{
[Intrinsic]
get { return false; }
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix.
@@ -13,6 +14,7 @@ namespace System.Runtime.Intrinsics.X86 | |||
[CLSCompliant(false)] | |||
public static class Aes | |||
{ | |||
[Intrinsic] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since every method in this namespace will be marked as intrinsic eventually, I am wondering whether we should rather special case System.Runtime.Intrinsics.*
(here)[https://github.com/dotnet/coreclr/blob/master/src/vm/methodtablebuilder.cpp#L5086] and set the flag for all methods under this namespaces.
https://github.com/dotnet/coreclr/blob/master/src/vm/methodtablebuilder.cpp#L5088
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should rather special case
System.Runtime.Intrinsics.*
... and set the flag for all methods under this namespaces.
Doesn't this reduce intrinsic recognition overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, each [Intrinsic]
adds size to System.Private.CoreLib
. It is hard to tell what is faster, and it is not that important because the result is cached. Special casing the namespace will be definitely smaller (rought estimate 5-10 kB of disk footprint).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Could we change IntrinsicsAttribute
to be able to apply on Class
and Struct
(or larger scopes)? Then specially treat System.Runtime.Intrinsics.*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You cannot apply the attribute on large scopes. It would have to be on each class or struct. It does not look like a particularly great option to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not look like a particularly great option to me.
How about considering this optimization later (in a new issue)? Or you prefer to address it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do it now, it would save you from editing like dozen of .cs files. But if you prefer to do it separately - it is fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do it now, it would save you from editing like dozen of .cs files.
Good point 😄. But let's consider the "optional expand" issue at first, that is more important.
src/jit/compiler.cpp
Outdated
|
||
// Instruction set flags fo// Instruction set flags for Intel hardware intrinsics | ||
#ifdef _TARGET_XARCH_ | ||
opts.compSupportsAES = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a single "flags" variable should be used instead of all these variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, will change to use a 64-bit flag variable.
src/jit/compiler.h
Outdated
InstructionSet lookupHWIntrinsicISA(const char* className); | ||
NamedIntrinsic lookupHWIntrinsic(const char* methodName, InstructionSet isa); | ||
InstructionSet isaOfHWIntrinsic(NamedIntrinsic intrinsic); | ||
GenTreePtr impX86HWIntrinsic(NamedIntrinsic intrinsic, CORINFO_METHOD_HANDLE method); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenTree*
instead of GenTreePtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do. Shall we always use GenTree*
instead of GenTreePtr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested this in other PRs and it seems that there is universal agreement about using GenTree*
instead of GenTreePtr
.
There is no GenTreeCastPtr
, no BasicBlockPtr
, no CompilerPtr
. It's just GenTreePtr
, an oddball.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
src/jit/importer.cpp
Outdated
return result; | ||
} | ||
|
||
#ifdef _TARGET_XARCH_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could all this new code be moved to a separate file? importerxarch.cpp
perhaps? This file is already large as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will separate this new code from importer.cpp to a new file. How about hwintrinsicxarch.pp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about hwintrinsicxarch.pp?
I assume you mean .cpp. Sounds reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 😄
src/jit/lsraxarch.cpp
Outdated
@@ -2261,7 +2261,7 @@ void LinearScan::TreeNodeInfoInitSIMD(GenTreeSIMD* simdTree) | |||
{ | |||
bool needFloatTemp; | |||
if (varTypeIsSmallInt(simdTree->gtSIMDBaseType) && | |||
(compiler->getSIMDInstructionSet() == InstructionSet_AVX)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this change part of a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #13879
@mellinoe @jkotas Tests failed on
Should I set the assembly reference somewhere? |
@fiigii That is a good question. The test dependencies are defined here (I think -- it might be in the test_runtime.csproj next to it). It's been a little while since I changed that stuff in coreclr. But currently, System.Runtime.Intrinsics.x86 is actually not being packaged up at all in corefx. So there's nothing for us to reference to get it here. We probably need to create a standalone package containing the assembly. I can work on that. |
Thank you for the help 😄 |
cebdde9
to
f7cdd29
Compare
Update
|
cc @jkotas @mikedn @russellhadley PTAL |
These intrinsics cannot be expanded under debug build. @jkotas , do you have any suggestion to address the "optional expansion" issue? |
src/vm/methodtablebuilder.cpp
Outdated
LPCUTF8 className; | ||
LPCUTF8 nameSpace; | ||
|
||
HRESULT hrns = GetMDImport()->GetNameOfTypeDef(bmtInternal->pType->GetTypeDefToken(), &className, &nameSpace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to do this check once per type, not once per each method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it can be under defined(_TARGET_X86_) || defined(_TARGET_AMD64_)
ifdef.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do.
My proposal would be:
This plan will:
|
Also, we will need to disable saving of compiled intrinsic method bodies into NGen images. The |
This solution makes sense to me, but it will complicate the intrinsic implementation for other hardware platforms. Originally, I planned to let x86 intrinsics back to C# code on ARM, which always returns
Don't you mean that we can compile intrinsics and insert into AOT-compiled images at runtime? |
You can still do that by having two files with the intrinsics implementations - one throwing (included in non-xarch CoreLib builds) and one recursive (included in xarch CoreLib builds).
Yes, you can say it that way. |
This solution sounds pretty cool 😎 , will do.
This AOT solution would be perfect if the start time regression is acceptable. I want to implement all the intrinsics in JIT at first and enable them for AOT later (the AOT work may need your help 😄 ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this approach seems reasonable.
I have two main concerns:
- It appears that this breaks existing code that checks the current instruction set, and
- This increases the cost of setting the instruction set. That may not be a significant issue, but as we add intrinsics and increase the string comparisons we are doing to identify them, I'm concerned about the impact of JIT throughput. Could you please measure before & after?
src/jit/instr.h
Outdated
// Linear order end | ||
// Instruction sets have the linear order only in above area. | ||
// Values of InstructionSet not in this area cannot be compared | ||
// (i.e. compiler->getSIMDInstructionSet() >= InstructionSet_SSE3_4). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the values can no longer be compared at all. Indeed, this change would seem to break the existing code that does comparisons, since if any of the higher bits are set, it will be true that compiler->getSIMDInstructionSet() >= InstructionSet_AVX2
, which could lead to problems. I don't think you can make this change without changing all the code that depends on the values of compiler->getSIMDInstructionSet()
being in linear order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the values can no longer be compared at all.
I agree, and this is a temporary solution. Actually, only SIMD intrinsic (System.Numerics.Vectors
) code is comparing instruction set values, and the new hardware intrinsic implementation never depends upon the order. In this PR, SIMD still works well because getSIMDInstructionSet
only returns SSE2, SSE3_4, or AVX. I will refactor this "ISA comparison" in https://github.com/dotnet/coreclr/issues/14065
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and this is a temporary solution.
Before merging this, then, you need to add a comment indicating what the longer term solution is, and how it will interplay with getFloatingPointInstructionSet()
(and getSIMDInstructionSet()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will do.
} | ||
if (Aes.IsSupported && Bmi1.IsSupported && Bmi2.IsSupported && Fma.IsSupported && | ||
Lzcnt.IsSupported && Popcnt.IsSupported && Pclmulqdq.IsSupported) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is off here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
else | ||
{ | ||
result = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is difficult to do any real verification of the result here, it might be useful to also add a test that the size of, say, Vector is 128, unless Avx2.IsSupported, in which case it should be 256.
@CarolEidt No, it does not. Currently, the new hardware intrinsic implementation does not overlap with the existing SIMD code at all. There are indeed some naming and potential codegen conflicts that we discussed in #13879, but do not impact the current codegen. I will solve all the conflicts in https://github.com/dotnet/coreclr/issues/14065 .
I think we are discussing this in https://github.com/dotnet/corefx/issues/23519 |
It would be nice to expand the tabular scheme you have introduced for the hardware intrinsics to include the non-hardware ones too (right now there's just the one, but there are two more in pending PRs, and likely more to come). In my opinion this can wait for a subsequent change where we perhaps also auto-generate the recognizer. But we should do this before opening the floodgates on the hardware intrinsic set. I wonder what the best scheme is for handling prejitted code. The basic option is to have the target-relevant We could go beyond that as follows. We could fail prejitting (or the equivalent) when one of these intrinsics is seen in the importer either as a root method or a callee. Hence any method that contains a target-appropriate A third option is for the jit to remember if it saw any of these intrinsics during importation for a method during prejit codegen, and report this back to the VM at the end of jittting, or perhaps fail but only after inlining once all the intrinsics have been seen (or let the failure to import B as an inlinee cascade up and cause A to fail too). Then neither A or B is prejitted and at jit-time B can get inlined into A. |
I think this is what we should do to start. It is what's done for the existing |
@jkotas @CarolEidt PTAL |
@jkotas does the latest change look good to you? That passed the tests on debug and release build both. |
src/vm/methodtablebuilder.cpp
Outdated
LPCUTF8 x86className; | ||
LPCUTF8 x86nameSpace; | ||
isX86IntrinsicType = false; | ||
HRESULT hrns = GetMDImport()->GetNameOfTypeDef(bmtInternal->pType->GetTypeDefToken(), &x86className, &x86nameSpace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically just use hr
in situations like this
src/vm/methodtablebuilder.h
Outdated
@@ -2956,6 +2956,11 @@ class MethodTableBuilder | |||
, AllocMemTracker *pamTracker | |||
); | |||
|
|||
#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_) | |||
private: | |||
bool isX86IntrinsicType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be called isHardwareIntrinsic
and it should not be under ifdef - we will want the same bool for other platform intrinsics, and it should be in say bmtProperties to follow the style of this type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be in say bmtProperties to follow the style of this type.
Don't you mean that isHardwareIntrinsic
should be a field of bmtProperties
struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - it is what I meant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with the JIT changes.
I would still like to see some cross-validation of Vector size and the value returned for Avx2.IsSupported()
, but I won't insist on it.
src/jit/importer.cpp
Outdated
@@ -3324,7 +3324,7 @@ GenTreePtr Compiler::impIntrinsic(GenTreePtr newobjThis, | |||
// | |||
// We disable the inlining of instrinsics for MinOpts. | |||
// | |||
if (!mustExpand && (opts.compDbgCode || opts.MinOpts())) | |||
if (!mustExpand && (opts.compDbgCode || opts.MinOpts()) && !gtIsRecursiveCall(method)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment explaining the purpose of this extra check.
Also, while you're in the neighborhood, can you fix the misspelled word in the existing comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, will do.
9c09635
to
79f13b1
Compare
Addressed all the feedback, thank you all 🎓. |
src/vm/methodtablebuilder.cpp
Outdated
@@ -5090,10 +5111,15 @@ MethodTableBuilder::InitNewMethodDesc( | |||
NULL, | |||
NULL); | |||
|
|||
if (hr == S_OK) | |||
if (hr == S_OK | |||
#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please delete this ifdef here? It is not necessary and it will be become a problem for enabling the HW intrinsics on more platforms later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/vm/methodtablebuilder.cpp
Outdated
{ | ||
LPCUTF8 x86className; | ||
LPCUTF8 x86nameSpace; | ||
bmtProp->fIsHardwareIntrinsic = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary - all bmtProp fields are zero initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/vm/methodtablebuilder.cpp
Outdated
#if defined(_TARGET_X86_) || defined(_TARGET_AMD64_) | ||
// All the funtions in System.Runtime.Intrinsics.X86 are hardware intrinsics. | ||
// We specially treat them here to reduce the disk footprint of mscorlib. | ||
if (GetModule()->IsSystem()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, would it possible to more this check to MethodTableBuilder::CheckForSystemTypes
? We do fetch the name there already - it is better to do it just once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes more sense, will try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas In BuildMethodTableThrowing
, CheckForSystemTypes
is called after AllocAndInitMethodDescs();
that sets every MethodDesc to isJitIntrinsic
or not. Would it possible to chenge these call sites order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, reordering these methods would likely require more elaborate refactoring - not something to do as part of this change.
I think it is ok to leave it as is. Maybe reduce the number of cases where we fetch the name by adding && !bmtGenerics->HasInstantiation()
condition in addition to the IsSystem condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -22,6 +22,9 @@ | |||
<PackageReference Include="System.Security.Permissions"> | |||
<Version>$(CoreFxPackageVersion)</Version> | |||
</PackageReference> | |||
<PackageReference Include="System.Runtime.Intrinsics.X86"> | |||
<Version>4.5.0-preview1-25718-03</Version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mellinoe can I set a more flexible package version here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use this like the one above:
<Version>$(CoreFxPackageVersion)</Version>
That property will get automatically updated every so often and the package will get upgraded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last week this property did not work, but today it works fine... Thank you!
@jkotas could you please merge this PR if it looks good? |
I will merge once the CI is green. Thank you! |
Thanks! |
@dotnet-bot test Windows_NT x86 Checked Build and Test |
This PR is implementing
IsSupported
properties for all ISA classes inSystem.Runtime.Intrinsics.X86
under JIT compilation (not work with AOT).Close #13982