Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ILCompiler: Fix values of IsSupported for HW intrinsics on 32-bit platforms #99753

Closed
wants to merge 3 commits into from

Conversation

filipnavara
Copy link
Member

Affects ARM32 and X86. We don't want to report support for System.Runtime.Intrinsics.Arm.*.Arm64 on ARM32 and for System.Runtime.Intrinsics.X86.*.X64 on X86. The methods in these classes should still be recognized as HW intrinsics though and IsSupported should return false.

@MichalStrehovsky
Copy link
Member

Hm, this is odd. Looks like the problem is that the Sse.X64 intrinsics in the x86 CoreLib are actual intrinsics (with recursive calls and everything) as opposed to just their PlatformNotSupported, IsSupported=false flavor. Even though presumably, See.X64 never does anything useful on x86.

Because of that, we are required to report them as intrinsics to RyuJIT, even though they are as relevant as Wasm intrinsics. @tannergooding is that intentional, or just convenience on our part so that we don't have split off the x64 intrinsics into a separate file (like we do for Arm/Wasm/etc.)? If this was done for convenience, we should probably fix this by separating the x64 intrinsics into a separate file and including a NotSupported.cs flavor on x86. Then we just need to delete this line:

if (potentialType.Name == "X64")
potentialType = (MetadataType)potentialType.ContainingType;

If separating the x64 intrinsics is not feasible and this was intentional, I would kind of prefer this fix instead 55f59b5 so that we don't introduce a "real hardware intrinsic" and "fake hardware intrinsic" dichotomy with the bool.

@filipnavara
Copy link
Member Author

If this was done for convenience, we should probably fix this by separating the x64 intrinsics into a separate file and including a NotSupported.cs flavor on x86.

+1 for doing this.

@filipnavara
Copy link
Member Author

55f59b5

That would likely fix only half of the issue. The IsSupported methods still need to be replaced with return false instead of the recursive call.

@tannergooding
Copy link
Member

tannergooding commented Mar 15, 2024

@tannergooding is that intentional, or just convenience on our part

This is very explicit on multiple fronts. There is a clear issue in ILCompiler which doesn't exist in RyuJIT (or presumably Mono either). The JIT/AOT and other compilers are all expected to be "robust" in the face of a mismatched corelib (at least as far as the hwintrinsic APIs go), which might occur for AltJIT, testing, or even custom corelib implementations (within reason).

Because of that, we are required to report them as intrinsics to RyuJIT, even though they are as relevant as Wasm intrinsics.

WASM and other unsupported intrinsics always throwing and IsSupported being constant: false is a very important factor to any compiler (JIT, AOT, Interpreter, Trimmer, etc) and ultimately codegen. It allows optimizations, particularly dead code elimination, to happen early and that has been a very important factor to reducing codegen size and improving overall throughput.

If this was done for convenience, we should probably fix this by separating the x64 intrinsics into a separate file and including a NotSupported.cs flavor on x86

The NotSupported.cs file is mostly a hack to make supporting reflection invocation simpler. In a more ideal scenario we'd actually have every one of these APIs throw and have the compiler just do the right thing for recognized intrinsics. This would make corelib quite a bit smaller (2-4k methods can now share an IL method body entry, rather than each being their own unique recursive call) and reduce the general "risk" caused by having 2x copies of several thousand APIs per platform. -- We've had subtle bugs or docs mismatches in the past caused by this split and it's always a pain to get them fixed.

so that we don't introduce a "real hardware intrinsic" and "fake hardware intrinsic" dichotomy with the bool.

There is no "real" vs "fake". There is only hardware intrinsics and RyuJIT accordingly treats them as such.

The differentiator ends up coming down to "supported" vs "unsupported" where "supported" is already dynamic and based on whether the VM says that InstructionSet is supported. Everything else falls under the case of "unsupported" -- This includes unrecognized APIs under System.Runtime.Intrinsics.* or recognized intrinsics where the VM says the hardware (actual or target) doesn't feature the ISA.


The RyuJIT handling is here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L2861

You can see that it looks up the ID from the method handle and then later does the importation based on what the lookup did or did not resolve. The lookup logic ends up getting here https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L9756-L9840 (noting similar code exists a bit further up for System.Numerics.Vector*) and attempts a general resolution assuming the namespace matches the current architecture or is part of the shared root namespace that is considered xplat.

lookupId (https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsic.cpp#L484) first and foremost resolves the InstructionSet and checks if the VM says its supported. If the instruction set is not supported, we treat it as a recognized intrinsic: NI_Throw_PlatformNotSupportedException. If the instruction set is supported, it tries to find if the name has a valid entry in the metadata table. If there is no match, we report it as NI_Illegal and otherwise we return the resolved ID. -- Noting we have some specialized handling around IsSupported and IsHardwareAccelerated in particular, as we want these to be a constant true/false check where-ever possible. We want to do this even if we couldn't resolve the InstructionSet given that they allow mass dead code elimination and we have an expectation set in place for such APIs exposed under this namespace.

For anything that falls outside one of the recognized namespaces, we likewise still specially handle IsSupported and IsHardwareAccelerated for the same reason. We very explicitly want the JIT to understand when something is constant true/false, we want to treat unrecognized recursive calls as throwing PlatformNotSupportedException, etc. This is all to improve throughput and code quality and to "future proof" ourselves in scenarios where some community supported platform adds their own intrinsics.

@@ -64,38 +64,32 @@ public bool IsInstructionSetExplicitlyUnsupported(InstructionSet instructionSet)

public InstructionSetSupportFlags Flags => _flags;

public static string GetHardwareIntrinsicId(TargetArchitecture architecture, TypeDesc potentialTypeDesc)
public static string GetHardwareIntrinsicId(TargetArchitecture architecture, TypeDesc potentialTypeDesc, out bool unsupportedSubset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this returning a string, rather than simply some enum entry like is done for RyuJIT?

A string is expensive to compare against and it doesn't even look like this is returning an Intrinsic ID, but rather the type name. A type name by itself may or may not be sufficient for describing an actual InstructionSet and whether or not its supported. This is why we have distinct InstructionSet enum entries for each class we can encounter in RyuJIT.

{
if (potentialType.Name == "X64")
{
potentialType = (MetadataType)potentialType.ContainingType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this more complex change rather than just having this line change to return ""; for architecture == TargetArchitecture.X86?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you simply exclude it then we end up with IsSupported and the actual HW intrinsic methods compiled from the IL code, ie. recursive calls that cause stack overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it really seems this should be doing some resolution similar to what lookupId does in RyuJIT and deciding the IL replacement logic based on the enum value returned and not some vague estimation.

It looks like you don't need full resolution, rather you mostly just care about:

  • What ISA is the intrinsic part of?
  • Is it one of the special intrinsics (get_IsSupported, get_IsHardwareAccelerated, get_Count, etc)?
  • Is it recursive?

if (potentialType.Name == "VL")
potentialType = (MetadataType)potentialType.ContainingType;
if (potentialType.Namespace != "System.Runtime.Intrinsics.X86")
return "";
}
else if (architecture == TargetArchitecture.ARM64)
else if (architecture is TargetArchitecture.ARM64 or TargetArchitecture.ARM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no intrinsic support for TargetArchitecture.ARM today (by any of our runtimes). Does this need to report the type or can it accordingly just report it as "unsupported" for the time being?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I folded it to follow the same logic as X64/X86. The ARM logic existed there before the change.

@@ -37,14 +37,25 @@ public HardwareIntrinsicILProvider(InstructionSetSupport isaSupport, FieldDesc i
public override MethodIL GetMethodIL(MethodDesc method)
{
TypeDesc owningType = method.OwningType;
string intrinsicId = InstructionSetSupport.GetHardwareIntrinsicId(_context.Target.Architecture, owningType);
string intrinsicId = InstructionSetSupport.GetHardwareIntrinsicId(_context.Target.Architecture, owningType, out bool unsupportedSubset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just trying to replace the method body for a couple of the special cases?

Can we not just have this logic match what RyuJIT does so we can handle the total set of special "constant" cases:

  • NI_IsSupported_True
  • NI_IsSupported_False
  • NI_IsSupported_Dynamic
  • NI_IsSupported_Type
  • NI_Throw_PlatformNotSupportedException
  • NI_Vector_GetCount

@MichalStrehovsky
Copy link
Member

@tannergooding is that intentional, or just convenience on our part

This is very explicit on multiple fronts. There is a clear issue in ILCompiler which doesn't exist in RyuJIT (or presumably Mono either). The JIT/AOT and other compilers are all expected to be "robust" in the face of a mismatched corelib (at least as far as the hwintrinsic APIs go), which might occur for AltJIT, testing, or even custom corelib implementations (within reason).

I'm not sure but i think we're talking past each other. If Sse.X64.IsSupported was returning false in the x86 corelib (instead of being a recursive call) we'd get the benefits you describe later (eg illinker would be able to trim codepaths guarded by that on x86). Why is it a very explicit choice that we don't want this to happen?

@MichalStrehovsky
Copy link
Member

55f59b5

That would likely fix only half of the issue. The IsSupported methods still need to be replaced with return false instead of the recursive call.

RyuJit would expand it in that case because because we'd report it same as coreclr vm.

It would not get optimized in the rest of the compiler (e.g. the cctor interpreter) but that because it's a general deoptimization to leave it as a recursive call in the x86 corelib (illinker and others included)

@tannergooding
Copy link
Member

tannergooding commented Mar 16, 2024

Why is it a very explicit choice that we don't want this to happen?

We do want this to happen, but changing the corelib implementation isn't the correct fix. It will be missing half the handling that RyuJIT, Mono, and other runtime/tooling are handling and expected to handle and ultimately relying on a point in time implementation detail of corelib (one that we did as it was the easiest thing at the time, not necessarily the best/most correct thing).

We fundamentally have recursive calls, some of which can be statically determined to be true/false or even statically determined to throw based on the target runtime or other options passed down to the tooling.

The ILCompiler should just have the same general handling that RyuJIT, Mono, and all the other tooling has (and is expected to have) where it understands the few special node kinds:

  • NI_IsSupported_True
  • NI_IsSupported_False
  • NI_IsSupported_Dynamic
  • NI_IsSupported_Type
  • NI_Throw_PlatformNotSupportedException
  • NI_Vector_GetCount

This isn't a lot of logic required to recognize these and the considerations for when they can be encountered, so its ultimately the better and safer option.

@MichalStrehovsky
Copy link
Member

We do want this to happen, but changing the corelib implementation isn't the correct fix. It will be missing half the handling that RyuJIT, Mono, and other runtime/tooling are handling and expected to handle and ultimately relying on a point in time implementation detail of corelib (one that we did as it was the easiest thing at the time, not necessarily the best/most correct thing).

Yep, I agree there's an ILCompiler bug here. The bug is here:

if (architecture == TargetArchitecture.X64)
{
if (potentialType.Name == "X64")
potentialType = (MetadataType)potentialType.ContainingType;
if (potentialType.Name == "VL")
potentialType = (MetadataType)potentialType.ContainingType;
if (potentialType.Namespace != "System.Runtime.Intrinsics.X86")
return "";
}
else if (architecture == TargetArchitecture.X86)
{
if (potentialType.Name == "X64")
potentialType = (MetadataType)potentialType.ContainingType;
if (potentialType.Name == "VL")
potentialType = (MetadataType)potentialType.ContainingType;
if (potentialType.Namespace != "System.Runtime.Intrinsics.X86")
return "";
}

Both X64 and x86 codepaths are exactly the same. The x86 one should not have the X64 handling. This is the bug Filip and me want to fix. The output of this method is either an empty string or the name of the intrinsic if it's applicable. Sse.X64 is not applicable on x86.

But fixing this bug runs into the issue that we also use this method to decide whether to set a flag to RyuJIT (as opposed to the logic in JitInterface that only looks at the namespace). We used to only look at the namespace name before #33274. My proposed fix in 55f59b5 is that we just go back to that. That way we'll match the VM behavior. We don't need to update CoreLib to have Sse.X64.IsSupported be false. But it would still be more optimal (e.g. for ILLinker purposes) if we did that, it can be done later, or never.

@am11
Copy link
Member

am11 commented Mar 17, 2024

My proposed fix in 55f59b5 is that we just go back to that.

As part of this, maybe we should move GetHardwareIntrinsicId from InstructionSetSupport.cs to HardwareIntrinsicILProvider.cs as private GetHardwareIntrinsicTypeName to avoid confusion? (or simply inline the logic since it's the only usage)

@tannergooding
Copy link
Member

The output of this method is either an empty string or the name of the intrinsic if it's applicable.

Just noting this doesn't appear to be the name of an intrinsic, but rather than name of an instruction set. That is, it is most similar to any entry from https://github.com/dotnet/runtime/blob/main/src/coreclr/tools/Common/JitInterface/CorInfoInstructionSet.cs

that we also use this method to decide whether to set a flag to RyuJIT (as opposed to the logic in JitInterface that only looks at the namespace).

Can you clarify what you mean here? What flag gets set in RyuJIT?

From the looks of things, IsHardwareIntrinsic is used as part of isIntrinsic, but it's not clear to me why that's necessary in the first place. We don't do anything like that for the native VM.

Rather, we simply say that anything in System.Runtime.Intrinsics.Arm is an intrinsic for Arm64 and anything under System.Runtime.Intrinsics.X86 is an intrinsic for x86/x64 (provided the type has the relevant [Intrinsic] attribute): https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/methodtablebuilder.cpp. We namely do this to limit the bloat to 1 attribute per ISA.

It looks like ILCompiler then has a separate place where it uses GetHardwareIntrinsicId to do IL replacement for the method body. But that general handling doesn't look to be "complete" per say.


Given the above, I would expect the following changes to be desirable:

  • Fix isIntrinsic to match how the native VM handles it
    • Essentially any method directly marked [Intrinsic]
    • -or-
    • any method in a non-generic type marked [Intrinsic] that exists in the matched System.Runtime.Intrinsics.* namespace (this is functionally what IsHardwareIntrinsic should be checking)
  • Fix GetHardwareIntrinsicId to return an actual intrinsic ID (not a string)
  • Update GetMethodIL to handle the well known IntrinsicId
    • NI_IsSupported_True
      • This is used for get_IsSupported and get_IsHardwareAccelerated for ISAs that can be deterministically supported for a given platform (like Sse and Sse2 on x64 or x86 Windows)
    • NI_IsSupported_False
      • This is used for get_IsSupported and get_IsHardwareAccelerated for ISAs that can be deterministically not supported for a given platform (like Sse.X64 on x86 Windows or AdvSimd on non Arm64).
    • NI_IsSupported_Dynamic
      • This is used for get_IsSupported and get_IsHardwareAccelerated for ISAs that can be dynamically checked at runtime. R2R/Crossgen2 does this for SSE4.1, for example
      • This looks to be similar to the isOptimistaicallySupported set that GetMethodIL already handles
    • NI_IsSupported_Type
      • This is used for get_IsSupported exposed on a generic type (like Vector128<T>.IsSupported) and is used to determine if the T is supported. It is constant true for supported types (byte, sbyte, short, ushort, int, uint, long, ulong, nint, nuint, float, and double) and constant false otherwise (reference types, other value types, etc)
    • NI_Throw_PlatformNotSupportedException
      • This is used for intrinsics where the ISA reported NI_IsSupported_False.
    • NI_Vector_GetCount
      • This is used for the get_Count property on generic types (like Vector128<T>.Count). For a supported T, the count is a deterministic constant. For an unsupported T, it throws PNSE

@MichalStrehovsky
Copy link
Member

Can you clarify what you mean here? What flag gets set in RyuJIT?

From the looks of things, IsHardwareIntrinsic is used as part of isIntrinsic, but it's not clear to me why that's necessary in the first place. We don't do anything like that for the native VM.

Rather, we simply say that anything in System.Runtime.Intrinsics.Arm is an intrinsic for Arm64 and anything under System.Runtime.Intrinsics.X86 is an intrinsic for x86/x64 (provided the type has the relevant [Intrinsic] attribute): https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/methodtablebuilder.cpp. We namely do this to limit the bloat to 1 attribute per ISA.

Sorry, I meant "it gets flagged as a hardware intrinsic". Yes, what you are writing is exactly in the 55f59b5 commit with my proposed fix I keep referencing from the beginning. We used to implement isIntrinsic like that before #33274.

The rest of your comments look like feedback on choices made in #33274 that set all of this up. We could track it as an issue. It doesn't need to block this bugfix.

@MichalStrehovsky
Copy link
Member

Superseded by #99894.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm32 arch-x86 area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants