-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT] Make casting logic closer to CoreCLR #89548
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsFixes: #84464 In progress. For the most part this is a refactoring of existing code to have a shape closer to CoreCLR. In particular to do cache lookups earlier.
|
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
I think this is ready for a review. |
Could you please collect numbers for casting microbenchmarks before/after this change? |
@@ -694,32 +694,29 @@ private ISymbolNode GetHelperFtnUncached(CorInfoHelpFunc ftnNum) | |||
break; | |||
|
|||
case CorInfoHelpFunc.CORINFO_HELP_CHKCASTANY: | |||
case CorInfoHelpFunc.CORINFO_HELP_CHKCASTARRAY: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete CORINFO_HELP_CHKCASTARRAY
and CORINFO_HELP_ISINSTANCEOFARRAY
from JIT/EE interface. They are unnecessary now.
One possible question here would be - Why not just port/share the CoreCLR casting helpers in their entirety and implement the internal calls in managed code - to have nearly the same implementation? I think one of the factors for the design of CoreCLR managed casting helpers was to avoid complicated API with the native type system. As a result that API is basically 2 internal calls methods - In NativeAOT the type system APIs are easily accessible, so we do not need to minimize the use of those APIs. On the other hand there are some differences, like the way we fetch the base type for arrays, that may stand in the way of code sharing. |
Also in NativeAOT these casting helpers have more users with additional needs, while in CoreClr it is really just type system facade for object casting. |
I was thinking of what we could measure here. Perhaps just running the regular casting benchmarks that perf lab uses would be informative enough about what changed perf-wise. |
I have run the same benchmark as in #84430 (comment) internal class Program
{
const int iters = 1000000;
static void Main(string[] args)
{
for(; ; )
{
Time(TestLStringToIROCstring);
}
}
static void Time(Action a)
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
a();
}
sw.Stop();
System.Console.WriteLine(sw.ElapsedMilliseconds);
}
static object o = new List<string>();
static void TestLStringToIROCstring()
{
for (int i = 0; i < iters; i++)
{
if (o as IReadOnlyCollection<object> == null)
throw null;
if (o as IReadOnlyCollection<string> == null)
throw null;
if (o as IEnumerable<object> == null)
throw null;
if (o as IEnumerable<string> == null)
throw null;
}
}
} === before the change
=== after the change:
The reason for the difference is that original code makes a number of calls. Profiler shows: making calls and additional checks adds up. In the new implementation there is only one helper call in the profile: |
For comparison the CoreCLR is a bit faster. The same benchmark as above produces (smaller is better):
As I see in the debugger the native code that we run after this change is nearly the same between CoreCLR and NativeAOT. mov rcx,7FFCA19E79E8h In NativeAOT loading a type looks like: lea rcx, [rip + 0x73d65] I do not see any other significant differences. Maybe it is just these little diffs and some indirect impact on code size or alignment that makes the difference. |
Another thing to notice is that compared to #84430 (comment) and prior to this change it looks like the benchmark in the comment has regressed. That was possibly caused by #86029 . I suspect it made common/simple cases faster, which is good, but regressed complex cases that rely on caching as more checks like Anyways, it looks like after this PR the complex/cached case is faster than in #84430 (comment) |
There are some |
c3672cb
to
ac36d4e
Compare
/azp run runtime-extra-platforms |
Azure Pipelines successfully started running 1 pipeline(s). |
This reverts commit 0dbd156.
I'd be ok with merging this now. I think all concerns have been resolved. Let me know if there is something that may be missing. |
@@ -6058,8 +6061,7 @@ CorInfoHelpFunc CEEInfo::getCastingHelperStatic(TypeHandle clsHnd, bool fThrowin | |||
*pfClassMustBeRestored = true; | |||
} | |||
|
|||
// If it is an array, use the fast array helper | |||
helper = CORINFO_HELP_ISINSTANCEOFARRAY; | |||
_ASSERTE(helper == CORINFO_HELP_ISINSTANCEOFANY); | |||
} | |||
else | |||
if (!clsHnd.IsTypeDesc() && !Nullable::IsNullableType(clsHnd)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The native AOT implementation does not checks for Nullable. Is this check redundant here or is the check for Nullable missing in native AOT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is that JIT tries to transform simple cases like o is int?
into o is int
, but can't do that when needs a type lookup. As I understand that is due to limitations of IR.
runtime/src/coreclr/jit/importer.cpp
Lines 5387 to 5391 in 40b39ff
// ECMA-335 III.4.3: If typeTok is a nullable type, Nullable<T>, it is interpreted as "boxed" T | |
// We can convert constant-ish tokens of nullable to its underlying type. | |
// However, when the type is shared generic parameter like Nullable<Struct<__Canon>>, the actual type will require | |
// runtime lookup. It's too complex to add another level of indirection in op2, fallback to the cast helper instead. | |
if (isClassExact && !(info.compCompHnd->getClassAttribs(pResolvedToken->hClass) & CORINFO_FLG_SHAREDINST)) |
NativeAOT does not seem to have a problem with expressing such lookup, so we always cast with nullable stripped
runtime/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Compilation.cs
Lines 301 to 306 in 17d13fa
case ReadyToRunHelperId.TypeHandleForCasting: | |
{ | |
var type = (TypeDesc)targetOfLookup; | |
if (type.IsNullable) | |
targetOfLookup = type.Instantiation[0]; | |
return NecessaryTypeSymbolIfPossible((TypeDesc)targetOfLookup); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words in NativeAOT the type check will always be to an unboxed underlying type, thus CLASS
variant is correct, and also optimizable, since T? is isClassExact
.
In CoreCLR the type check in complex cases could be to the actual Nullable<SomeStruct<string>>
, thus it should use ANY
variant. It also means that we may be introducing an optimization bug, since such IsInst cannot be lowered to a type/handle compare.
I will check if that is a case, or if there are some mitigating reasons why it still works correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly that is the case. The following works incorrectly
===== Prints:
True
False
using System.Runtime.CompilerServices;
namespace ConsoleApp34
{
struct S1<T>
{
public T value;
}
internal class Program
{
static object o;
static void Main(string[] args)
{
Test<int>();
Test<string>();
}
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private static void Test<T>()
{
o = new S1<T>();
Console.WriteLine(o is S1<T>?);
}
}
}
This PR indirectly enabled casting optimizations for cases like o is int?
, but we need to suppress it, since in more general cases it does not work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there are better ideas, I am thinking of constraining isinst optimization for CORINFO_HELP_ISINSTANCEOFANY
only if converting to an array type. So we do not keep finding more broken cases.
That may be too conservative, but we should probably stay closer to the preexisting behavior for now and consider if more cases can work in 9.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking that this PR as a whole is too risky for 8.0. Are there parts that we think are critical to get into .NET 8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the actual casting helper change was low risk as that is mostly refactoring to get more cases to hit the cache earlier. Touching the JIT appears to be a lot more fragile.
There is nothing really "critical" for 8.0, as in - we are not fixing some complete showstoppers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have created a "reduced" version of this. I think that is what we can consider for 8.0 - #90234
I have removed all the JIT changes, but kept the added codegen test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
@@ -13612,7 +13612,7 @@ methodPointerInfo* Compiler::impAllocateMethodPointerInfo(const CORINFO_RESOLVED | |||
bool Compiler::impIsClassExact(CORINFO_CLASS_HANDLE classHnd) | |||
{ | |||
DWORD flags = info.compCompHnd->getClassAttribs(classHnd); | |||
DWORD flagsMask = CORINFO_FLG_FINAL | CORINFO_FLG_VARIANCE | CORINFO_FLG_ARRAY; | |||
DWORD flagsMask = CORINFO_FLG_FINAL | CORINFO_FLG_VARIANCE | CORINFO_FLG_TYPE_EQUIVALENCE | CORINFO_FLG_ARRAY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both CORINFO_FLG_VARIANCE
and CORINFO_FLG_TYPE_EQUIVALENCE
are only computed to make the impIsClassExact
work. Computing these flags is a waste in all other cases. It is the kind of pattern that calls for introduction of dedicated JIT/EE interface API that replaces the flags.
A reduced version of this affecting only run time behavior has been merged. |
Fixes: #84464
The actual changes are not as big here as might seem. For the most part this is a refactoring of existing code to have a shape closer to CoreCLR cast helpers, so that similar patterns could be used - in a few places where that has not been done already in earlier changes.
For example cases like
CheckCastAny
- could start with a cache lookup, since uncached code path can be complex and thus relatively slow. (also addresses some old TODOs in this area)