Discussion: Crossgen platform capabilities SIMD/hwintrinsics and tiered compilation. #9408
Comments
This makes sense to me. Tiered compilation is an experimental feature which has a ways to go before it is generally available and enabled. cc @noahfalk
Re-JIT is sufficient for this, but treating it as a special case of tiered compilation might be simpler.
From my reading of the Code Versioning Design Doc, it is meant to provide more than just tiered compilation:
So it sounds like it'll be a more general approach that can allow different strategies for code replacement/re-compilation. See 'Future roadmap possibilities' for other ideas.
@mattwarren Thanks. It has been a while since I was looking for docs. Since we are talking about Code Versioning and crossgen: if we tracked some metric of JIT code quality, then code generated by an older JIT could also trigger a ReJIT.
There may be a complication with XARCH Vector and (possibly) ARM64 SVE Vector, where the changing structure sizes prevent this approach. Maybe we can mark all methods which use variable-size structures and force a re-JIT of all of them pseudo-atomically (re-JIT serially, but the CodeManager switches them all in together).
In principle rejitting could trigger on just about anything. The potential win from rejitting is a complex function of the cost of rejitting, the opportunity for wins, and the future frequency of calls to the method. Having out-of-band information about any of these 3 factors can help produce a better system. But it may also be acceptable to simply not prejit any method whose performance characteristics are strongly end-processor dependent, since we might plausibly guess that for this kind of code:
There are some challenges applying this in the pure AOT case, but it may be acceptable there to target specific CPUs or have dynamic multi-versioning.
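To make the trade-off above concrete, here is a minimal sketch that assumes a simple linear model: the win from rejitting is the per-call saving times the expected number of future calls, minus the one-time rejit cost. The class, method, and parameter names are hypothetical and are not runtime APIs; a real policy would have to estimate each input from out-of-band information.

```csharp
using System;

// Hypothetical cost model for illustration only; none of these names exist in the runtime.
static class RejitHeuristicSketch
{
    // Estimated net win, in arbitrary time units, from rejitting one method.
    // rejitCost:           one-time cost of recompiling the method
    // savingPerCall:       time saved on each call by the better code
    // expectedFutureCalls: guess at how often the method will still be called
    public static double EstimatedNetWin(double rejitCost, double savingPerCall, double expectedFutureCalls)
    {
        return savingPerCall * expectedFutureCalls - rejitCost;
    }

    // Rejit only when the model predicts a positive net win.
    public static bool ShouldRejit(double rejitCost, double savingPerCall, double expectedFutureCalls)
    {
        return EstimatedNetWin(rejitCost, savingPerCall, expectedFutureCalls) > 0;
    }
}
```

Under this model a method with a small per-call saving only pays for its rejit if it stays hot, which is one reason call frequency is a natural trigger.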
It seems that one of the .NET Framework features, or something similar, could be quite useful for initiating rejitting. The Q&A "Does .NET Core support PrepareConstrainedRegions?" discusses how to use it or implement it in .NET Core, with @jkotas proposing a simplified solution:
@AndyAyersMS indicated that hints that a given method should be compiled for a given architecture would be very useful for the compiler's rejitting decisions. Using RuntimeHelpers.PrepareDelegate / PrepareMethod to inform the JIT that it should rejit the method passed in the call could be one such way. Furthermore, the call to PrepareMethod could be conditional: for R2R assemblies we could compile for the lowest common denominator, i.e. SSE2 intrinsics, which would guarantee faster startup, and when AVX2 support is detected during execution, a call to RuntimeHelpers.PrepareMethod would cause the method to be rejitted with AVX2 support.
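A minimal sketch of the call pattern proposed above, assuming a hypothetical hot method `SumHot`. `RuntimeHelpers.PrepareMethod` and `Avx2.IsSupported` are existing APIs, but the rejit-on-request behavior is only the proposal: today `PrepareMethod` merely ensures the method has been compiled.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

// Illustrative only: the class and method names are hypothetical.
static class RejitHintSketch
{
    // Stand-in for a performance-critical method that would benefit from AVX2.
    public static int SumHot(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        return sum;
    }

    // Conditionally "hint" the runtime once AVX2 is detected.
    // Returns true when the hint was issued.
    // Assumption: under the proposal this call would trigger a rejit with AVX2
    // enabled; with the current API it only makes sure the method is compiled.
    public static bool RequestPreparation()
    {
        if (!Avx2.IsSupported) return false;

        MethodInfo method = typeof(RejitHintSketch).GetMethod(nameof(SumHot));
        if (method == null) throw new InvalidOperationException("SumHot not found");

        RuntimeHelpers.PrepareMethod(method.MethodHandle);
        return true;
    }
}
```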
It does not make sense for
This should be the simple initial implementation. Anything more complex should be only done based on measurable data that shows the benefit.
Hmm, it is hard to agree with that statement when one analyzes the example I have given. If the developer knows that the benefit of rejitting will be around 2x faster execution for AVX2 relative to SSE2, or even 4x faster for AVX512 intrinsics, or, on Arm, possibly 16x faster in the future for 2048-bit SVE instructions compared to NEON, then the benefit is known at the moment the developer decides to implement the method using HW intrinsics. Development work with HW intrinsics always involves detailed benchmarking down to a single CPU cycle, and by the time the product ships the developer should know exactly what speedup each implementation will provide. Code that uses HW intrinsics is usually performance critical, so startup time matters as well. I see no good reason to give up on the AOT scenario, which is meant to deliver better performance, particularly during startup. Of course there are many other possible solutions for AOT, i.e.
This solution in principle allows targeting a specific, low-level CPU and getting dynamic multiversioning at runtime based on logic supplied by the developer.
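For comparison, developer-supplied multiversioning can already be written by hand with the `IsSupported` checks on the hardware intrinsic classes. The sketch below is a minimal illustration with hypothetical names; it builds vectors element by element to stay in safe code, which a tuned implementation would replace with direct loads.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Illustrative only: hand-written multiversioning of one method.
static class MultiVersionSketch
{
    // Name of the version that Sum will dispatch to on this machine.
    public static string SelectVersionName()
    {
        if (Avx2.IsSupported) return "avx2";
        if (Sse2.IsSupported) return "sse2";
        return "scalar";
    }

    // Dispatch on the capabilities of the CPU the code is running on.
    public static int Sum(int[] data)
    {
        if (Avx2.IsSupported) return SumAvx2(data);
        if (Sse2.IsSupported) return SumSse2(data);
        return SumScalar(data);
    }

    // Portable fallback, also usable as a reference result.
    public static int SumScalar(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        return sum;
    }

    private static int SumSse2(int[] data)
    {
        Vector128<int> acc = Vector128<int>.Zero;
        int i = 0;
        for (; i + 4 <= data.Length; i += 4)
        {
            acc = Sse2.Add(acc, Vector128.Create(data[i], data[i + 1], data[i + 2], data[i + 3]));
        }
        int sum = 0;
        for (int k = 0; k < 4; k++) sum += acc.GetElement(k); // horizontal add
        for (; i < data.Length; i++) sum += data[i];          // leftover tail
        return sum;
    }

    private static int SumAvx2(int[] data)
    {
        Vector256<int> acc = Vector256<int>.Zero;
        int i = 0;
        for (; i + 8 <= data.Length; i += 8)
        {
            acc = Avx2.Add(acc, Vector256.Create(
                data[i], data[i + 1], data[i + 2], data[i + 3],
                data[i + 4], data[i + 5], data[i + 6], data[i + 7]));
        }
        int sum = 0;
        for (int k = 0; k < 8; k++) sum += acc.GetElement(k); // horizontal add
        for (; i < data.Length; i++) sum += data[i];          // leftover tail
        return sum;
    }
}
```

When the JIT compiles `Sum` on the target machine it can evaluate the `IsSupported` checks at compile time and drop the unused branches; the open question in this thread is what a crossgen/R2R image should do with such a method when the target CPU is not known in advance.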
I was just saying that PrepareMethod makes sense for the case where the method does not have any code at all and you want to make sure that it has some. The situation you are talking about is different. It would require a new API like:
Absolutely agree. It is something that has been discussed for a very long time under the term compiler hints.
@dotnet/jit-contrib
As I think about ARM64 platform capabilities and how they will impact crossgen/R2R code, I am thinking tiered compilation might help.
During crossgen, if the minimal supported platform flags are enabled, that would allow all platforms to start up quickly.
If platform-dependent code is detected during crossgen, the generated code would be functional but sub-optimal. If this were noted in the generated code, then tiered compilation would have the opportunity to re-JIT hot code with platform-specific flags enabled.
I really do not understand the status of tiered compilation or how it interacts with crossgen already, but I thought this might be worth discussing and possibly adding to the tiered jit plans.
category:proposal
theme:runtime
skill-level:expert
cost:small