Discussion: Crossgen platform capabilities SIMD/hwintrinsics and tiered compilation. #9408

sdmaclea · 2017-12-07T22:04:20Z

@dotnet/jit-contrib

As I think about ARM64 platform capabilities and how they will impact crossgen/R2R code, I am thinking tiered compilation might help.

If crossgen enables only the minimal supported platform flags, all platforms would be able to start up quickly.

If crossgen detects code that is platform dependent, the generated code would be functional but sub-optimal. If this were noted in the generated code, then tiered compilation would have the opportunity to re-JIT hot code with platform-specific flags enabled.

I really do not understand the status of tiered compilation or how it interacts with crossgen already, but I thought this might be worth discussing and possibly adding to the tiered jit plans.
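The idea above can be sketched as a small model: a method entry carries a "platform dependent" note from crossgen, counts calls, and swaps in re-JITted code once it becomes hot. This is only an illustration under stated assumptions. `TieredSlot`, the call-count threshold, and the synchronous promotion are all hypothetical and are not how the runtime implements tiering.

```csharp
using System;
using System.Threading;

// Hypothetical model of one method entry; not the CLR's actual data structure.
public sealed class TieredSlot
{
    private readonly bool _platformDependent;      // the note recorded at crossgen time
    private readonly Func<Func<int, int>> _rejit;  // stands in for a re-JIT with platform flags
    private readonly int _hotThreshold;            // assumed call count that marks a method hot
    private Func<int, int> _code;                  // currently published code body
    private int _calls;
    private int _promoted;                         // 0 = precompiled code, 1 = re-JITted code

    public TieredSlot(Func<int, int> precompiled, bool platformDependent,
                      Func<Func<int, int>> rejit, int hotThreshold)
    {
        _code = precompiled;
        _platformDependent = platformDependent;
        _rejit = rejit;
        _hotThreshold = hotThreshold;
    }

    public bool IsPromoted => Volatile.Read(ref _promoted) == 1;

    public int Invoke(int arg)
    {
        // Only flagged methods are counted; everything else keeps its precompiled body.
        if (_platformDependent && !IsPromoted &&
            Interlocked.Increment(ref _calls) >= _hotThreshold)
        {
            // Under contention two threads may both re-JIT; the last write wins,
            // which is harmless here because both bodies are equivalent.
            Volatile.Write(ref _code, _rejit());
            Volatile.Write(ref _promoted, 1);
        }
        return Volatile.Read(ref _code)(arg);
    }
}
```

Methods without the note never pay for counting, which keeps the startup path close to plain R2R code.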

category:proposal
theme:runtime
skill-level:expert
cost:small

BruceForstall · 2017-12-07T23:23:27Z

This makes sense to me.

Tiered compilation is an experimental feature which has a ways to go before it is generally available and enabled.

cc @noahfalk

sdmaclea · 2017-12-07T23:35:22Z

> Tiered compilation is an experimental feature

Re-JIT is sufficient for this, but treating it as a special case of tiered compilation might be simpler.

mattwarren · 2017-12-08T09:47:32Z

From my reading of the Code Versioning Design Doc, it is meant to provide more than just tiered compilation:

> Code versioning allows a single logical method to be implemented with different code bodies over the lifetime of an application process.

So it sounds like it'll be a more general approach that can allow different strategies for code replacement/re-compilation. See 'Future roadmap possibilities' for other ideas.

sdmaclea · 2017-12-08T15:30:33Z

@mattwarren Thanks. It has been a while since I was looking for docs.

Since we are talking about Code Versioning and crossgen: if we tracked some metric of JIT code quality, then code generated by an older JIT could also trigger a ReJIT.

sdmaclea · 2017-12-08T16:50:25Z

There may be a complication with XARCH Vector<T> and ARM64 SVE Vector???, where the changing structure sizes prevent this approach.

Maybe we can mark all methods which use variable-size structures and force a re-JIT of all of them pseudo-atomically (re-JIT serially, but have the CodeManager switch them all in together).
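One way to picture "re-JIT serially, switch in together" is a copy-on-write table whose reference is swapped once the whole batch is ready. The sketch below is an assumption-laden illustration, not the CodeManager's design: `CodeTable` and its members are hypothetical, delegates stand in for native code, and it does not address frames already executing the old bodies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical table of code bodies keyed by method name.
public sealed class CodeTable
{
    private Dictionary<string, Func<int, int>> _active;

    public CodeTable(Dictionary<string, Func<int, int>> initial)
        => _active = new Dictionary<string, Func<int, int>>(initial);

    // Each call reads one consistent snapshot of the table.
    public int Invoke(string method, int arg)
        => Volatile.Read(ref _active)[method](arg);

    // Bodies are produced one at a time by the caller, then published in a single swap.
    // Assumes a single publishing thread; concurrent publishers could lose updates.
    public void PublishBatch(IReadOnlyDictionary<string, Func<int, int>> recompiled)
    {
        var next = new Dictionary<string, Func<int, int>>(Volatile.Read(ref _active));
        foreach (var pair in recompiled)
            next[pair.Key] = pair.Value;
        Interlocked.Exchange(ref _active, next);
    }
}
```

A reader sees either all of the old bodies or all of the new ones for a given lookup, never a table that is half updated.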

AndyAyersMS · 2017-12-11T16:36:17Z

In principle rejitting could trigger on just about anything. The potential win from rejitting is a complex function of the cost of rejitting, the opportunity for wins, and the future frequency of calls to the method.

Having out-of-band information about any of these 3 factors can help produce a better system.

But it may also be acceptable to simply not prejit any method whose performance characteristics are strongly end-processor dependent, since we might plausibly guess that for this kind of code:

- cost of jitting is relatively low (not many methods will have HW intrinsics/SIMD),
- opportunity for wins is high,
- frequency is likely high (as developers won't resort to HW intrinsics in general unless code is perf sensitive).

This also avoids the need to do any sort of multi-method update.

There are some challenges applying this in the pure AOT case, but it may be acceptable there to target specific CPUs or have dynamic multi-versioning.
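The three factors mentioned above (cost of rejitting, opportunity for wins, future call frequency) can be combined into a deliberately simplified break-even test. The comment describes the real relationship as a complex function, so treat this only as a first-order sketch; `RejitHeuristic`, its parameter names, and the microsecond units are assumptions made for illustration.

```csharp
using System;

// Hypothetical first-order model; a real policy would also weigh estimate uncertainty.
public static class RejitHeuristic
{
    // Rejit pays off when the projected total savings exceed the one-time cost.
    public static bool ShouldRejit(double rejitCostMicros,
                                   double savingsPerCallMicros,
                                   double expectedFutureCalls)
        => savingsPerCallMicros * expectedFutureCalls > rejitCostMicros;
}
```

For example, a rejit costing 1000 µs that saves 0.5 µs per call pays off only if more than 2000 further calls are expected.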

4creators · 2017-12-14T18:20:54Z

It seems that one of the .NET Framework features, or something similar, could be quite useful for initiating rejitting: PrepareConstrainedRegions. IMO it is one of the options worth analyzing.

The issue "Q: Does .NET Core support PrepareConstrainedRegions?" discusses how to use or implement it in .NET Core, with @jkotas proposing a simplified solution:

> I think it may be reasonable to fix RuntimeHelpers.PrepareDelegate / PrepareMethod to JIT the method given to it (just the single method - without the complicated limited call-graph walk done by full .NET Framework implementation).

@AndyAyersMS indicated that hints that a given method should be compiled for a given architecture would be very useful for the compiler's rejitting decisions. Using RuntimeHelpers.PrepareDelegate / PrepareMethod to inform the JIT that it should rejit the method passed in the call could be one such way. Furthermore, the call to PrepareMethod could be conditional: for R2R assemblies we could compile for the lowest common denominator, i.e. SSE2 intrinsics, which would guarantee faster startup, and when AVX2 support is detected during execution, a call to RuntimeHelpers.PrepareMethod would cause the method to be rejitted with AVX2 support.

if (Avx2.IsSupported)
    RuntimeHelpers.PrepareMethod(MySIMDMethod);

jkotas · 2017-12-14T18:23:55Z

It does not make sense for RuntimeHelpers.PrepareMethod to cause the method to be rejitted (i.e. throw away the existing code and generate new code).

jkotas · 2017-12-14T18:25:54Z

> simply not prejit any method whose performance characteristics are strongly end-processor dependent

This should be the simple initial implementation. Anything more complex should only be done based on measurable data that shows the benefit.

4creators · 2017-12-14T19:06:11Z

> It does not make sense for RuntimeHelpers.PrepareMethod to cause the method to be rejitted (i.e. throw away the existing code and generate new code).

Hmm, it is hard to agree with that statement when one analyzes the example I have given. If the developer knows that rejitting will give around 2x faster execution for AVX2 relative to SSE2, or even 4x faster execution for AVX-512 intrinsics, or, on Arm in the future, perhaps 16x faster execution for 2048-bit SVE instructions compared with NEON, then the benefit is known at the moment the developer decides to implement the method using HW intrinsics.

Development work with HW intrinsics always involves detailed benchmarking, down to a single CPU cycle, and by the time the product ships the developer should know exactly what speedup each implementation will provide. Code that uses HW intrinsics is usually performance critical, so startup time matters as well. I see no good reason to give up the AOT scenario, which is meant to improve performance, particularly during startup.

Of course, there are many other possible solutions for AOT, e.g.:

> it may be acceptable there to target specific CPUs or have dynamic multi-versioning

This solution in principle allows targeting a specific, low-level CPU and getting dynamic multi-versioning at runtime based on logic supplied by the developer.
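For comparison, developer-supplied multi-versioning can also be written directly with `IsSupported` checks, without any rejit API. The sketch below assumes a runtime that ships `System.Runtime.Intrinsics.X86` (.NET Core 3.0 or later); `MultiVersionedSum` and its method names are made up for the example, and integer overflow simply wraps.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class MultiVersionedSum
{
    // Picks the widest implementation the current CPU supports.
    public static int Sum(ReadOnlySpan<int> data)
    {
        if (Avx2.IsSupported) return SumAvx2(data);
        if (Sse2.IsSupported) return SumSse2(data);
        return SumScalar(data);
    }

    public static int SumScalar(ReadOnlySpan<int> data)
    {
        int total = 0;
        foreach (int x in data) total += x;
        return total;
    }

    private static int SumSse2(ReadOnlySpan<int> data)
    {
        // Reinterpret the input as 128-bit blocks; leftover elements are summed separately.
        ReadOnlySpan<Vector128<int>> blocks = MemoryMarshal.Cast<int, Vector128<int>>(data);
        Vector128<int> acc = Vector128<int>.Zero;
        foreach (Vector128<int> block in blocks) acc = Sse2.Add(acc, block);
        int total = 0;
        for (int i = 0; i < Vector128<int>.Count; i++) total += acc.GetElement(i);
        return total + SumScalar(data.Slice(blocks.Length * Vector128<int>.Count));
    }

    private static int SumAvx2(ReadOnlySpan<int> data)
    {
        // Same shape as the SSE2 path, with 256-bit blocks.
        ReadOnlySpan<Vector256<int>> blocks = MemoryMarshal.Cast<int, Vector256<int>>(data);
        Vector256<int> acc = Vector256<int>.Zero;
        foreach (Vector256<int> block in blocks) acc = Avx2.Add(acc, block);
        int total = 0;
        for (int i = 0; i < Vector256<int>.Count; i++) total += acc.GetElement(i);
        return total + SumScalar(data.Slice(blocks.Length * Vector256<int>.Count));
    }
}
```

When such a method is compiled by the JIT, the `IsSupported` checks are treated as constants and the untaken branches are dropped. How the same checks behave in precompiled code is exactly the question this thread raises.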

jkotas · 2017-12-14T19:33:47Z

> it is hard to agree with that statement when one analyzes the example I have given

I was just saying that PrepareMethod makes sense for the case where the method does not have code at all and you want to make sure that it has some.

The situation you are talking about is different. It would require a new API like: RuntimeHelpers.ReoptimizeMethod.
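For reference, the existing API takes a RuntimeMethodHandle rather than a method group, so a call in today's terms looks roughly like the sketch below. `Warmup` and `Square` are placeholder names. Whether the call compiles the method ahead of its first use depends on the runtime version, and nothing here requests reoptimization of code that already exists.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class Warmup
{
    public static int Square(int x) => x * x;

    // Asks the runtime to make sure Square has code before its first call.
    public static void PrepareSquare()
    {
        MethodInfo method = typeof(Warmup).GetMethod(nameof(Square))
            ?? throw new MissingMethodException(nameof(Warmup), nameof(Square));
        RuntimeHelpers.PrepareMethod(method.MethodHandle);
    }
}
```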

4creators · 2017-12-14T19:44:20Z

> It would require a new API like: RuntimeHelpers.ReoptimizeMethod

Absolutely agree. It is something that has been discussed for a very long time under the term compiler hints.

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

BruceForstall added the JitUntriaged label (CLR JIT issues needing additional triage) Oct 28, 2020

sdmaclea closed this as completed Jun 11, 2021

ghost locked as resolved and limited conversation to collaborators Jul 11, 2021
