Tiered jitting and BenchmarkDotNet #13069
Can I add a question to this list? Can R2R code be faster than IL-to-tier1 code? R2R compilation has no time constraints, so are there any R2R-specific optimizations, or are some planned (e.g. full escape analysis)? Regarding BDN, I'd personally just overwrite both
Tiered compilation is likely to become more dynamic in the future, so some things that are possible now may not make sense later. For example:
For tests that have a short invocation duration and involve a small amount of code, tiering typically happens during the pilot or warmup phases, provided that the total time spent in those phases is long enough for the startup delay to expire.
For tests that have a long invocation duration and involve a large amount of code or many methods, some of those methods may not get tiered up by the time of the measurement phase. Most of the time would typically go into loops, and those methods would be fine, but each invocation may call some methods only once, and those may not reach the threshold. This may be insignificant for perf, as most of the time would be spent in optimized code. In some cases it may make a difference, for example when measuring the GC effects of a test. In cases like that it may be appropriate, in the current state, to disable tiered compilation or to tweak it to tier up more quickly using environment variables. It may be beneficial to add a mode, configurable at the project level, that tiers up aggressively for these kinds of cases; based on future changes, that mode could be adjusted to do something reasonable for that purpose.
I don't recommend disabling tiered compilation or tweaking it by default from the BDN side. In some cases it may result in perf data that is not representative to some degree (some perf differences due to changes in JIT timing), perhaps more so in the future.
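As a rough illustration of the "disable or tweak using environment variables" option mentioned above, here is a minimal Python sketch that prepares an environment for a benchmark child process. It assumes the `COMPlus_` prefix used by .NET Core 3.0 together with the knob names mentioned in this thread (`TieredCompilation`, `TC_CallCountThreshold`, `TC_CallCountingDelayMs`). The `tiering_env` helper and its mode names are invented for this example, and the values in the "aggressive" mode are illustrative rather than recommended. The `TC_*` knobs are internal and unsupported, so they may change or disappear.

```python
import os


def tiering_env(mode, base=None):
    """Return a copy of the environment with tiered-compilation knobs applied.

    `mode` is a label made up for this sketch:
      "default"    - leave the runtime defaults untouched
      "disabled"   - turn tiered compilation off
      "aggressive" - keep tiering on, but ask for a shorter delay and threshold
    """
    env = dict(os.environ if base is None else base)
    if mode == "disabled":
        env["COMPlus_TieredCompilation"] = "0"
    elif mode == "aggressive":
        # Internal, unsupported knobs; illustrative values only.
        env["COMPlus_TC_CallCountingDelayMs"] = "0"
        env["COMPlus_TC_CallCountThreshold"] = "1"
    elif mode != "default":
        raise ValueError(f"unknown mode: {mode}")
    return env


# Hypothetical usage: run a benchmark project with tiering disabled.
# import subprocess
# subprocess.run(["dotnet", "run", "-c", "Release"], env=tiering_env("disabled"))
```

Keeping this as an explicit opt-in per run, rather than a default, matches the recommendation above not to change tiering behavior by default from the BDN side.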
No, will consider adding an API post-3.0. There is an event with the info that could be gotten out-of-proc but it's fired early and won't be seen in-proc.
No, these may change or may be replaced in the future, and they are internal flags that are not supported. May be beneficial for some cases to add a "tier up aggressively" config option, but I don't recommend using that by default.
No
Other than attributing with
The pilot phase runs early, before warmup, and perf could be very different in some cases during the pilot phase and after warmup when tiering is enabled. If piloting completes before the main parts of the benchmark are tiered up, the invocation count that is determined could be much lower compared with tiering disabled. Some benchmarks may perform differently at different invocation counts. Perhaps the warmup phase could also adjust invocation counts, and overhead could be measured after warmup (if overhead measurement is dependent on the invocation count)? There are some events that can be gotten in-proc with
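To make the invocation-count concern concrete, here is a small arithmetic sketch in Python. It is not BenchmarkDotNet's actual pilot algorithm. It assumes a pilot that simply picks the invocation count needed to fill a target iteration time, and the 500 ms target and the per-invocation timings are hypothetical numbers chosen for the example.

```python
import math


def invocations_for(target_iteration_ns, per_invocation_ns):
    """Invocation count needed to fill one iteration (simplified pilot model)."""
    return max(1, math.ceil(target_iteration_ns / per_invocation_ns))


TARGET_NS = 500_000_000  # assumed target iteration time: 500 ms

# Hypothetical timings: 2000 ns per call before tier-up, 500 ns after.
count_before_tier_up = invocations_for(TARGET_NS, 2000)  # 250000
count_after_tier_up = invocations_for(TARGET_NS, 500)    # 1000000

# If the count chosen before tier-up is kept, each measured iteration
# lasts only a quarter of the intended time once the code is optimized.
actual_iteration_ms = count_before_tier_up * 500 / 1_000_000  # 125.0
```

Under these assumptions, a pilot that finishes before tier-up settles on a quarter of the invocations it would otherwise choose, which is why re-checking the count after warmup could help.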
@kouvel thank you very much for such a detailed answer! Let me know if you come up with any ideas of additional BenchmarkDotNet features that may help to improve accuracy. By the way, we know the exact version of .NET Core during benchmarking, so we can introduce some heuristics for specific versions of .NET Core based on the knowledge of its internals.
It's a very good idea! It may also help to resolve some problems which are not related to tiered jitting: sometimes we choose a bad number of invocations during the pilot stage because of heavy assembly loading on the first pilot iteration. I created a separate issue for that: dotnet/BenchmarkDotNet#1210
Today I've hit a problem related to Tiered JIT and BenchmarkDotNet that most probably affects the stability of the benchmark results in some edge cases. The benchmark was executed more than 30 times, and for longer than 100 ms there was no new method compilation; however, the "hot" methods did not get promoted to Tier 1. Most probably this is because Tiered JIT runs on a background thread, and the thread did not get a chance to "kick in" and promote things to Tier 1. @kouvel is there any way of forcing the Tiered JIT to run at a given moment?
Call counting starts after there have been 100-200 ms during which no new methods are called. Methods that are called 30 times after that point (still with no new methods being called, which would initiate the delay again) would get tiered up in the background. If the total pilot+warmup duration is not long enough to reach the point when no new methods are called for the delay duration (with extra time for call counting and jitting after the delay expires), then it's definitely possible that the necessary things would not get tiered up in time.
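The timing rule described above can be sketched as a toy simulation in Python. This is a simplified model for illustration, not the runtime's implementation. It assumes a fixed 100 ms delay and a threshold of 30 calls, pauses counting while the delay is active without resetting earlier counts, and ignores the time the background thread needs to actually jit the method. The function name and the `Bench`/`Helper` method names are made up for the example.

```python
def simulate_tiering(calls, delay_ms=100, threshold=30):
    """Toy model of call counting with a startup delay.

    `calls` is a time-sorted iterable of (time_ms, method_name).
    Returns {method_name: time_ms at which the call threshold was reached}.
    """
    seen = set()
    last_new_method_at = None
    counts = {}
    reached = {}
    for t, method in calls:
        if method not in seen:
            # A first call to any method (re)starts the delay.
            seen.add(method)
            last_new_method_at = t
        if method in reached:
            continue
        # Calls are counted only once the delay has fully elapsed.
        if t - last_new_method_at >= delay_ms:
            counts[method] = counts.get(method, 0) + 1
            if counts[method] >= threshold:
                reached[method] = t
    return reached


# One hot method called every millisecond for 200 ms.
bench = [(t, "Bench") for t in range(200)]
print(simulate_tiering(bench))  # {'Bench': 129}

# Same workload, but a new method is first called at t=90 and restarts the delay.
with_helper = sorted(bench + [(90, "Helper")])
print(simulate_tiering(with_helper))  # {}
```

In the second run, counting only resumes at t=190, so the hot method collects 10 counted calls before the workload ends and never reaches the threshold. This mirrors the case where pilot plus warmup is too short for tier-up to happen.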
There isn't a way to do that. If that is necessary, you'd probably be better off disabling tiering. For benchmarks, though, I think it would make sense to have a project-configurable option to tier up aggressively so that the timing factor can be mostly eliminated, hopefully without affecting the generated code too much.
.NET Core 3.0 has tiered jitting enabled by default, which is pretty important in the context of benchmarking: it may spoil benchmark results if the number of warmup iterations is not enough. It seems to be less of an issue since .NET Core 3.0 preview 4 (after dotnet/coreclr#23599 was merged). I didn't observe any noticeable tiered jitting effects with .NET Core 3.0 preview 6: all of my benchmarks produce pretty stable results. However, the internal logic of tiered jitting may change in future versions of .NET Core, so I would like to discuss how we can prepare for the upcoming changes. Currently, I have the following questions:
TieredCompilation or TC_QuickJit from the runtime? It would be nice to see whether tiered compilation is enabled or disabled in the environment section of the BenchmarkDotNet output. Of course, we can use the knowledge of the current environment variable values and the corresponding defaults for each version of .NET Core, but I'm looking for a more reliable way that will not depend on the specific .NET Core version.
TC_CallCountThreshold or TC_CallCountingDelayMs? It would be great to use these values in the internal BenchmarkDotNet heuristics (e.g., we can always try to invoke the benchmarked method at least TC_CallCountThreshold times).
/cc @kouvel @noahfalk @adamsitnik
Some relevant discussions: dotnet/coreclr#23599 https://github.com/dotnet/coreclr/issues/19751 https://github.com/dotnet/coreclr/issues/22998 dotnet/core#2257 dotnet/BenchmarkDotNet#1125 dotnet/coreclr#24576