Enforce code alignment #756
I've been following the same issue. I was thinking that instead of enforcing code alignment, maybe it would be better for BDN to report on it as part of the analysis. For example: "the code for this method is NOT aligned, which may cause performance issues".
We could report it today. I know that some of our users use …
At best you can report method alignment, but that's not very relevant, as what matters is loop-head alignment (and sometimes the more general branch-target alignment).
Would running the benchmark several times, each time in a new process, help? That way the final result should roughly represent the "average" alignment, at least assuming the CLR is sufficiently random about where it places code.
If only it were random 😁. Experiments show that it can be pretty stubborn about placing code; it's all determined by the order in which methods are compiled. You'd need to introduce some randomness into this, possibly by having a particular BDN method that is called or not depending on a random value.
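That idea could be sketched roughly like this (purely hypothetical, not an existing BDN feature; `PlacementJitter` and its padding methods are invented names): no-op methods that get JIT-compiled only on a coin flip, shifting where subsequently compiled methods land from process to process:

```csharp
using System;
using System.Runtime.CompilerServices;

static class PlacementJitter
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Padding0() { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Padding1() { }

    // Call once before running the benchmarks: each padding method that
    // happens to be invoked gets JIT-compiled, nudging the addresses at
    // which methods compiled afterwards are placed.
    public static void Perturb()
    {
        var rng = new Random();
        if (rng.Next(2) == 0) Padding0();
        if (rng.Next(2) == 0) Padding1();
    }
}
```

Whether a couple of tiny padding methods actually shift placement enough to matter would need to be confirmed with a disassembler; this only perturbs compilation order, not the code allocator itself.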
I am all for reporting! That would be amazing. At least something to show awareness, so that someone like me doesn't go off and spend eternity chasing perceived ghosts. 😆 If we can somehow do better than that, then even better. 👍 But reporting would work very well for me in the interim, in that I would know there is some funny business going on. I could then test on the other machines that I have to get a better picture. To provide more context: in my case, I am using baselines to compare results of similar methods. When this issue doesn't impact results, I am seeing differences of 1-2%. When it does, that jumps to 10-20% (!). If I were able to see a significant difference along with a warning of some sort, then I would know not only to curb my enthusiasm (😄) but also to try other machines to see what they say. So the …
I tried to run some benchmarks with the following config (you can pass env vars with BDN):

```csharp
class Program
{
    static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(Program).GetTypeInfo().Assembly).Run(args,
            DefaultConfig.Instance
                .With(Job.Default.With(CsProjCoreToolchain.NetCoreApp21).With(new[] { new EnvironmentVariable("COMPlus_JitAlignLoops", "1") }).WithId("1"))
                .With(Job.Default.With(CsProjCoreToolchain.NetCoreApp21).With(new[] { new EnvironmentVariable("COMPlus_JitAlignLoops", "0") }).WithId("0")));
    }
}

[DisassemblyDiagnoser]
public class Jit_LoopUnrolling
{
    [Benchmark]
    public int Sum()
    {
        int sum = 0;
        for (int i = 0; i < 1024; i++)
            sum += i;
        return sum;
    }
}
```

I saw the difference in the disasm, but the difference was very small (4 ns).
It's possible today with …
It's an advantage for us because we can get very similar results for the same code.
This would definitely be possible; however, I am not sure people would be willing to pay the price (waiting much longer for the results).
@AndreyAkinshin what do you think? Maybe we should expose a mode for running the benchmarks twice: once with COMPlus_JitAlignLoops=1 and once with COMPlus_JitAlignLoops=0? Btw, the Stabilizer project mentioned by @AndyAyersMS went even one step further:
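Outside of a BDN Job, the same two-run comparison can be sketched from the command line; the benchmark filter pattern here is illustrative, not taken from the thread:

```shell
# Run the same benchmark twice, flipping RyuJIT's loop-alignment knob
# via the CoreCLR environment variable discussed above.
COMPlus_JitAlignLoops=1 dotnet run -c Release -- --filter '*Sum*'
COMPlus_JitAlignLoops=0 dotnet run -c Release -- --filter '*Sum*'
```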
I for one would certainly not mind waiting if it is a setting/attribute that we opt in to enable. Along such lines…
THIS! So, I think this setting should: … (Btw, totally open to feedback here; writing off the top of my head.)
I have spent the past hour on this, and it looks like we have caught our gigantic ghost of a whale here. Using the configuration that @adamsitnik provided and taking out the in-process mode as @mikedn suggested, I was able to consistently tease apart our two modes using the code from the repro repo:
SO PRETTY! ❤️ Notice the … In any case, this is a much better alternative in the interim. And here I was about to throw in the towel! 🤣 https://youtu.be/qorFS7X1i4Q?t=2m21s (Feeling very much like this due to having seen things that aren't really there. 😉)
(BTW, I realize that I just went a little crazy with that feature request, but I wanted to underscore here that I am finally unblocked from this issue using the configuration above -- immediately. As far as I am concerned, anything else is an added bonus here. Thanks again to everyone for your help. 👍)
FWIW, I was also able to verify that this behavior is consistent with my alternate repro suite, which features additional cases. All four of my available machines were able to produce the two different values by way of … (Guess I should have done this before I started dancing atop the heads of people. 😆)
For occasional one-off experiments, COMPlus_JitAlignLoops can be helpful in trying to confirm or refute that code alignment is the cause of unexpected performance fluctuations. But I would not advocate using COMPlus_JitAlignLoops as part of a regular performance program or as any kind of default setting just yet: it's something we never test, it may not always work as expected, it will be ignored on some architectures, it has some inherent limitations (we don't constrain method entry alignment and don't recognize all loops), it does not use optimally encoded NOP sequences, it won't fix issues in prejitted code, and it is only supported by RyuJit.
@AndyAyersMS thanks for the warning! Before we introduce any new JIT-related changes to BDN I will ask you first for validation of the idea. |
I completely agree with @AndyAyersMS: it's not a good idea to enable …
I am currently working on the .NET Core 2.2 vs 3.0 comparison, and loop alignment has hit me again. I was thinking about the Stabilizer project that @AndyAyersMS mentioned to me a long time ago, and I was wondering if re-jitting the benchmark method after every iteration would help? @AndyAyersMS: would it be possible to implement and expose an API similar to …
It might be possible to force rejitting, sure. One simple-minded way to do this would be to attach as a profiler and pretend to modify the IL (note this might have other undesirable impacts). Getting the transitive closure of all methods invoked by the benchmark to rejit would be tricky.

If you run each trial in a new process, you should get some code and heap jittering via ASLR; I don't know how well distributed this would end up being. We could also force 32-byte alignment of Tier-1 methods (on x86/x64), which should reduce some of the variability. Also, from what I understand, these code alignment issues aren't as common on Arm64, so it would be interesting (once we have enough data) to look at how variance levels differ across Arm64/x64.

One randomization thing you could do fairly easily is jitter the stack. You could have the benchmark invoker go through a proxy layer that does smallish, different-sized locallocs before invoking the method, and make sure that proxy layer is not an initlocals method (so the localloc region is not zeroed). Or, say, have a bunch of different proxies with local structs of different sizes. Stack-address-related perf artifacts are somewhat rare, but they do exist (e.g. 4K aliasing between a stack slot and a static).
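The stack-jitter proxy described above could be sketched in C# with `stackalloc` standing in for localloc (the `StackJitter` name and padding sizes are illustrative; compiling with unsafe code enabled is assumed):

```csharp
using System;
using System.Runtime.CompilerServices;

static class StackJitter
{
    // Proxy that burns a caller-chosen amount of stack before invoking the
    // benchmark, so the benchmark's frame lands at a different address on
    // each call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static unsafe T Invoke<T>(Func<T> benchmark, int paddingBytes)
    {
        byte* pad = stackalloc byte[paddingBytes];
        pad[0] = 1; // touch the buffer so the allocation is not elided
        return benchmark();
    }
}

// Usage sketch: vary the padding per iteration.
// var rng = new Random();
// int result = StackJitter.Invoke(() => benchmarks.Sum(), rng.Next(16, 512));
```

Note the caveat from the comment above: by default the runtime zero-initializes stackalloc regions in initlocals methods, so matching the "not an initlocals method" suggestion would additionally require something like `[SkipLocalsInit]` on newer runtimes.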
We have all agreed that BenchmarkDotNet should not use any custom settings to force code alignment on its own. Currently, the JIT Team is experimenting with method and loop alignment for .NET 6 to stabilize the benchmark results. This work can be tracked at dotnet/runtime#43227. I am going to close this issue as there are no actionable work items on the BDN side. Thank you all for the great discussion!
Repro: https://github.com/Mike-EEE/DotNetCore.CodeAlignment
Discussion: https://github.com/dotnet/coreclr/issues/17932
AFAIK this is not possible as of today. At least I don't know yet how to enforce it.