BenchmarkDotNet (arguably) slightly overcorrects for overhead #1133
Hi @Zhentar, thank you for the great input, and apologies for such a huge delay in response.

We are using delegates on purpose: to prevent the JIT from inlining the benchmarked method into the measurement loop. Example:

```csharp
private void Demo(int invocationCount)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    int result = 0;
    Func<int> @delegate = Sample;
    for (int i = 0; i < invocationCount; i++)
    {
        result ^= @delegate.Invoke(); result ^= @delegate.Invoke();
        result ^= @delegate.Invoke(); result ^= @delegate.Invoke();
        result ^= @delegate.Invoke(); result ^= @delegate.Invoke();
        result ^= @delegate.Invoke(); result ^= @delegate.Invoke();
    }
    stopwatch.Stop();
    ConsumeTheResult(result);
    ReportTime(stopwatch.ElapsedTicks / invocationCount);
}

[Benchmark]
public int Sample() // some math logic
```

Could get optimized to:

```csharp
private void Demo(int invocationCount)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    int result = Sample();
    stopwatch.Stop();
    ConsumeTheResult(result);
    ReportTime(stopwatch.ElapsedTicks / invocationCount); // a lie
}
```

@AndyAyersMS would switching from delegates (…) to function pointers help here?

> Calls via function pointers would not get inlined currently.

> Delegates can get inlined, so perhaps the reasoning for doing this is outdated. Function pointers may get inlined in the future also. Maybe a separate method with …
The test overhead deduction causes BDN to underreport benchmark execution times (as likely interpreted by users). The magnitude will vary depending upon hardware & the nature of test code, but should generally be on the order of 0.5ns-1.0ns.
I've noticed hints of this for a while, but only recently came to recognize what was occurring well enough to design a test that could clearly and consistently reproduce it.
Test Code
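The actual snippet under this heading did not survive this capture. Based on the discussion of the results below, it was presumably a family of benchmarks that differ only in the number of independent increments they perform; a minimal sketch along those lines (all names here are my own, not the original author's). Under BenchmarkDotNet each method would carry a `[Benchmark]` attribute; plain methods are used so the sketch compiles without the package:

```csharp
using System;

// Hypothetical reconstruction: each case executes a different number of
// independent increments on separate fields, so adding one more increment
// adds one more instruction with no data dependency on the others.
class IncrementCases
{
    public int a, b, c, d;

    public void One()   { a++; }
    public void Two()   { a++; b++; }
    public void Three() { a++; b++; c++; }
    public void Four()  { a++; b++; c++; d++; }

    static void Main()
    {
        var cases = new IncrementCases();
        cases.One(); cases.Two(); cases.Three(); cases.Four();
        Console.WriteLine($"{cases.a} {cases.b} {cases.c} {cases.d}"); // prints "4 3 2 1"
    }
}
```

Fields (rather than locals) keep the JIT from folding the increments away, which matters for a benchmark body this small.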
One, two, and three increments are all basically the same. But past that, there's a linear 0.32 ns (roughly 1 CPU cycle) increase in execution time for each additional increment. The first three are "free" because they can run alongside the benchmark overhead instructions in the CPU pipeline, adding no effective latency to the test harness. The execution time doesn't increase until all of that capacity has been filled and the test flips from test-harness bound to test-code bound.
To an extent, this behavior isn't really wrong: after all, the code will likely be running on a pipelined superscalar CPU in the real world, too. But the test harness code is probably abnormally independent of the test subject, since it doesn't interact with the results at all.
I don't have any ideas about what could/should be done regarding this in general. One thing that would help would be adding an option to use `calli` with function pointers instead of delegates in the in-process emit toolchain; reducing the total magnitude of the benchmark overhead shrinks the space in which latency can hide.

P.S. I think it's pretty great that BDN is so accurate that I can detect under-counting by three cycles.