
Incorrectly detected number of invocations #1780

Open

AndreyAkinshin opened this issue Aug 19, 2021 · 5 comments

@AndreyAkinshin (Member)

Currently, we have the following heuristic: if the duration of the first WorkloadJitting iteration is more than IterationTime (500 ms by default), we skip the Pilot stage and perform a single invocation in all further iterations.

This heuristic doesn't always work well, so in the scope of #1573 we introduced another heuristic: if the first WorkloadJitting iteration takes more than IterationTime but less than 1 second (a magic number), we discard the first result and perform another WorkloadJitting iteration.
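As a sketch, the two heuristics combine roughly like this (Python, with illustrative names — these are not the actual BenchmarkDotNet Engine identifiers):

```python
ITERATION_TIME_MS = 500   # default IterationTime
MAGIC_LIMIT_MS = 1_000    # the "magic number" introduced in #1573

def plan_after_first_jitting(first_jitting_ms):
    """Return what the Engine does after the first WorkloadJitting
    iteration, per the two heuristics described above."""
    if first_jitting_ms <= ITERATION_TIME_MS:
        return "run Pilot stage"              # neither heuristic triggers
    if first_jitting_ms < MAGIC_LIMIT_MS:
        return "discard and re-run jitting"   # heuristic 2 (#1573)
    return "skip Pilot, single invocation"    # heuristic 1
```

With these thresholds, the 2.66 s jitting iteration from the log below falls straight into the "skip Pilot" branch, even though the warmed-up benchmark is orders of magnitude faster.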

Unfortunately, this doesn't always work well either. If we run these F# benchmarks (the problem was originally reported here by Matthew Crews), we can observe the following situation (the full log is available here):

WorkloadJitting  1: 1 op, 2660880000.00 ns, 2.6609 s/op

WorkloadWarmup   1: 1 op, 55999700.00 ns, 55.9997 ms/op
WorkloadWarmup   2: 1 op, 48296500.00 ns, 48.2965 ms/op
WorkloadWarmup   3: 1 op, 47339800.00 ns, 47.3398 ms/op
WorkloadWarmup   4: 1 op, 37181000.00 ns, 37.1810 ms/op
WorkloadWarmup   5: 1 op, 37303500.00 ns, 37.3035 ms/op
WorkloadWarmup   6: 1 op, 38082300.00 ns, 38.0823 ms/op
WorkloadWarmup   7: 1 op, 37534700.00 ns, 37.5347 ms/op
WorkloadWarmup   8: 1 op, 39494000.00 ns, 39.4940 ms/op
WorkloadWarmup   9: 1 op, 37381700.00 ns, 37.3817 ms/op

While the typical duration of the warmed-up benchmark is about 35–45 ms, the first invocation takes more than 2.5 seconds. This situation leads to reasonable warnings.

I believe that the ultimate solution is to introduce the ability to restart the Pilot stage in the case of "too fast" WorkloadWarmup iterations. However, that would require a major refactoring of our Engine infrastructure. As a quick hotfix, we could just bump the magic number from 1 second to a higher value (say, 10 seconds). The worst-case side effect of such a change is an extra 10 seconds of total benchmarking time for heavy benchmarks (those that actually take more than 500 ms but less than 10 seconds).
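The proposed hotfix changes only the ceiling; a minimal sketch (Python, illustrative names, not the actual Engine code):

```python
ITERATION_TIME_MS = 500
MAGIC_LIMIT_MS = 10_000  # proposed: raised from 1_000

def should_rerun_jitting(first_jitting_ms):
    # Discard the first WorkloadJitting result and re-run it whenever its
    # duration lands between IterationTime and the raised ceiling.  Heavy
    # benchmarks in that window pay up to ~10 s of extra total run time.
    return ITERATION_TIME_MS < first_jitting_ms < MAGIC_LIMIT_MS
```

Under this sketch, the 2.66 s first iteration from the F# log above would now get a second chance instead of locking in a single invocation per iteration.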

@adamsitnik, what do you think?

@matthewcrews

Is this going to be released in a new version soon? Don't mean to pressure 😊, just curious since I would like to benchmark some code that would run into this issue.

@teo-tsirpanis (Contributor)

I have encountered this phenomenon several times, mainly on code with static initialization, and it's pretty annoying.

@AndyAyersMS (Member)

See notes on how I ran into this when evaluating OSR: dotnet/runtime#33658 (comment) (and following).

Without OSR the initial workload run takes > 250 ms, so it gets a second chance (in dotnet/perf the goal time is set to 250 ms); the second number is considerably lower, so BDN runs Pilot stages and settles on ~3K operations per workload interval.

With OSR the initial run drops to 220 ms; this isn't slow enough to trigger the second chance, but it is slow enough to convince BDN not to do Pilot stages.

The chart below shows the number of operations per workload interval over time for this test; you can see it varies greatly:

[chart omitted: operations per workload interval over time]

@AndyAyersMS (Member)

There is a second part to this heuristic. Say the goal is to run for 250 ms and the initial run comes in at 220 ms, like above. We're not slow enough for the second chance, but then we hit the 250/220 < 1.5 test right after, so we're too long to do a Pilot stage.

In tests like the above, if the initial jit time crosses back and forth in the neighborhood of 250 ms across BDN invocations over time, then the number of iterations fluctuates, and this leads to wildly fluctuating perf numbers too.

For example (sorry the x-axes don't quite line up here, but you get the idea):

(Upper chart is reported performance; lower is the number of operations, for a benchmark over time.)


If nothing else, this second heuristic should have the same threshold as the first; as it is, we have an odd band between roughly 167 ms (≈ 250/1.5) and 250 ms where running faster OR slower than this can greatly alter the iteration plan.

This also impacts people like me who may use a checked build of the JIT or enable some other JIT diagnostic output. Because this makes the JIT run more slowly, it can end up boosting the initial iteration time out of this (167 ms, 250 ms) zone to some value above 250 ms, and magically the benchmark seems to run faster.
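The odd band can be sketched numerically (Python; the test shapes are paraphrased from this comment, not taken from the actual Engine code):

```python
GOAL_MS = 250  # the goal time (IterationTime) as configured in dotnet/perf

def iteration_plan(first_run_ms):
    if first_run_ms > GOAL_MS:
        # Slow enough for the "second chance": jitting is re-run.
        return "second chance"
    if GOAL_MS / first_run_ms < 1.5:
        # e.g. 250 / 220 ≈ 1.14 < 1.5: too long for a Pilot stage.
        return "skip Pilot"
    return "run Pilot"

# The band (GOAL_MS / 1.5, GOAL_MS) ≈ (167 ms, 250 ms) is where a small
# change in initial jitting time flips the whole iteration plan.
```

A 220 ms initial run lands in the band and skips the Pilot stage, while a 150 ms run (faster) or a 300 ms run (slower) both eventually lead to a Pilot stage — which is exactly the instability described above.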

@EgorBo (Member)

EgorBo commented Mar 17, 2022

Could be related: dotnet/perf-autofiling-issues#3990 (System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas)

AndyAyersMS added a commit to AndyAyersMS/BenchmarkDotNet that referenced this issue Mar 17, 2022
Increase the magic limit to 10s per dotnet#1780.

Also use the "1.5" criteria for retry so we don't have odd threshold
where initial jitting is a bit faster than IterationTime but not fast
enough to trigger the multi action engine.
adamsitnik pushed a commit that referenced this issue Mar 18, 2022
Increase the magic limit to 10s per #1780.

Also use the "1.5" criteria for retry so we don't have odd threshold
where initial jitting is a bit faster than IterationTime but not fast
enough to trigger the multi action engine.
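Putting the two changes from the commit message together, the retry condition after the fix might look roughly like this (Python sketch with hypothetical names, not the actual Engine implementation):

```python
GOAL_MS = 250            # IterationTime / goal time
MAGIC_LIMIT_MS = 10_000  # magic limit raised from 1 s to 10 s

def should_rerun_jitting(first_run_ms):
    # Use the same 1.5 ratio criterion as the Pilot-stage check, so there
    # is no band where a run is too slow for the Pilot stage but too fast
    # to earn a retry.
    too_slow_for_pilot = GOAL_MS / first_run_ms < 1.5
    return too_slow_for_pilot and first_run_ms < MAGIC_LIMIT_MS
```

With this shape, the problematic 220 ms case now gets a retry, a genuinely fast 150 ms run proceeds to the Pilot stage, and only runs above the 10 s ceiling lock in a single invocation immediately.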