Incorrectly detected number of invocations #1780
Is this going to be released in a new version soon? Don't mean to pressure 😊, just curious since I would like to benchmark some code that would run into this issue.
I have encountered this phenomenon several times, mainly on code with static initialization, and it's pretty annoying.
See notes on how I ran into this when evaluating OSR: dotnet/runtime#33658 (comment) (and following). Without OSR the initial workload run takes > 250 ms, so it gets a second chance (in dotnet/perf the goal time is set to 250 ms); the second number is considerably lower, so BDN runs pilot stages and settles on ~3K operations per workload interval. With OSR the initial run drops to 220 ms; this isn't slow enough to trigger the second chance, but it is slow enough to convince BDN not to do pilot stages. The chart below shows the number of operations per workload interval over time for this test; you can see it varies greatly.
There is a second part to this heuristic. Say the goal is to run for 250 ms, and the initial run comes in at 220 ms like above. We're not slow enough for the second chance, but then we hit the `250/220 < 1.5` test right after this, so we're considered too long to do a pilot stage. In tests like the above, if the initial jit time crosses back and forth in the neighborhood of 250 ms across BDN invocations over time, then the number of iterations fluctuates, and this leads to wildly fluctuating perf numbers too. For example (sorry the x-axes don't quite line up here, but you get the idea; the upper chart is reported perf, the lower is the number of operations, for a benchmark over time).

If nothing else, this second heuristic should have the same threshold as the first; as it is, we have an odd band between roughly 167 ms (~250/1.5) and 250 ms where running faster OR slower than this can greatly alter the iteration plan. This also impacts people like me who may use a checked build of the jit or enable some other jit diagnostic output. Because this makes the jit run more slowly, it can end up boosting the initial iteration time out of this (167, 250) zone to some value above 250 ms, and magically the benchmark seems to run faster.
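To make that band concrete, here is a rough sketch of how I read the current decision. This is illustrative only; the type, method, and enum names are made up and this is not the actual BenchmarkDotNet engine code:

```csharp
using System;

// Illustrative sketch only: not the actual BenchmarkDotNet engine code;
// the names below just encode my reading of the heuristic described above.
enum IterationPlan { SingleInvocation, SecondChanceJitting, PilotStage }

static class InvocationHeuristic
{
    public static IterationPlan Choose(TimeSpan firstJitting, TimeSpan iterationTime)
    {
        // First part: the initial WorkloadJitting iteration was slower than the goal,
        // so it gets discarded and re-run once ("second chance").
        if (firstJitting > iterationTime)
            return IterationPlan.SecondChanceJitting;   // e.g. 260 ms vs a 250 ms goal

        // Second part: the first iteration is faster than the goal but not by enough
        // (goal / actual < 1.5), so skip the pilot stage and use a single invocation.
        if (iterationTime.TotalMilliseconds / firstJitting.TotalMilliseconds < 1.5)
            return IterationPlan.SingleInvocation;      // e.g. 220 ms: 250/220 ≈ 1.14

        // Fast enough: run the pilot stage and pick a proper invocation count.
        return IterationPlan.PilotStage;                // e.g. 150 ms: 250/150 ≈ 1.67
    }
}
```

Under this reading, with a 250 ms goal anything between ~167 ms and 250 ms locks in a single invocation per iteration, while being slightly faster or slightly slower flips the plan entirely.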
Could be related: dotnet/perf-autofiling-issues#3990
Increase the magic limit to 10s per #1780. Also use the "1.5" criterion for retry so we don't have an odd threshold where initial jitting is a bit faster than IterationTime but not fast enough to trigger the multi-action engine.
Currently, we have the following heuristic: if the duration of the first `WorkloadJitting` iteration is more than `IterationTime` (which is 500ms by default), we skip the `Pilot` stage and perform a single invocation in further iterations.

This heuristic doesn't always work well, so in the scope of #1573 we introduced another heuristic: if the first `WorkloadJitting` iteration takes more than `IterationTime` but less than 1 second (a magic number), we discard the first results and perform another `WorkloadJitting` iteration.

Unfortunately, it doesn't always work well either. If we try to run these F# benchmarks (the problem was originally reported here by Matthew Crews), we can observe the following situation (the full log is available here):
While the typical duration of the warmed benchmark is about 35-45ms, the first invocation takes more than 2.5 seconds. This situation leads to reasonable warnings:
I believe that the ultimate solution is to introduce an ability to restart the `Pilot` stage in the case of "too fast" `WorkloadWarmup` iterations. However, it would require a major refactoring of our `Engine` infrastructure. As a quick hotfix, we could just bump the magic number from 1 second to a higher value (let's say 10 seconds). The worst-case side effect of such a change is an extra 10 seconds of total benchmarking time for heavy benchmarks (those that actually take more than 500ms but less than 10 seconds).
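For illustration, a minimal sketch of what this hotfix (combined with the "1.5" retry criterion from the commit above) could look like; the field and method names here are hypothetical, not the actual `Engine` members:

```csharp
using System;

// Hypothetical sketch of the proposed hotfix; not the actual BenchmarkDotNet code.
static class SecondChanceHeuristic
{
    // Magic upper limit for re-running the first WorkloadJitting iteration,
    // bumped from 1 second to the proposed 10 seconds.
    static readonly TimeSpan MaxSecondChanceTime = TimeSpan.FromSeconds(10);

    // Retry (discard the first result and jit again) whenever the first iteration
    // is "too close" to IterationTime (the same 1.5 ratio that decides whether the
    // pilot stage is skipped), but not so slow that the retry just wastes time.
    public static bool DeservesSecondChance(TimeSpan firstJitting, TimeSpan iterationTime) =>
        iterationTime.TotalMilliseconds / firstJitting.TotalMilliseconds < 1.5
        && firstJitting < MaxSecondChanceTime;
}
```

Under these assumptions, a 220 ms first iteration against a 250 ms goal would get a retry instead of locking in a single-invocation plan, and a 2.5-second static-initialization hit like the F# case above would also be retried rather than dictating the whole iteration plan.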
@adamsitnik, what do you think?