Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding unrelated code affects benchmark #36

Open
ChristopherBiscardi opened this issue Dec 2, 2023 · 5 comments
Open

Adding unrelated code affects benchmark #36

ChristopherBiscardi opened this issue Dec 2, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@ChristopherBiscardi
Copy link

Heya! I've been doing advent-of-code and trying out Divan while doing so. Someone showed me some code and I added it to my project to benchmark. Simply adding the code to lib.rs caused one of my benchmarks to rise by 100 microseconds. Commenting or uncommenting this one line (which is not used in the benchmark) in lib.rs causes the part2_nom benchmark to inflate or deflate accordingly.

// pub mod part2_subject;

I've pared the example down as much as I could at the moment. I can try to make a smaller one if I have more time.

https://github.com/ChristopherBiscardi/advent-of-code/tree/divan-benchmark-inflation

This should be enough to run the benchmarks:

cargo bench -p day-01

benchmarks with subject module

day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               35.7 µs       │ 93.41 µs      │ 36.14 µs      │ 37.26 µs      │ 100     │ 100
├─ part2               134.3 µs      │ 163.5 µs      │ 134.7 µs      │ 137.1 µs      │ 100     │ 100
├─ part2_aho_corasick  26.7 ms       │ 37.09 ms      │ 27.08 ms      │ 27.31 ms      │ 100     │ 100
├─ part2_nom           467.4 µs      │ 568.4 µs      │ 472.3 µs      │ 480.6 µs      │ 100     │ 100
╰─ part2_subject       135.7 µs      │ 183.7 µs      │ 136.2 µs      │ 138.5 µs      │ 100     │ 100

benchmarks without subject module

day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               33.58 µs      │ 99.66 µs      │ 33.99 µs      │ 35.05 µs      │ 100     │ 100
├─ part2               134.9 µs      │ 222.4 µs      │ 139.5 µs      │ 141.3 µs      │ 100     │ 100
├─ part2_aho_corasick  27.6 ms       │ 29.34 ms      │ 28.48 ms      │ 28.41 ms      │ 100     │ 100
╰─ part2_nom           361.2 µs      │ 455.5 µs      │ 375.9 µs      │ 378.3 µs      │ 100     │ 100

Here's each run side-by-side for ease of viewing

├─ part2_nom           467.4 µs      │ 568.4 µs      │ 472.3 µs      │ 480.6 µs      │ 100     │ 100
╰─ part2_nom           361.2 µs      │ 455.5 µs      │ 375.9 µs      │ 378.3 µs      │ 100     │ 100
@nvzqz
Copy link
Owner

nvzqz commented Dec 2, 2023

Is the increase of 100µs consistent across multiple cargo bench invocations?


Also, I noticed the spike of downloads yesterday and found your AoC video. Thanks for trying it out! Currently in the growing pains phase. 😄

@nvzqz nvzqz added the bug Something isn't working label Dec 2, 2023
@ChristopherBiscardi
Copy link
Author

The increase is consistent but I wasn't tracking the exact amount closely enough to say whether it was (for example) 100µs vs 130µs and such. It was not wildly fluctuating. At the very least I ran it multiple times in each configuration, in a row, one after the other, etc, etc. and the results seemed very consistent to me over many runs.

Also, I noticed the spike of downloads yesterday and found your AoC video. Thanks for trying it out! Currently in the growing pains phase. 😄

oh fun! glad I could point some people in your direction.

@nvzqz
Copy link
Owner

nvzqz commented Dec 2, 2023

This might be a bit of work on your end, but could you please check if the same happens with Criterion?

@polarathene
Copy link

check if the same happens with Criterion?

I've encountered a similar experience and have provided a reproduction here with both divan and criterion consistently demonstrating a perf regression.

I've had some other users clone that and observe the same.

My reproduction is fairly simple, just copying a function that isn't used at all introduces a consistent overhead in timings measured on the actual benched function. I suspect it's due to compiler optimizations like this Criterion FAQ entry explains.

The reproduction I've linked identifies several changes that can be made to avoid the regression, so this sort of issue is unlikely anything to do with Divan (Criterion measures the same increase in timing).

@ChristopherBiscardi
Copy link
Author

I've confirmed that the behavior does exist in divan and criterion. It is only for this one benchmark (part2_nom from 375 to 497). The other 3 benchmarks appear stable.

The relevant commit is: https://github.com/ChristopherBiscardi/advent-of-code/tree/26ae09c876602e405dc7943ef61764afd70d8838

Runs without part2_subject

run 1
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               39.95 µs      │ 98.29 µs      │ 40.33 µs      │ 41.24 µs      │ 100     │ 100
├─ part2               138.4 µs      │ 174.5 µs      │ 143.2 µs      │ 143.6 µs      │ 100     │ 100
├─ part2_aho_corasick  27.87 ms      │ 30.93 ms      │ 28.52 ms      │ 28.49 ms      │ 100     │ 100
╰─ part2_nom           365.5 µs      │ 522.1 µs      │ 376.2 µs      │ 380.4 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [28.212 µs 28.595 µs 28.999 µs]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

part2/part2             time:   [142.92 µs 143.11 µs 143.31 µs]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  4 (4.00%) high mild
part2/part2_nom         time:   [380.46 µs 380.83 µs 381.27 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
part2/part2_aho_corasick
                        time:   [28.350 ms 28.382 ms 28.413 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
run 2
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               38.16 µs      │ 61.91 µs      │ 45.79 µs      │ 44.69 µs      │ 100     │ 100
├─ part2               138.2 µs      │ 178 µs        │ 144.7 µs      │ 144.4 µs      │ 100     │ 100
├─ part2_aho_corasick  27.86 ms      │ 29.82 ms      │ 28.43 ms      │ 28.4 ms       │ 100     │ 100
╰─ part2_nom           359.7 µs      │ 507.4 µs      │ 374.2 µs      │ 377.3 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [28.309 µs 28.927 µs 29.598 µs]
                        change: [-3.7815% -1.2474% +1.3855%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

part2/part2             time:   [143.46 µs 143.75 µs 144.06 µs]
                        change: [-0.0243% +0.2199% +0.4712%] (p = 0.11 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
part2/part2_nom         time:   [380.39 µs 383.26 µs 389.43 µs]
                        change: [-0.3633% +1.0289% +4.5633%] (p = 0.63 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) low mild
  1 (1.00%) high severe
part2/part2_aho_corasick
                        time:   [28.401 ms 28.442 ms 28.483 ms]
                        change: [+0.0227% +0.2104% +0.3871%] (p = 0.02 < 0.05)
                        Change within noise threshold.
run 3
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               40.83 µs      │ 86.7 µs       │ 44.54 µs      │ 44.97 µs      │ 100     │ 100
├─ part2               138.7 µs      │ 173.4 µs      │ 144.3 µs      │ 144.2 µs      │ 100     │ 100
├─ part2_aho_corasick  27.6 ms       │ 28.76 ms      │ 28.43 ms      │ 28.37 ms      │ 100     │ 100
╰─ part2_nom           352.9 µs      │ 466.9 µs      │ 375.9 µs      │ 377.4 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [27.161 µs 27.608 µs 28.093 µs]
                        change: [-3.5336% -1.2345% +1.0798%] (p = 0.32 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

part2/part2             time:   [143.06 µs 143.35 µs 143.69 µs]
                        change: [-0.3984% -0.0981% +0.2096%] (p = 0.54 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
part2/part2_nom         time:   [380.71 µs 382.71 µs 385.95 µs]
                        change: [-3.1649% -0.2636% +1.6587%] (p = 0.88 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
part2/part2_aho_corasick
                        time:   [28.366 ms 28.411 ms 28.465 ms]
                        change: [-0.3222% -0.1083% +0.1199%] (p = 0.36 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

with part2_subject module

run 1 (immediately after the above runs)
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               27.79 µs      │ 107.9 µs      │ 30.87 µs      │ 35.63 µs      │ 100     │ 100
├─ part2               144.7 µs      │ 188 µs        │ 151.7 µs      │ 153.8 µs      │ 100     │ 100
├─ part2_aho_corasick  27.58 ms      │ 35.72 ms      │ 28.48 ms      │ 28.71 ms      │ 100     │ 100
╰─ part2_nom           482.2 µs      │ 583.9 µs      │ 498.6 µs      │ 500.6 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [28.547 µs 29.003 µs 29.489 µs]
                        change: [+0.9037% +2.8745% +4.8163%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

part2/part2             time:   [142.86 µs 143.10 µs 143.35 µs]
                        change: [-0.3534% -0.0867% +0.1954%] (p = 0.54 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
part2/part2_nom         time:   [499.69 µs 500.21 µs 500.70 µs]
                        change: [+28.932% +30.226% +31.214%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
part2/part2_aho_corasick
                        time:   [28.407 ms 28.752 ms 29.277 ms]
                        change: [-0.0030% +1.1999% +3.2000%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  4 (4.00%) high severe
run 2
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               30.16 µs      │ 62.62 µs      │ 35.16 µs      │ 34.57 µs      │ 100     │ 100
├─ part2               136.1 µs      │ 169.4 µs      │ 144.7 µs      │ 143 µs        │ 100     │ 100
├─ part2_aho_corasick  27.24 ms      │ 28.73 ms      │ 28.41 ms      │ 28.3 ms       │ 100     │ 100
╰─ part2_nom           476.4 µs      │ 590.2 µs      │ 497.7 µs      │ 497.1 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [28.270 µs 28.877 µs 29.482 µs]
                        change: [-0.4674% +1.7851% +4.1144%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

part2/part2             time:   [142.71 µs 143.41 µs 144.72 µs]
                        change: [-0.4805% +0.4224% +1.7963%] (p = 0.57 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
part2/part2_nom         time:   [499.59 µs 500.15 µs 500.68 µs]
                        change: [-0.2440% +0.0456% +0.3362%] (p = 0.76 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
part2/part2_aho_corasick
                        time:   [28.454 ms 28.580 ms 28.727 ms]
                        change: [-2.4332% -0.5968% +0.7186%] (p = 0.55 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
run 3
Timer precision: 41 ns
day_01                 fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ part1               28.12 µs      │ 62.58 µs      │ 28.43 µs      │ 33.28 µs      │ 100     │ 100
├─ part2               139.3 µs      │ 184.6 µs      │ 145 µs        │ 145.7 µs      │ 100     │ 100
├─ part2_aho_corasick  27.9 ms       │ 28.93 ms      │ 28.44 ms      │ 28.39 ms      │ 100     │ 100
╰─ part2_nom           477 µs        │ 574.4 µs      │ 497.4 µs      │ 494.9 µs      │ 100     │ 100

     Running benches/benchmarks-criterion.rs (target/release/deps/day_01_bench_criterion-c9c1d8429cb6331d)
part1/part1             time:   [28.404 µs 28.897 µs 29.428 µs]
                        change: [-3.9832% -1.6333% +0.8383%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

part2/part2             time:   [143.02 µs 143.89 µs 144.99 µs]
                        change: [-1.6549% -0.2855% +0.7242%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high severe
part2/part2_nom         time:   [499.47 µs 500.23 µs 501.16 µs]
                        change: [-0.0951% +0.2081% +0.5882%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
part2/part2_aho_corasick
                        time:   [28.380 ms 28.436 ms 28.498 ms]
                        change: [-1.0464% -0.5065% -0.0087%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants