Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Perf -26%] System.Tests.Perf_Random (3) #47870

Closed
DrewScoggins opened this issue Feb 4, 2021 · 9 comments
Closed

[Perf -26%] System.Tests.Perf_Random (3) #47870

DrewScoggins opened this issue Feb 4, 2021 · 9 comments
Labels

Comments

@DrewScoggins
Copy link
Member

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 18645f9534e16fdf99cf2ff4904901bc4e8e336e
Compare f80b0575d44ccdc85a18c8dc2dbdcd29f7788d0b
Diff Link

Regressions in System.Tests.Perf_Random

Benchmark Baseline Test Test/Base Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
Next_int_int 12.15 ns 15.12 ns 1.24 Trace Trace
Next_int 12.16 ns 15.03 ns 1.24 63.392373156766105 68.50399535495517 1.0806346559316888 Trace Trace
NextDouble 10.50 ns 13.80 ns 1.31 48.0352881089866 52.731103247262546 1.0977576136864564 Trace Trace

graph
graph
graph
Historical Data in Reporting System

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_Random*'

.

Payloads

Baseline
Compare

Histogram

System.Tests.Perf_Random.Next_int_int

[11.916 ; 12.382) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[12.382 ; 12.843) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[12.843 ; 13.335) | @@@@@@@@@
[13.335 ; 13.761) | @@
[13.761 ; 14.248) | 
[14.248 ; 14.734) | 
[14.734 ; 15.193) | @@@@@@@@@@@@@@
[15.193 ; 15.774) | @@@@@@@@@@@

Compare Jit Disasm

; System.Tests.Perf_Random.Next_int_int()
       mov       rcx,[rcx+8]
       mov       edx,64
       mov       r8d,2710
       mov       rax,[rcx]
       mov       rax,[rax+40]
       mov       rax,[rax+30]
       jmp       rax
; Total bytes of code 29

System.Tests.Perf_Random.Next_int

[12.112 ; 12.359) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[12.359 ; 12.648) | @@@
[12.648 ; 13.158) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[13.158 ; 13.594) | 
[13.594 ; 14.134) | @
[14.134 ; 14.667) | @
[14.667 ; 15.193) | @@@@@@@@@@@@@@@@@@
[15.193 ; 16.013) | @@@@@@@

Baseline Jit Disasm

; System.Tests.Perf_Random.Next_int()
       mov       rcx,[rcx+8]
       mov       edx,2710
       mov       rax,[rcx]
       mov       rax,[rax+40]
       mov       rax,[rax+30]
       jmp       rax
; Total bytes of code 23

Compare Jit Disasm

; System.Tests.Perf_Random.Next_int()
       mov       rcx,[rcx+8]
       mov       edx,2710
       mov       rax,[rcx]
       mov       rax,[rax+40]
       mov       rax,[rax+28]
       jmp       rax
; Total bytes of code 23

System.Tests.Perf_Random.NextDouble

[10.135 ; 10.562) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[10.562 ; 10.903) | @@@@@@@
[10.903 ; 11.160) | @@@@@
[11.160 ; 11.417) | 
[11.417 ; 11.674) | 
[11.674 ; 11.931) | 
[11.931 ; 12.188) | 
[12.188 ; 12.445) | 
[12.445 ; 12.702) | 
[12.702 ; 13.067) | 
[13.067 ; 13.570) | @@@@@@@@@@@@@@@@@@@@
[13.570 ; 13.881) | @@
[13.881 ; 14.619) | @@

Baseline Jit Disasm

; System.Tests.Perf_Random.NextDouble()
       mov       rcx,[rcx+8]
       mov       rax,[rcx]
       mov       rax,[rax+48]
       mov       rax,[rax]
       jmp       rax
; Total bytes of code 17

Compare Jit Disasm

; System.Tests.Perf_Random.NextDouble()
       mov       rcx,[rcx+8]
       mov       rax,[rcx]
       mov       rax,[rax+48]
       mov       rax,[rax+18]
       jmp       rax
; Total bytes of code 18

Docs

Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository

@DrewScoggins DrewScoggins added os-windows tenet-performance Performance related issue tenet-performance-benchmarks Issue from performance benchmark arch-x64 labels Feb 4, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 4, 2021
@stephentoub stephentoub added area-System.Runtime and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 4, 2021
@stephentoub
Copy link
Member

@DrewScoggins, at the same time these were shown to regress slightly, did other Random benchmarks improve? How can I see all of the benchmarks on this benchmark class for the same time period?

@DrewScoggins
Copy link
Member Author

We did see improvements in the constructor as evidenced by the auto-filed issues below. You can also look at these index pages that have links to every tests full history. I will just use find in the browser to look for the tests that I care about.

Windows x64
Windows x86
Ubuntu x64
Ubuntu ARM64

DrewScoggins/performance-2#3846
DrewScoggins/performance-2#3851
DrewScoggins/performance-2#3863

@stephentoub
Copy link
Member

Hmm. I assume this is due to my changes in 5ee25aa, but I saw these as improvements rather than regressions locally, and I would have expected massive improvements in benchmarks on NextBytes() and Next(), so something seems a bit off here.

@DrewScoggins
Copy link
Member Author

DrewScoggins commented Feb 5, 2021

Here are the results from running on my machine. They seem to match what we are seeing in the lab. Next_Double and Ctor got faster, but the other three got worse. I have also grabbed the links to the coreruns that were used and put them as links above the tables. I got this information from the details area of the issue.

Baseline

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
ctor_seeded 655.18 ns 20.283 ns 19.921 ns 657.28 ns 619.09 ns 699.49 ns 0.0640 - - 280 B
ctor 2,153.59 ns 92.377 ns 102.677 ns 2,135.08 ns 1,986.38 ns 2,344.33 ns 0.0604 - - 280 B
Next_int 15.29 ns 0.940 ns 1.045 ns 15.18 ns 13.53 ns 17.39 ns - - - -
Next_int_unseeded 13.90 ns 1.111 ns 1.235 ns 14.33 ns 11.38 ns 15.74 ns - - - -
Next_int_int 13.96 ns 0.840 ns 0.967 ns 13.71 ns 12.51 ns 15.83 ns - - - -
Next_int_int_unseeded 14.48 ns 0.743 ns 0.856 ns 14.48 ns 13.10 ns 16.00 ns - - - -
NextBytes 9,276.95 ns 380.486 ns 407.116 ns 9,297.49 ns 8,420.44 ns 10,071.75 ns - - - -
NextBytes_unseeded 10,747.47 ns 405.822 ns 451.070 ns 10,629.50 ns 10,074.97 ns 11,794.80 ns - - - -
NextDouble 11.12 ns 0.729 ns 0.839 ns 10.97 ns 10.15 ns 13.33 ns - - - -
NextDouble_unseeded 11.52 ns 0.903 ns 1.039 ns 11.29 ns 10.40 ns 13.93 ns - - - -
NextBytes_span 9,532.73 ns 189.283 ns 202.531 ns 9,499.73 ns 9,244.49 ns 9,970.05 ns - - - -
NextBytes_span_unseeded 9,813.42 ns 500.972 ns 576.920 ns 9,765.53 ns 9,079.85 ns 10,872.94 ns - - - -

Compare

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
ctor_seeded 507.186 ns 17.5872 ns 20.2534 ns 508.721 ns 472.523 ns 546.929 ns 0.0733 - - 312 B
ctor 166.817 ns 7.7035 ns 8.8714 ns 166.989 ns 152.572 ns 179.911 ns 0.0167 - - 72 B
Next_int 16.054 ns 0.5876 ns 0.6767 ns 16.029 ns 14.969 ns 17.407 ns - - - -
Next_int_unseeded 12.374 ns 0.6484 ns 0.7207 ns 12.092 ns 11.555 ns 13.890 ns - - - -
Next_int_int 15.801 ns 0.5816 ns 0.6465 ns 15.537 ns 15.028 ns 16.981 ns - - - -
Next_int_int_unseeded 12.435 ns 0.4619 ns 0.5134 ns 12.340 ns 11.724 ns 13.485 ns - - - -
NextBytes 8,650.211 ns 368.9268 ns 424.8566 ns 8,545.533 ns 8,156.941 ns 9,401.946 ns - - - -
NextBytes_unseeded 352.777 ns 14.1414 ns 15.7182 ns 346.657 ns 332.019 ns 383.492 ns - - - -
NextDouble 13.823 ns 0.7075 ns 0.7864 ns 13.553 ns 13.027 ns 15.922 ns - - - -
NextDouble_unseeded 3.744 ns 0.1713 ns 0.1904 ns 3.749 ns 3.380 ns 4.075 ns - - - -
NextBytes_span 10,647.569 ns 574.6784 ns 661.8006 ns 10,569.405 ns 9,756.100 ns 11,665.939 ns - - - -
NextBytes_span_unseeded 386.359 ns 15.9206 ns 18.3342 ns 382.575 ns 355.052 ns 419.783 ns - - - -

@danmoseley
Copy link
Member

For the avoidance of doubt, "_unseeded" and "ctor" are the ones using the new algorithm. Below I've rearranged @DrewScoggins numbers into two tables, the seeded ones in first table (same algorithm) unseeded in second table (new algorithm) each in pairs of rows, old first then new.

The seeded ones vary, but there's a possible regression in some, perhaps due to the new indirection.

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
ctor_seeded 655.18 ns 20.283 ns 19.921 ns 657.28 ns 619.09 ns 699.49 ns 0.0640 - - 280 B
ctor_seeded 507.186 ns 17.5872 ns 20.2534 ns 508.721 ns 472.523 ns 546.929 ns 0.0733 - - 312 B
Next_int 15.29 ns 0.940 ns 1.045 ns 15.18 ns 13.53 ns 17.39 ns - - - -
Next_int 16.054 ns 0.5876 ns 0.6767 ns 16.029 ns 14.969 ns 17.407 ns - - - -
Next_int_int 13.96 ns 0.840 ns 0.967 ns 13.71 ns 12.51 ns 15.83 ns - - - -
Next_int_int 15.801 ns 0.5816 ns 0.6465 ns 15.537 ns 15.028 ns 16.981 ns - - - -
NextBytes 9,276.95 ns 380.486 ns 407.116 ns 9,297.49 ns 8,420.44 ns 10,071.75 ns - - - -
NextBytes 8,650.211 ns 368.9268 ns 424.8566 ns 8,545.533 ns 8,156.941 ns 9,401.946 ns - - - -
NextDouble 11.12 ns 0.729 ns 0.839 ns 10.97 ns 10.15 ns 13.33 ns - - - -
NextDouble 13.823 ns 0.7075 ns 0.7864 ns 13.553 ns 13.027 ns 15.922 ns - - - -
NextBytes_span 9,532.73 ns 189.283 ns 202.531 ns 9,499.73 ns 9,244.49 ns 9,970.05 ns - - - -
NextBytes_span 10,647.569 ns 574.6784 ns 661.8006 ns 10,569.405 ns 9,756.100 ns 11,665.939 ns - - - -

The unseeded ones I think have the improvements we expected

Method Mean Error StdDev Median Min Max Gen 0 Gen 1 Gen 2 Allocated
ctor 2,153.59 ns 92.377 ns 102.677 ns 2,135.08 ns 1,986.38 ns 2,344.33 ns 0.0604 - - 280 B
ctor 166.817 ns 7.7035 ns 8.8714 ns 166.989 ns 152.572 ns 179.911 ns 0.0167 - - 72 B
Next_int_unseeded 13.90 ns 1.111 ns 1.235 ns 14.33 ns 11.38 ns 15.74 ns - - - -
Next_int_unseeded 12.374 ns 0.6484 ns 0.7207 ns 12.092 ns 11.555 ns 13.890 ns - - - -
Next_int_int_unseeded 14.48 ns 0.743 ns 0.856 ns 14.48 ns 13.10 ns 16.00 ns - - - -
Next_int_int_unseeded 12.435 ns 0.4619 ns 0.5134 ns 12.340 ns 11.724 ns 13.485 ns - - - -
NextBytes_unseeded 10,747.47 ns 405.822 ns 451.070 ns 10,629.50 ns 10,074.97 ns 11,794.80 ns - - - -
NextBytes_unseeded 352.777 ns 14.1414 ns 15.7182 ns 346.657 ns 332.019 ns 383.492 ns - - - -
NextDouble_unseeded 11.52 ns 0.903 ns 1.039 ns 11.29 ns 10.40 ns 13.93 ns - - - -
NextDouble_unseeded 3.744 ns 0.1713 ns 0.1904 ns 3.749 ns 3.380 ns 4.075 ns - - - -
NextBytes_span_unseeded 9,813.42 ns 500.972 ns 576.920 ns 9,765.53 ns 9,079.85 ns 10,872.94 ns - - - -
NextBytes_span_unseeded 386.359 ns 15.9206 ns 18.3342 ns 382.575 ns 355.052 ns 419.783 ns - - - -

@danmoseley
Copy link
Member

danmoseley commented Feb 5, 2021

Note we have no perf test for Next() .. adding one dotnet/performance#1649

@danmoseley
Copy link
Member

What did it look like on 32 bit, Drew? That implementation is different.

@stephentoub
Copy link
Member

For the avoidance of doubt, "_unseeded" and "ctor" are the ones using the new algorithm

Ohhh! That makes much more sense then.

I'm fine accepting the small regressions in the seeded ones; we knew about those, and I don't think they matter. It's good to have the tests to catch major unexpected regressions, but if someone is using the seeded ctor they're either a) just trying to provide better randomness via their own seed and should stop doing so, or b) trying to get reproducibility in which case it's most likely in a test or something where a small regression isn't a big deal.

@danmoseley
Copy link
Member

OK fine.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 7, 2021
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants