Duplicate RNG state across block to avoid need for synchronization #879

Merged
merged 2 commits into from
Apr 30, 2021

@maleadt maleadt commented Apr 30, 2021

The device RNG now uses 32 times as much state as before: 4KB instead of 128 bytes, out of 64KB in total. That might lower occupancy, but on the flip side it allows generating numbers without synchronization, and thus without the need to call rand() in a uniform manner, which turned out to be a fairly annoying requirement.

In the future, we can switch to a different RNG that uses less state (like counter-based ones), but for now this seems like a good stopgap measure.
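
As a rough check of the figures in the description above, the following Python sketch reproduces the arithmetic. It is only a back-of-the-envelope model, not code from this PR: the constant and function names are made up for illustration, and treating the 64KB as a single memory budget that the RNG state is carved out of is an assumption.

```python
# Back-of-the-envelope check of the state sizes quoted in the PR description.
# All names here are hypothetical; none of them come from CUDA.jl.

OLD_STATE_BYTES = 128      # RNG state before this change (from the description)
DUPLICATION_FACTOR = 32    # "32 times as much state"
TOTAL_BYTES = 64 * 1024    # the "64KB in total"; assumed to be one memory budget

new_state_bytes = OLD_STATE_BYTES * DUPLICATION_FACTOR
fraction_used = new_state_bytes / TOTAL_BYTES


def max_resident_copies(state_bytes, budget_bytes=TOTAL_BYTES):
    """Upper bound on how many state copies fit in the budget.

    Ignores every other consumer of that memory, so it is a ceiling only.
    """
    return budget_bytes // state_bytes


print(new_state_bytes)                       # 4096
print(f"{fraction_used:.2%}")                # 6.25%
print(max_resident_copies(OLD_STATE_BYTES))  # 512
print(max_resident_copies(new_state_bytes))  # 16
```

Under these assumptions the RNG state alone takes 6.25% of the budget, which is why lower occupancy is a possibility for kernels that already use a lot of that memory.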

@maleadt added the "enhancement" (New feature or request) and "cuda kernels" (Stuff about writing CUDA kernels.) labels Apr 30, 2021

maleadt commented Apr 30, 2021

Compared to the timings in #788 (comment), this is about a 4% slowdown.

codecov bot commented Apr 30, 2021

Codecov Report

Merging #879 (589fd5d) into master (4549cb7) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #879      +/-   ##
==========================================
- Coverage   77.14%   77.13%   -0.01%     
==========================================
  Files         121      121              
  Lines        7546     7544       -2     
==========================================
- Hits         5821     5819       -2     
  Misses       1725     1725              
Impacted Files              Coverage Δ
src/random.jl               35.61% <0.00%> (+0.94%) ⬆️
lib/cusolver/CUSOLVER.jl    78.00% <0.00%> (-4.00%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@maleadt maleadt merged commit c66b770 into master Apr 30, 2021
@maleadt maleadt deleted the tb/syncless_rand branch April 30, 2021 18:17