Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MersenneTwister: more efficient integer generation with caching #25058

Merged
merged 1 commit into from
Dec 23, 2017

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented Dec 13, 2017

This is the last "performance" PR I wanted in 1.0. It depends on the PR on which it is based off of for getting better performance. But the gains are less significative than when I first implemented it some 3 years ago. Hence RFC, to discuss the trade-off.
The idea is that, as MersenneTwister is natively efficient to randomize an array, we can store an array of integers as a MersenneTwister field and randomize it; each time a random integer is requested, we give one of the cached values. This is exactly like we do already for Float64 values. So do we want MersenneTwister to be heavier by some kB for better performance? I don't see any drawback, but I'm not sure. I played a little bit with the size of the cache, the "best" one so far is 10kB, with speed-ups as follows on my machine (Int==Int64):

  • Int8 -> +20%
  • Int16 -> +15%
  • Int32 -> +20%
  • Int64 -> +60%
  • Int128 -> +30%

This is WIP: MersenneTwister is untouched, the new algorithm is used when instanciating a helper RNG like r = IntCached{625}(MersenneTwister()); rand(r, Int), where 625 is the size of the cache in terms of UInt128 integers. It will be very fast to remove the WIP if we decide so.

@rfourquet rfourquet added needs decision A decision on this change is needed performance Must go faster randomness Random number generation and the Random stdlib triage This should be discussed on a triage call labels Dec 13, 2017
@StefanKarpinski StefanKarpinski removed triage This should be discussed on a triage call needs decision A decision on this change is needed labels Dec 14, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Dec 14, 2017
@rfourquet
Copy link
Member Author

@nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Copy link
Member Author

rfourquet commented Dec 19, 2017

I tested systematically all reasonable values for the size of the cache, and it appeared that there is a threshold (501 UInt128) beyond which the generation of integers becomes much faster. And I tested on two different machines, with different cache sizes etc. Still, if someone wants to test, you can just compare 501 and 500 for example, by changing this constant in "random/RNGs.jl" (and tell me your findings; if someone wants to test exhaustively, just ask me to push a branch where this is easier to do, without recompiling julia each time). The improvements are very similar but not exactly the same on both machines (e.g. my other one shows no improvement for Int8, Int16, Int32 but 50% improvement for Int128.

Now, this change had a consequence I didn't think about: we test that rand(97:122) gives a reproducible number whatever Int is (Int32 or Int64). But with this PR, rand(Int32) doesn't use anymore the same algorithm as before, so this is broken. Do we really want to have this feature? (I guess yes, as this is in a test). So I temporarily changed again our generator for ranges so the the test passes again, but with some regression (shown in Nanosoldier report). Temporarily, till I find a better fix, which I think I have.

Now, I have a big quesion mark in my head: if I use @btime or @benchmarkable in my REPL, I get the improvements shown in the OP. But if I run the BaseBencharks suite for "random", I get the same as Nanosoldier shows, i.e. no improvement. But this benchmark suite should benchmark rand(Int) etc, so I'm at lost. Someone as any idea what could be happening?

@StefanKarpinski
Copy link
Member

@jrevels, @andreasnoack ^ any ideas about the nanosoldier results?

@jrevels
Copy link
Member

jrevels commented Dec 19, 2017

Now, I have a big quesion mark in my head: if I use @btime or @benchmarkable in my REPL, I get the improvements shown in the OP. But if I run the BaseBencharks suite for "random", I get the same as Nanosoldier shows, i.e. no improvement. But this benchmark suite should benchmark rand(Int) etc, so I'm at lost. Someone as any idea what could be happening?

Keep in mind that if you run the BaseBenchmarks suite yourself, you technically should be tuning the benchmarks for your machine and Julia build rather than use the tuning parameters cached in the repo. Those parameters were tuned specifically with Nanosoldier in mind. It's not a problem for all benchmarks, but can be for some.

Also, unless one employs low-level randomization schemes between experiments, benchmark performance will theoretically depend on prior machine state, such that running a set of benchmarks in one sequence can produce different results than running the set in a different sequence. Again, some benchmarks are more vulnerable to these effects than others.

Can you show an example where the manual @benchmark run differs from the BaseBenchmarks version for a specific benchmark on your machine? Might help narrow this down.

@rfourquet
Copy link
Member Author

Keep in mind that if you run the BaseBenchmarks suite yourself, you technically should be tuning the benchmarks for your machine and Julia build rather than use the tuning parameters cached in the repo.

Ok great info thanks!

Can you show an example where the manual @benchmark run differs from the BaseBenchmarks version for a specific benchmark on your machine? Might help narrow this down.

For example @btime rand($(MersenneTwister()), UInt), gives consistently at least 40% improvements, while the follwowing line in BaseBenchmarks gives consistently no improvement:

    g["rand",     "MersenneTwister", tstr] = @benchmarkable _rand($MT, $(Val{T}()))

I even deleted all the lines in this file and added a simplified:

    g["rand",     "MersenneTwister", "UInt"] = @benchmarkable rand($MT, UInt)

But this doesn't change anything.

@oscardssmith
Copy link
Member

Could we create fixed fake urandom information so nanosoldier can have more consistent outcomes between tests?

@rfourquet
Copy link
Member Author

Could we create fixed fake urandom information so nanosoldier can have more consistent outcomes between tests?

I don't relly understand, but I will guess your are more talking to Jarrett ;-)

@rfourquet
Copy link
Member Author

I pushed what is hopefully a fix for the regressions, except for Bool ranges, which is addressed separately at #25190. I won't hope to see improvements with Nanosoldier on the next run, but at least no regressions would be somewhat necessary for a PR claiming perf improvements :)

@nanosoldier runbenchmarks("random", vs=":master")

@ararslan
Copy link
Member

Looks like Nanosoldier gave a 400 for your request. @nanosoldier runbenchmarks("random", vs=":master")

@ararslan
Copy link
Member

...and mine too. @nanosoldier runbenchmarks("random", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@rfourquet
Copy link
Member Author

At least the regressions affecting UInt32 are gone now. But the remaining ones for rand!(::Array{Int}, 1:1000) are weid, as the corresponding code is not even touched... I will request if one or two good souls are willing to test this branch and compare with master:

  • a = zeros(Int, 1000); @btime rand!($a, 1:1000) (should show no regression nor improvement)
  • @btime rand($(MersenneTwister()), UInt64) (should show improvement)
  • @btime rand($(MersenneTwister()), UInt128) (should show improvement)

And if you have even more time to spend, you can checkout instead the branch at #25197 ("rf/rand/cache-float") which contains this PR plus more Float caching, so running @btime rand(Float64) would help!

@maleadt
Copy link
Member

maleadt commented Dec 20, 2017

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3131 (2017-12-20 11:49 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 564ecb0fd4* (0 days old master)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.283 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.235 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.260 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt64);
  4.144 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  4.122 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  4.042 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt128);
  5.437 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  5.489 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  5.760 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), Float64);
  3.691 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  3.707 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  3.683 ns (0 allocations: 0 bytes)
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3128 (2017-12-20 10:16 UTC)
 _/ |\__'_|_|_|\__'_|  |  rf/rand/cache-int/d32e2671b8* (fork: 1 commits, 0 days)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.515 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.405 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  11.322 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt64);
  3.022 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  3.027 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt64);
  3.030 ns (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), UInt128);
  4.097 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  4.215 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), UInt128);
  4.102 ns (0 allocations: 0 bytes)
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.3129 (2017-12-20 10:41 UTC)
 _/ |\__'_|_|_|\__'_|  |  rf/rand/cache-float/a786746c6d* (fork: 2 commits, 0 days)
|__/                   |  x86_64-unknown-linux-gnu


julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.830 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.837 μs (0 allocations: 0 bytes)

julia> a = zeros(Int, 1000); @btime rand!($a, 1:1000);
  10.832 μs (0 allocations: 0 bytes)


julia> @btime rand($(MersenneTwister()), Float64);
  2.535 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  2.728 ns (0 allocations: 0 bytes)

julia> @btime rand($(MersenneTwister()), Float64);
  2.540 ns (0 allocations: 0 bytes)

@rfourquet
Copy link
Member Author

Thanks so much @maleadt ! this confirms at least that I don't hallucinate with performance improvents :)
I will wait till tomorrow if someone else runs the benchmarks, and I will myself have access to a another computer to see if the improvements get confirmed, Then, I would favor merging this.

@rfourquet
Copy link
Member Author

I just checked on yet another computer which also shows similar speed improvements, so I will merge when this turn green.

@rfourquet rfourquet changed the title RFC/WIP: MersenneTwister: more efficient integer generation with caching MersenneTwister: more efficient integer generation with caching Dec 23, 2017
@rfourquet rfourquet merged commit b385693 into master Dec 23, 2017
@rfourquet rfourquet deleted the rf/rand/cache-int branch December 23, 2017 16:51
@andreasnoack
Copy link
Member

@rfourquet Is it correct that this PR changed the values of rand(Int) such that they are now different from Julia 0.6 when given the same seed? I get

0.6

julia> srand(1); rand(Int)
-2795697449271335806

0.7-

julia> srand(1); rand(Int)
6592875438459116351

Just wanted to confirm since some package tests rely on reproducibility here so we'd need to find a workaround in the packages if the change is intended.

@ararslan
Copy link
Member

ararslan commented Feb 2, 2018

See also #17426, in particular Simon's comment:

In general, we've not preserved the behaviour of random functions and seeds between versions

@rfourquet
Copy link
Member Author

@andreasnoack yes the change is intended, sorry for the inconvenience! This should not change anymore before 1.0 (but I may propose a last change for generation from a range).

@bjarthur
Copy link
Contributor

@rfourquet is there a way to seed 0.7 such that it will reproduce the results from 0.6?

@rfourquet
Copy link
Member Author

The only way I imagine to get the same numbers would be to create a dummy MyCompatInt and to define rand for this type, like Int was defined in 0.6. Or too overwrite this method in 0.7...

@mschauer
Copy link
Contributor

This is why we need a way to dispatch to provide a random number generator providing the entropy and a Sampler turning the entropy into random variates with a distribution in a specified way :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants