quite faster rand(::MersenneTwister, ::Type{Float64}) etc... #37916
Generation of `Int64` just became faster than of `Float64` (#37914). Given that dSFMT natively produces `Float64`, and that generating a random `Float64` needs only 52 random bits, surely performance can be improved there.

Just ask the `gen_rand` routine, which randomizes the cache for floats, to not inline, as it's unlikely to be called (once in a thousand).
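To illustrate the idea outside of `Random`, here is a minimal sketch of a cached generator whose rarely-taken refill path is marked `@noinline`. Everything in it (`ToyCachedRNG`, `refill!`, `next_float!`, the cache size of 1000) is hypothetical and only mimics the shape of the real code; it is not the implementation touched by this PR.

```julia
using Random

# Hypothetical toy generator: a cache of Float64 values in [1, 2),
# consumed one at a time and refilled in bulk when exhausted.
mutable struct ToyCachedRNG
    src::MersenneTwister     # source of random bits (an assumption of this sketch)
    vals::Vector{Float64}    # cached values in [1, 2)
    idx::Int                 # number of cached values already consumed
end

# Start with an "empty" cache (idx == length) so the first call refills it.
ToyCachedRNG(seed::Integer, n::Integer = 1000) =
    ToyCachedRNG(MersenneTwister(seed), Vector{Float64}(undef, n), n)

# Slow path, taken once per `length(vals)` calls: keep it out of line so the
# hot path below stays small.
@noinline function refill!(r::ToyCachedRNG)
    for i in eachindex(r.vals)
        bits = rand(r.src, UInt64) >>> 12   # keep 52 random bits for the mantissa
        # 0x3ff0000000000000 is the bit pattern of 1.0; OR-ing in the mantissa
        # gives a Float64 in [1, 2).
        r.vals[i] = reinterpret(Float64, 0x3ff0000000000000 | bits)
    end
    r.idx = 0
    return r
end

# Hot path: a bounds check on the cache, one load, one subtraction.
function next_float!(r::ToyCachedRNG)
    r.idx == length(r.vals) && refill!(r)
    r.idx += 1
    return r.vals[r.idx] - 1.0              # map [1, 2) to [0, 1)
end
```

With the refill kept out of line, the compiler only has to inline the short body of `next_float!` at each call site, which is the effect this PR is after for `rand(rng, Float64)`.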
The speedups are about 1.8x for `rand(rng, Float64)`, 13% for `randn(rng, Float64)`, more than 1.9x for `rand(rng, Int32)` and smaller integers (including `Bool`).

This is a pleasant benefits / typed characters ratio, reminiscent of #9126 :)
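For anyone wanting to check such speedups locally, here is a crude timing sketch using only the standard library. The helper `ns_per_call` is hypothetical and a simple loop like this is noisy; a proper benchmarking package such as BenchmarkTools would give more trustworthy figures.

```julia
using Random

# Hypothetical helper: average time, in nanoseconds, of one `rand(rng, T)` call.
function ns_per_call(rng::AbstractRNG, ::Type{T}; n::Int = 10^7) where {T}
    x = rand(rng, T)            # warm-up call
    t0 = time_ns()
    for _ in 1:n
        x = rand(rng, T)        # the rng state is mutated, so the call is not dropped
    end
    t1 = time_ns()
    return (t1 - t0) / n
end

rng = MersenneTwister(0)
for T in (Float64, Int64, Int32, Bool)
    println(rpad(string(T), 8), round(ns_per_call(rng, T), digits = 2), " ns")
end
```

Running this on a build with and without the patch should show the relative change, although the absolute values depend on the machine.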
Here are some numbers, in nanoseconds, for different versions of Julia (the starred ones are with this patch applied):
We can note that:
- for `rand(Float64)`, the perfs decreased from 1.5 to master (non-*), but this change solves it;
- for `randn()`, the perfs decreased from 1.4 to 1.5 (non-*, cf. "Regression in rand due to inlining behavior" #37030); this PR gives back the perfs of 1.4, but not as nice as what it "could" be (4.80ns); `code_typed` gives some insight as to why, I will comment on the PR which reinstates the 1.4 version.

As a closing note: the limit here for `rand()` is 1ns (on my machine), the time it takes per `Float64` when using the low-level dSFMT routine to randomize an array; so there might still be a small margin for improvement, but I don't think there will be many more PRs from me with "rand", "MersenneTwister" and "faster" in the title...
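The per-element figure for array generation can be estimated with a sketch like the following. The helper `ns_per_element` is hypothetical, and `rand!` on a `Vector{Float64}` is used here as a stand-in for the low-level routine, which is an assumption on my part about what it calls underneath.

```julia
using Random

# Hypothetical helper: nanoseconds per Float64 when filling an array in bulk.
function ns_per_element(rng::AbstractRNG, n::Int = 10^6, reps::Int = 100)
    A = Vector{Float64}(undef, n)
    rand!(rng, A)                   # warm-up fill
    t0 = time_ns()
    for _ in 1:reps
        rand!(rng, A)               # overwrite A with fresh random values
    end
    return (time_ns() - t0) / (n * reps)
end

println(round(ns_per_element(MersenneTwister(0)), digits = 2), " ns per Float64")
```

Comparing this number with the scalar `rand(rng, Float64)` timing gives a rough idea of how much margin is left.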