This repository has been archived by the owner on Aug 11, 2020. It is now read-only.
Overview
This PR proposes a new implementation of the random number generator in mshadow.
half_t is already supported on CPU, but the new generator also supports it on GPU.
Implementation
The new generators are based on the PCG random generator. PCG is statistically sound, fast, and small. The mt19937 generator currently used on CPU is statistically less sound and does not provide an efficient way to implement multiple independent random sequences. This PR replaces std::mt19937 with a C++11-style PCG generator. The inventor of PCG provides a C++11 random engine implementation, but its codebase is fairly large because it covers many variations of the algorithm, so this PR has its own simple implementation tailored to its use in mshadow.
The xorwow algorithm, currently used on GPU via the curand host API, provides an efficient way to implement multiple independent random sequences, but the curand host API does not provide an interface to use this feature properly. This PR therefore replaces the curand host API with PCG generators running in parallel on GPU.
Performance
The performance gain depends on the probability distribution and the data type.
On CPU,
float and double.
half_t.
On GPU, the performance also depends on the number of parallel generators used. The number of generators used in the curand host API is not known, but I estimate it to be 4096 from some observations with nvvp. If the number of generators is 4096 (the default),
float and double.
For float, the uniform distribution in the unit interval and the standard normal distribution perform on par with the current ones.
For double, the uniform distribution in the unit interval and the standard normal distribution are 1.5x slower. This is the only case in which the new generator is slower.
half_t is 1.0x ~ 1.5x slower than float, depending on the size of the tensor and the distribution.
API change
mshadow::Random<cpu, Dtype>'s constructor and Seed method get one additional integer argument to set the random number stream. A default value is provided. Two generators with the same seed but with different streams generate statistically independent random number sequences.
mshadow::Random<gpu, Dtype>'s constructor and Seed method get two additional arguments. One is to set the random number stream and the other is to set the number of parallel generators to be used internally. A default value is provided for each.
mshadow::PCGRandom32 and mshadow::PCGRandom64. They are C++11 random engines implementing PCG and can be used on both CPU and GPU.
mshadow::UniformRealDistribution<Device, RandGenerator, DType> and mshadow::GaussianDistribution<Device, RandGenerator, DType>. They correspond to std::uniform_real_distribution and std::normal_distribution but are more efficient if used with the provided PCG generators. They also work on both CPU and GPU.
Compatibility
This PR does not break API compatibility. It adds arguments with defaults to existing functions or introduces new ones. The behavior of the current API does not change except that it produces different random number sequences. I also checked that mxnet compiles and passes its unit tests.
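For readers unfamiliar with PCG, the seed-plus-stream idea described under Implementation and API change can be illustrated with a minimal sketch. This is not the code in this PR: the class name SketchPcg32 and the helpers StreamsDiffer and SampleUnit are hypothetical, and the sketch assumes the widely published 32-bit XSH-RR variant of PCG, where the stream selects the (odd) increment of the underlying linear congruential step.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical minimal PCG32 (XSH-RR variant) engine; NOT mshadow::PCGRandom32.
// It exposes the members a C++11 uniform random bit generator needs.
class SketchPcg32 {
 public:
  using result_type = std::uint32_t;

  explicit SketchPcg32(std::uint64_t seed = 0, std::uint64_t stream = 0) {
    Seed(seed, stream);
  }

  // The stream picks the LCG increment, which must be odd.
  // Different streams therefore walk through different sequences.
  void Seed(std::uint64_t seed, std::uint64_t stream) {
    state_ = 0;
    inc_ = (stream << 1) | 1u;
    (*this)();
    state_ += seed;
    (*this)();
  }

  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return 0xFFFFFFFFu; }

  result_type operator()() {
    const std::uint64_t old = state_;
    // Advance the 64-bit LCG state.
    state_ = old * 6364136223846793005ULL + inc_;
    // Output permutation: xorshift the high bits, then rotate right.
    const std::uint32_t xorshifted =
        static_cast<std::uint32_t>(((old >> 18) ^ old) >> 27);
    const std::uint32_t rot = static_cast<std::uint32_t>(old >> 59);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
  }

 private:
  std::uint64_t state_;
  std::uint64_t inc_;
};

// Returns true if two engines with the same seed but streams a and b
// differ somewhere in their first n outputs.
bool StreamsDiffer(std::uint64_t seed, std::uint64_t a, std::uint64_t b,
                   int n = 8) {
  SketchPcg32 ga(seed, a), gb(seed, b);
  bool differ = false;
  for (int i = 0; i < n; ++i) {
    if (ga() != gb()) differ = true;
  }
  return differ;
}

// Draws one double from the unit interval with the standard distribution,
// showing that the engine plugs into <random>.
double SampleUnit(SketchPcg32* gen) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  const double x = dist(*gen);
  assert(x >= 0.0 && x <= 1.0);
  return x;
}
```

With this sketch, SketchPcg32 a(42, 0) and SketchPcg32 b(42, 1) share a seed but produce different sequences, which is the property the additional stream argument is meant to expose. On GPU, one such generator per parallel worker, each with its own stream, would be one way to obtain independent sequences; how this PR assigns streams internally is not shown here.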