This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Improved random number generation #338

Closed
wants to merge 8 commits into from


@asitstands (Contributor) commented May 9, 2018

Overview

This PR proposes a new implementation of the random number generator in mshadow.

  • Most importantly, the new generator provides multiple statistically independent streams of random numbers. When a computation runs several random number generators in parallel, the generators must be statistically independent of each other; bugs caused by hidden correlations between them are subtle and hard to find. Ensuring the statistical independence of multiple generator instances is the main motivation of this PR. MXNet uses multiple random generators, but it guarantees independence only among the generators on a single GPU device; nothing ensures independence among the generators on CPUs or across multiple GPU devices. This PR can be used to remedy that.
  • The new generator is faster and smaller.
  • Addresses #213 ("Random<>::SampleUniform() has inconsistant behaviours on CPU, GPU and docs"): the upper bound of the uniform distribution is exclusive.
  • Addresses #306 ("Shall we add half_t support to Random<cpu, DType>"): half_t was already supported on CPU, and the new generator supports it on GPU as well.
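Regarding #213: one standard way to make the upper bound exclusive is to convert only as many random bits as the floating-point mantissa can hold exactly, so the largest possible result is 1 - 2^-24 for float and 1 - 2^-53 for double. The sketch below assumes 32-bit generator outputs; `UnitFloat` and `UnitDouble` are hypothetical helpers, not mshadow functions, and the PR's own conversion may differ.

```cpp
#include <cstdint>

// Hypothetical helper: map a 32-bit random integer to a float in [0, 1).
// Keeping only the top 24 bits (the float significand width) makes every
// result exactly representable, so the largest value is 1 - 2^-24 < 1.
inline float UnitFloat(std::uint32_t bits) {
  return static_cast<float>(bits >> 8) * (1.0f / 16777216.0f);  // * 2^-24
}

// Hypothetical helper: combine two 32-bit draws into 53 random bits and
// map them to a double in [0, 1); the largest value is 1 - 2^-53 < 1.
inline double UnitDouble(std::uint32_t hi, std::uint32_t lo) {
  std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
  return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);  // * 2^-53
}
```

Note that scaling to [a, b) with a + (b - a) * u can still round up to b for some a and b (for example a = 1, b = 2 in float with u = 1 - 2^-24), so a strict upper bound on non-unit intervals needs an extra check.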

Implementation

The new generators are based on the PCG family of random number generators. PCG is statistically sound, fast, and small. The mt19937 generator currently used on CPU is statistically weaker and offers no efficient way to produce multiple independent random sequences, so this PR replaces std::mt19937 with a C++11-style PCG engine. The inventor of PCG provides a C++11 random engine implementation, but that codebase is fairly large because it covers many variants of the algorithm, so this PR includes its own simple implementation tailored to mshadow. The xorwow algorithm, currently used on GPU through the cuRAND host API, does support multiple independent sequences efficiently, but the host API offers no interface to use that feature properly. This PR therefore replaces the cuRAND host API with PCG generators running in parallel on the GPU.
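For readers unfamiliar with PCG, here is a minimal sketch of the widely published 32-bit variant (a 64-bit LCG state with the XSH-RR output permutation). It shows how a stream argument selects the LCG increment, so the same seed on different streams yields different sequences, and how the class meets the C++11 engine requirements so it plugs into `<random>` distributions. The class name `Pcg32` is hypothetical; this is not the PR's `mshadow::PCGRandom32`.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical minimal PCG32 engine (XSH-RR output on a 64-bit LCG state).
// Satisfies the C++11 UniformRandomBitGenerator requirements.
class Pcg32 {
 public:
  using result_type = std::uint32_t;

  Pcg32(std::uint64_t seed, std::uint64_t stream) { Seed(seed, stream); }

  // The stream picks the LCG increment; different streams give distinct sequences.
  void Seed(std::uint64_t seed, std::uint64_t stream) {
    state_ = 0u;
    inc_ = (stream << 1u) | 1u;  // the increment must be odd
    Next();
    state_ += seed;
    Next();
  }

  static constexpr result_type min() { return 0u; }
  static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

  result_type operator()() { return Next(); }

 private:
  result_type Next() {
    std::uint64_t old = state_;
    state_ = old * 6364136223846793005ULL + inc_;  // LCG step
    // Output permutation: xorshift the high bits, then a data-dependent rotation.
    std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
    std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
  }

  std::uint64_t state_;
  std::uint64_t inc_;
};
```

Usage: `Pcg32 gen(42, 1); std::uniform_real_distribution<double> u(0.0, 1.0); double x = u(gen);`. The whole generator state is two 64-bit words, which is what makes one generator per GPU thread affordable.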

Performance

The performance gain depends on the probability distribution and the data type.

On CPU,

  • The uniform distribution is 2.5x faster and the normal distribution is 1.5x faster for float and double.
  • The uniform distribution is 3x faster and the normal distribution is 2x faster for half_t.

On GPU, performance also depends on the number of parallel generators used. I could not determine the number of generators the cuRAND host API uses, but some observations with nvvp suggest it is 4096. With 4096 generators (the default),

  • Uniform distributions on non-unit intervals and non-standard normal distributions are 2x to 3x faster for both float and double.
  • For float, the uniform distribution on the unit interval and the standard normal distribution perform the same as the current implementation.
  • For double, the uniform distribution on the unit interval and the standard normal distribution are 1.5x slower. This is the only case in which the new generator is slower.
  • Generating half_t is 1.0x to 1.5x slower than generating float, depending on the tensor size and the distribution.
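To illustrate why the number of parallel generators matters, the sketch below emulates on the CPU one plausible layout: G generators share a seed, generator i uses stream i, and it fills elements i, i + G, i + 2G, ... of the output. This grid-stride layout is an assumption for illustration only; the PR text does not describe how the GPU kernel partitions a tensor, and `FillParallel`, `MakePcg32` and `Pcg32Next` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical PCG32 state: 64-bit LCG state plus an odd increment (the stream).
struct Pcg32State {
  std::uint64_t state;
  std::uint64_t inc;
};

// One PCG32 step (XSH-RR output).
inline std::uint32_t Pcg32Next(Pcg32State& g) {
  std::uint64_t old = g.state;
  g.state = old * 6364136223846793005ULL + g.inc;
  std::uint32_t xorshifted = static_cast<std::uint32_t>(((old >> 18u) ^ old) >> 27u);
  std::uint32_t rot = static_cast<std::uint32_t>(old >> 59u);
  return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}

// Seed a generator on a given stream (same procedure as the reference PCG32 seeding).
inline Pcg32State MakePcg32(std::uint64_t seed, std::uint64_t stream) {
  Pcg32State g{0u, (stream << 1u) | 1u};
  Pcg32Next(g);
  g.state += seed;
  Pcg32Next(g);
  return g;
}

// CPU emulation of num_generators parallel generators filling n elements.
// Each outer iteration stands in for one GPU thread with its own stream.
std::vector<std::uint32_t> FillParallel(std::size_t n, std::size_t num_generators,
                                        std::uint64_t seed) {
  std::vector<std::uint32_t> out(n);
  for (std::size_t i = 0; i < num_generators; ++i) {
    Pcg32State g = MakePcg32(seed, i);  // same seed, distinct stream per generator
    for (std::size_t j = i; j < n; j += num_generators) out[j] = Pcg32Next(g);
  }
  return out;
}
```

Under this layout each generator produces roughly n / G values serially, and the values written to a tensor depend on G as well as on the seed, so changing the number of generators changes the output even with the same seed.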

API change

  • The constructor and Seed method of mshadow::Random<cpu, DType> take one additional integer argument that selects the random number stream; it has a default value. Two generators with the same seed but different streams generate statistically independent random number sequences.
  • The constructor and Seed method of mshadow::Random<gpu, DType> take two additional arguments: one selects the random number stream and the other sets the number of parallel generators used internally. Both have default values.
  • New classes mshadow::PCGRandom32 and mshadow::PCGRandom64. They are C++11 random engines implementing PCG and can be used on both CPU and GPU.
  • New classes mshadow::UniformRealDistribution<Device, RandGenerator, DType> and mshadow::GaussianDistribution<Device, RandGenerator, DType>. They correspond to std::uniform_real_distribution and std::normal_distribution but are more efficient if used with the provided PCG generators. They also work on both CPU and GPU.

Compatibility

This PR does not break API compatibility: it adds arguments with default values to existing functions and introduces new ones. The behavior of the current API is unchanged except that it produces different random number sequences. I also checked that MXNet compiles and passes its unit tests with this change.

@szha (Member) commented Aug 4, 2019

This code base has been donated to the Apache MXNet project per #373, and repo is deprecated. Future development should continue in Apache MXNet.

@CLAassistant commented Jul 26, 2020

CLA assistant check
All committers have signed the CLA.

@szha szha closed this Jul 26, 2020
@szha szha reopened this Jul 26, 2020
@szha szha closed this Jul 26, 2020
3 participants