DOC: Fix Multithreaded Generation example docs (gh-15367)
The old code examples did not actually run correctly and had a bug. They were
also never updated to use SeedSequence correctly. This fixes both.


* DOC, BUG: Fixes Multithreaded Generation docs

Working on numpy#15365
The code runs now, but I get quite different runtimes than
in the example. May need some more work

* DOC: changes after PR review

* DOC: Changing input/output lines to sphinx standard

For ipython example: Adding `Out[i]` where i is the input cell number
and correcting visual multiline inputs

* MAINT: Replace `jumped` by `SeedSequence` in multithreading snippet

After PR review, decided to replace `jumped` by `SeedSequence` as it is a more general method. Adapted the introduction to reflect this.

Also added a section on the reproducibility for different machines and number of threads + an example.

* MAINT: Changes after PR review

- Switch back to default_rng() instead of the Generator(BitGenerator) syntax pursuant to @rkern's explanation in numpy#15391
- small fixes

* MAINT: Delete unnecessary reference to PCG64

* MAINT: Added newline fixing codeblock error in docs

Trying to see whether this fixes circleci error

* MAINT: Clean up in imports of code snippet
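
The switch from `jumped` to `SeedSequence` described above can be illustrated with a minimal sketch. The chained `jumped()` loop mirrors the lines this commit removes, and the `spawn` version mirrors the lines it adds. Wrapping the jumped bit generators in `Generator`, and the names `n_streams`, `spawned`, `jumped`, and `repeat_draws`, are assumptions made here for illustration only.

```python
import numpy as np
from numpy.random import PCG64, Generator, SeedSequence, default_rng

seed = 12345
n_streams = 4  # hypothetical number of worker threads

# Approach adopted by this commit: spawn one child SeedSequence per stream.
# This works with any BitGenerator, not only those that offer jumped().
spawned = [default_rng(child) for child in SeedSequence(seed).spawn(n_streams)]

# Approach being replaced: chain PCG64.jumped() calls, one per extra stream.
bit_gen = PCG64(seed)
jumped = [Generator(bit_gen)]
for _ in range(n_streams - 1):
    bit_gen = bit_gen.jumped()
    jumped.append(Generator(bit_gen))

spawned_draws = [rng.standard_normal(3) for rng in spawned]
jumped_draws = [rng.standard_normal(3) for rng in jumped]

# Spawning again from the same seed reproduces every child stream.
repeat_draws = [default_rng(child).standard_normal(3)
                for child in SeedSequence(seed).spawn(n_streams)]
print(all(np.array_equal(a, b) for a, b in zip(spawned_draws, repeat_draws)))
```

Both approaches yield one generator per thread. The spawned version depends only on the seed and the number of children requested.
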
fzeiser authored and seberg committed Jan 24, 2020
1 parent 31e53bf commit d5a444e
Showing 1 changed file with 36 additions and 29 deletions.
65 changes: 36 additions & 29 deletions doc/source/reference/random/multithreading.rst
@@ -11,32 +11,28 @@ these requirements.

 This example makes use of Python 3 :mod:`concurrent.futures` to fill an array
 using multiple threads. Threads are long-lived so that repeated calls do not
-require any additional overheads from thread creation. The underlying
-BitGenerator is `PCG64` which is fast, has a long period and supports
-using `PCG64.jumped` to return a new generator while advancing the
-state. The random numbers generated are reproducible in the sense that the same
-seed will produce the same outputs.
+require any additional overheads from thread creation.
+
+The random numbers generated are reproducible in the sense that the same
+seed will produce the same outputs, given that the number of threads does not
+change.

 .. code-block:: ipython
-    from numpy.random import Generator, PCG64
+    from numpy.random import default_rng, SeedSequence
     import multiprocessing
     import concurrent.futures
     import numpy as np
     class MultithreadedRNG:
         def __init__(self, n, seed=None, threads=None):
-            rg = PCG64(seed)
             if threads is None:
                 threads = multiprocessing.cpu_count()
             self.threads = threads
-            self._random_generators = [rg]
-            last_rg = rg
-            for _ in range(0, threads-1):
-                new_rg = last_rg.jumped()
-                self._random_generators.append(new_rg)
-                last_rg = new_rg
+            seq = SeedSequence(seed)
+            self._random_generators = [default_rng(s)
+                                       for s in seq.spawn(threads)]
             self.n = n
             self.executor = concurrent.futures.ThreadPoolExecutor(threads)
@@ -61,48 +57,59 @@ seed will produce the same outputs.
             self.executor.shutdown(False)
 The multithreaded random number generator can be used to fill an array.
 The ``values`` attributes shows the zero-value before the fill and the
 random value after.

 .. code-block:: ipython
-    In [2]: mrng = MultithreadedRNG(10000000, seed=0)
-       ...: print(mrng.values[-1])
-    0.0
+    In [2]: mrng = MultithreadedRNG(10000000, seed=12345)
+       ...: print(mrng.values[-1])
+    Out[2]: 0.0
     In [3]: mrng.fill()
-       ...: print(mrng.values[-1])
-    3.296046120254392
+       ...: print(mrng.values[-1])
+    Out[3]: 2.4545724517479104
 The time required to produce using multiple threads can be compared to
 the time required to generate using a single thread.

 .. code-block:: ipython
     In [4]: print(mrng.threads)
-       ...: %timeit mrng.fill()
+       ...: %timeit mrng.fill()
-    4
-    32.8 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
+    Out[4]: 4
+       ...: 32.8 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
 The single threaded call directly uses the BitGenerator.

 .. code-block:: ipython
     In [5]: values = np.empty(10000000)
-       ...: rg = Generator(PCG64())
-       ...: %timeit rg.standard_normal(out=values)
+       ...: rg = default_rng()
+       ...: %timeit rg.standard_normal(out=values)
-    99.6 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
+    Out[5]: 99.6 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
-The gains are substantial and the scaling is reasonable even for large that
-are only moderately large. The gains are even larger when compared to a call
+The gains are substantial and the scaling is reasonable even for arrays that
+are only moderately large. The gains are even larger when compared to a call
 that does not use an existing array due to array creation overhead.

 .. code-block:: ipython
-    In [6]: rg = Generator(PCG64())
-       ...: %timeit rg.standard_normal(10000000)
+    In [6]: rg = default_rng()
+       ...: %timeit rg.standard_normal(10000000)
+    Out[6]: 125 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
+Note that if `threads` is not set by the user, it will be determined by
+`multiprocessing.cpu_count()`.
+
+.. code-block:: ipython
-    125 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
+    In [7]: # simulate the behavior for `threads=None`, if the machine had only one thread
+       ...: mrng = MultithreadedRNG(10000000, seed=12345, threads=1)
+       ...: print(mrng.values[-1])
+    Out[7]: 1.1800150052158556
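
The dependence on the number of threads noted in the diff can be checked with a small self-contained sketch. The diff does not show the `fill` method, so the function `threaded_normal` below is a hypothetical stand-in rather than the documented class. It assumes each thread fills one contiguous slice of the output array from its own spawned generator.

```python
import concurrent.futures

import numpy as np
from numpy.random import SeedSequence, default_rng


def threaded_normal(n, seed, threads):
    """Fill an array of n standard normals using one Generator per thread."""
    # One independent child stream per worker thread.
    rngs = [default_rng(s) for s in SeedSequence(seed).spawn(threads)]
    values = np.empty(n)
    step = -(-n // threads)  # ceiling division: chunk length per thread

    def _fill(rng, first, last):
        # Each worker writes only to its own slice of the shared array.
        rng.standard_normal(out=values[first:last])

    with concurrent.futures.ThreadPoolExecutor(threads) as executor:
        futures = [executor.submit(_fill, rngs[i], i * step, (i + 1) * step)
                   for i in range(threads)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return values


four_a = threaded_normal(1000, seed=12345, threads=4)
four_b = threaded_normal(1000, seed=12345, threads=4)
one = threaded_normal(1000, seed=12345, threads=1)

# Same seed and same thread count: identical arrays.
same_thread_count_matches = np.array_equal(four_a, four_b)
# Same seed but a different thread count: the arrays differ.
different_thread_count_matches = np.array_equal(four_a, one)
print(same_thread_count_matches, different_thread_count_matches)
```

Passing `threads` explicitly, rather than relying on `multiprocessing.cpu_count()`, is one way to keep results comparable across machines.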
