
ENH: stats: allow bootstrap to use CuPy #63

Closed
wants to merge 2 commits

Conversation

mdhaber (Owner) commented Sep 7, 2021

Experimental use of CuPy in scipy.stats.bootstrap

rgommers commented:

This is an interesting experiment. I believe we will not want an explicit import of CuPy within SciPy; rather, we will want to use the array API standard and __array_namespace__. But that should not be hard to change in the near future (once support in CuPy and NumPy is complete). The interesting part is that the benefit is large.
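
As a rough illustration (not part of this PR; the NumPy fallback below is an assumption, since plain ndarrays did not yet expose the protocol), dispatching on the array's namespace rather than importing CuPy directly could look something like:

import numpy as np

def array_namespace(x):
    # Arrays that implement the array API standard expose __array_namespace__;
    # fall back to NumPy for plain ndarrays (illustrative fallback only).
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def std(x):
    # The same code then works with NumPy, CuPy, or any other conforming library.
    xp = array_namespace(x)
    return xp.std(x)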

mdhaber (Owner, Author) commented Dec 6, 2021

I'm seeing a ~10x speedup on this. (Update: ~50x using rand and casting instead of randint.)

import numpy as np
import cupy as cp
from scipy import stats

# CPU baseline: bootstrap the standard deviation with NumPy arrays
data_np = np.random.rand(10000)
rng_np = np.random.RandomState(0)
res_np = stats.bootstrap((data_np,), np.std, batch=1000,
                         random_state=rng_np, xp=np)  # 2.8 s ± 9.26 ms per loop 

# Same computation on the GPU with CuPy arrays
data_cp = cp.array(data_np)
rng_cp = cp.random.RandomState(0)
res_cp = stats.bootstrap((data_cp,), cp.std, batch=1000, 
                         random_state=rng_cp, xp=cp)  # 240 ms ± 518 µs per loop  
  • Computing the statistic is ~90x faster, which is more in line with what I was expecting (~1% of the total time now)
  • Generating the resamples is only ~4x faster, which is the bottleneck (~80% of the total time). Looking deeper, 97.2% of that time is the random number generation; only 2.8% is the indexing. This is a known issue: cupy.random.randint is slow (cupy/cupy#4120). Generating uniform floats with rand and multiplying/casting instead is much faster, bringing the total execution time from 240 ms to 53 ms (see the sketch after this list).
  • In this case, calculating the BCa interval is ~10x faster, but it still accounts for ~18% of the total time. Again, actually computing the statistic is very fast. There is some overhead in using the GPU for the tiny calculations, like calculating the statistic for the observed data (only) and the call to cupyx.scipy.special.ndtr. The slow part is again generating the resamples - specifically, a reshape operation in _jackknife_resample. I could probably find a more efficient way to generate the jackknife resamples on the GPU.
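
A minimal sketch of the rand-and-cast trick described above (the helper name and signature are illustrative, not SciPy internals):

import cupy as cp

def resample_indices(xp, rng, n, n_resamples):
    # Same distribution as rng.randint(0, n, size=(n_resamples, n)), but
    # drawing uniform floats and truncating to integers avoids the slow
    # CuPy randint kernel.
    return (rng.rand(n_resamples, n) * n).astype(xp.intp)

rng = cp.random.RandomState(0)
indices = resample_indices(cp, rng, n=10000, n_resamples=1000)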

No need to leave this PR open. Concept has been demonstrated. Actual implementation will depend on how SciPy decides to handle other array backends in general.
