Combining Cuba.jl with GPU #36
Alternatively, I can bind arrays as CuArrays after drawing the random variables, but the time cost of doing so seems prohibitive. Instead, it seems much faster to avoid the GPU and use the CPU exclusively. Is there any way to decrease the time of the GPU option? I.e., could I draw the random variables on the GPU initially rather than having to transfer them over? I think the transferring from the GPU to the CPU is what makes the code so slow for the GPU option. For example:
import Pkg
Pkg.activate("joint_timing")
Pkg.instantiate()
using Cuba, Distributions
using BenchmarkTools, Test, CUDA
using FLoops, FoldsCUDA
using SpecialFunctions
@test Threads.nthreads() > 1

# User inputs
M = 5 # number of independent uniform random variables
atol = 1e-6
rtol = 1e-3
nvec = 1000000
maxevals = 100000000

# Integrands
# CPU, via Distributions: joint pdf of M independent Beta(1, 2) variables
function int_cpu(x, f)
    f[1] = pdf(Product(Beta.(1.0, 2.0 * ones(M))), x)
end

# CPU, Beta(1, 2) pdf written out by hand
function int_cpu2(x, f)
    f[1] = vec(prod(x' .^ (1.0 - 1.0) .* (1.0 .- x') .^ (2.0 - 1.0) ./ (gamma(1.0) * gamma(2.0) / gamma(3.0)), dims=2))[1]
end

# GPU: product of Beta(a, b) pdfs over the first dimension
function beta_pdf_gpu(x, a, b)
    prod(x .^ (a - 1.0f0) .* (1.0f0 .- x) .^ (b - 1.0f0) ./ (gamma(a) * gamma(b) / gamma(a + b)), dims=1)
end

# GPU: copies x to the device on every call
function int_gpu(x, f)
    f[1] = vec(beta_pdf_gpu(CuArray(x), 1.0f0, 2.0f0))[1]
end

display(@benchmark cuhre($int_gpu, $M, 1, atol=$atol, rtol=$rtol))  # 70 ms for M = 5, 11.7 s for M = 15
display(@benchmark cuhre($int_cpu, $M, 1, atol=$atol, rtol=$rtol))  # 2.0 ms for M = 5, 650 ms for M = 15
display(@benchmark cuhre($int_cpu2, $M, 1, atol=$atol, rtol=$rtol)) # 500 μs for M = 5, 100 ms for M = 15, 38 s for M = 25
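The script above defines `nvec` but never passes it on, so every integrand call handles a single point, and `int_gpu` pays for one host-to-device copy per point. A minimal sketch of a batched integrand follows, assuming Cuba.jl's vectorized interface, in which `x` arrives as an `(ndim, npoints)` matrix and `f` as an `(ncomp, npoints)` matrix. The names `int_batched`, `A`, `B` and `BETA_NORM` are hypothetical and not part of the original code; the parameters are fixed to the Beta(1, 2) case used in the benchmarks.

```julia
# Beta(a, b) parameters used throughout the thread (assumed fixed here).
const A, B = 1.0, 2.0
# Normalisation B(1, 2) = Γ(1)Γ(2)/Γ(3) = 1/2, written out to avoid extra packages.
const BETA_NORM = 0.5

# Batched integrand: each column of x is one evaluation point.
# Writes the joint density of every point into the first row of f.
function int_batched(x, f)
    dens = x .^ (A - 1) .* (1 .- x) .^ (B - 1) ./ BETA_NORM
    f[1, :] .= vec(prod(dens; dims=1))
    return nothing
end
```

With Cuba.jl loaded, this could be passed as `cuhre(int_batched, M, 1; nvec=nvec, atol=atol, rtol=rtol)`. The body sizes itself from `x`, which matters because a batch may contain fewer than `nvec` points. If a GPU version is still wanted, the same shape allows a single upload per batch (for example `CuArray{Float32}(x)`) instead of one per point; whether that beats the CPU at these dimensions would need to be benchmarked.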
I'm not really sure what you want to do here. It may be interesting to see whether
Thank you for the quick response! I will look into that.
Hello,
I'm trying to use GPU computation via CUDA.jl to speed up calculating integrals. Is it possible to do so? If so, how? Here is my example code:
If I use the CUDAEx() call, the code errors. If I don't, the code works fine but isn't exploiting the GPU effectively.
If I include the CUDAEx() call, the error message is