I hit a CUDA error about "too many resources" and discovered that it happened because my kernel requires a lot of registers. I found the following answer helpful, but it uses the deprecated CUDAnative package. Calling maxthreads on the kernel object returned by cufunction gives a thread limit that takes the kernel's register usage into account. Based on that example, here's what I came up with for the one-dimensional JACC.parallel_for:
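function JACC.parallel_for(N::I, f::F, x...) where {I<:Integer,F<:Function}
    # Convert the arguments to their device-side representations.
    parallel_args = (f, x...)
    parallel_kargs = cudaconvert.(parallel_args)
    parallel_tt = Tuple{Core.Typeof.(parallel_kargs)...}
    # Compile the kernel without launching it so its resource usage can be queried.
    parallel_kernel = cufunction(_parallel_for_cuda, parallel_tt)
    # Largest block size this kernel supports, given its register usage.
    maxPossibleThreads = CUDA.maxthreads(parallel_kernel)
    threads = min(N, maxPossibleThreads)
    blocks = cld(N, threads)  # integer ceiling division
    parallel_kernel(parallel_kargs...; threads=threads, blocks=blocks)
end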
This works, although it probably needs more exploration.
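To make the thread/block arithmetic easier to check without a GPU, here is a minimal CPU-only sketch of the same calculation. The helper name launch_config and the 640-thread limit are hypothetical, chosen only for illustration; on a real device the limit would come from CUDA.maxthreads as in the function above.

```julia
# Hypothetical helper (not part of JACC or CUDA.jl): choose a 1D launch
# configuration from the problem size N and the kernel's thread limit.
function launch_config(N::Integer, max_threads::Integer)
    N > 0 || throw(ArgumentError("N must be positive"))
    max_threads > 0 || throw(ArgumentError("max_threads must be positive"))
    threads = min(N, max_threads)   # never request more threads than elements
    blocks = cld(N, threads)        # integer ceiling division
    return (threads = threads, blocks = blocks)
end

# Assumed limit of 640 threads per block for a register-heavy kernel.
cfg = launch_config(1_000_000, 640)
println(cfg)

# A small problem fits in a single block.
small = launch_config(100, 640)
println(small)
```

Because blocks * threads can exceed N when N is not a multiple of the block size, this configuration only makes sense if the kernel itself skips indices greater than N.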