Improve CUDA kernels #1128

simonbyrne · 2023-02-23T22:24:29Z · edited by charleskawczynski

Using Fortran-style (column-major) linear indexing on the parent array, with any required assertions done upstream, might be the easiest approach for some kernels. For example:

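```julia
import CUDA

# Assumes IJFH and IJFHStyle (ClimaCore.DataLayouts) are in scope,
# and that parent(bc) is defined for both the IJFH and Broadcasted cases.
function Base.copyto!(
    dest::IJFH{S, Nij},
    bc::Union{IJFH{S, Nij, A}, Base.Broadcast.Broadcasted{IJFHStyle{Nij, A}}},
) where {S, Nij, A <: CUDA.CuArray}
    pdest, pbc = parent(dest), parent(bc)
    nitems = length(pdest)
    nitems == 0 && return dest           # avoid cld(0, 0), which throws DivideError
    max_threads = 256 # can be higher if conditions permit
    nthreads = min(max_threads, nitems)  # threads per block
    nblocks = cld(nitems, nthreads)      # enough blocks to cover every item
    CUDA.@cuda threads=nthreads blocks=nblocks knl_copyto!(pdest, pbc)
    return dest
end

# One thread per element of the flattened (column-major) parent array.
function knl_copyto!(dest, src)
    nitems = length(dest)
    gidx = CUDA.threadIdx().x + (CUDA.blockIdx().x - 1) * CUDA.blockDim().x

    if gidx <= nitems                    # 1-based indices: nitems itself is valid
        @inbounds dest[gidx] = src[gidx]
    end
    return nothing
end
```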

Originally posted by @sriharshakandala in #767 (comment)
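A quick CPU-only way to sanity-check the launch arithmetic in that snippet is to enumerate the global indices a 1D launch generates. The helper `simulated_gidx` below is hypothetical (not a CUDA.jl or ClimaCore function); it mirrors the `nthreads`/`nblocks` computation and the kernel's bounds check, assuming `nitems > 0`. It also shows why, with CUDA.jl's 1-based `threadIdx()`/`blockIdx()`, the guard must be `gidx <= nitems`: a strict `<` never writes element `nitems`.

```julia
# Hypothetical helper (not a CUDA.jl or ClimaCore API): emulates on the CPU
# which global indices a 1D launch touches. Assumes nitems > 0.
function simulated_gidx(nitems::Integer; max_threads::Integer = 256)
    nthreads = min(max_threads, nitems)   # threads per block (blockDim().x)
    nblocks = cld(nitems, nthreads)       # enough blocks to cover every item
    touched = Int[]
    # CUDA.jl's blockIdx().x and threadIdx().x are 1-based, as these loops are.
    for block in 1:nblocks, thread in 1:nthreads
        gidx = thread + (block - 1) * nthreads
        if gidx <= nitems                 # same guard as the kernel
            push!(touched, gidx)
        end
    end
    return (; nthreads, nblocks, touched)
end

cfg = simulated_gidx(1000)
println((cfg.nthreads, cfg.nblocks, length(cfg.touched)))   # (256, 4, 1000)
```

With 1000 items, 4 blocks of 256 threads launch 1024 threads; the 24 surplus threads fail the guard and do nothing, and every index in `1:1000` is visited exactly once.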

sriharshakandala · 2024-02-26T18:47:10Z

We can try mapping the linear thread index to a Cartesian index of `dest`:

```julia
cartidx = CartesianIndices(dest)[gidx]
```
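A minimal CPU sketch of that idea, assuming the source is a lazy `Base.Broadcast.Broadcasted` rather than a plain array: each linear "thread" index is mapped to a `CartesianIndex` of `dest`, and the broadcasted object is indexed with that Cartesian index, so singleton (broadcast) dimensions are resolved correctly. `emulate_copyto!` is a hypothetical name, and its loop only stands in for the per-thread work of a kernel; it is not ClimaCore's implementation.

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical CPU stand-in for a kernel: each loop iteration plays the role
# of one GPU thread with 1-based global index `gidx`.
function emulate_copyto!(dest::AbstractArray, bc)
    cartinds = CartesianIndices(dest)
    for gidx in 1:length(dest)
        cartidx = cartinds[gidx]        # linear (column-major) -> Cartesian
        dest[cartidx] = bc[cartidx]     # Broadcasted resolves singleton dims
    end
    return dest
end

a = reshape(collect(1.0:6.0), 2, 3)     # 2×3 source
b = [10.0 20.0 30.0]                    # 1×3 row, broadcast along dim 1
bc = instantiate(broadcasted(+, a, b))  # lazy a .+ b with axes computed
dest = similar(a)
emulate_copyto!(dest, bc)
println(dest == a .+ b)                 # true
```

A raw linear index into the row `b` would not work here: `b[2] == 20.0`, but position `(2, 1)` of `dest` needs `b[1, 1] == 10.0`. Going through `CartesianIndices(dest)` lets the broadcast machinery pick the right element.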

sriharshakandala self-assigned this Apr 4, 2023

sriharshakandala linked a pull request Apr 4, 2023 that will close this issue: Add copyto! CUDA kernels #1174 (Closed, 4 tasks)

charleskawczynski mentioned this issue Feb 24, 2024 in Performance roadmap CliMA/ClimaAtmos.jl#2632 (Open)

charleskawczynski mentioned this issue Feb 26, 2024 in Try ClimaCore diag specialization CliMA/ClimaAtmos.jl#2722 (Closed)
