NCCL Complex wrapper #747

Closed
nikopj opened this issue Jul 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

nikopj commented Jul 3, 2024

The NCCL backend in DistributedUtils does not support complex values (see issue). Can we add a convenient wrapper in the NCCLEXT to support broadcast, reduce, etc., likely by using `reim` and `complex`? I'm happy to get started on it, but I'd like some feedback on the preferred location and so on.

Member

avik-pal commented Jul 3, 2024

There are 3 possible solutions:

  1. For the LuxCUDADevice methods, such as
    function DistributedUtils.__bcast!(
            backend::NCCLBackend, sendrecvbuf, ::LuxCUDADevice; root=0)
        NCCL.Broadcast!(sendrecvbuf, backend.comm; root)
        return sendrecvbuf
    end
    split the buffer with reim before the NCCL call and recombine it with complex afterwards whenever the buffer type is ::AbstractArray{<:Complex}.
  2. Alternatively, detect complex buffers the same way as in point 1 and forward the call to MPI, similar to
    function DistributedUtils.__bcast!(
            backend::NCCLBackend, sendrecvbuf, dev::AbstractLuxDevice; root=0)
        return DistributedUtils.__bcast!(backend.mpi_backend, sendrecvbuf, dev; root)
    end
    If MPI is CUDA-aware, no device-to-host copy is performed, and as far as I know MPI supports complex numbers.
  3. Add complex support directly to NCCL.jl, though it is worth opening an issue there first to ask whether the feature is welcome.
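
As a rough illustration of option 1, the split-and-recombine idea can be sketched without a GPU or NCCL. Everything named here is hypothetical: `complex_collective!` is not part of Lux or NCCL.jl, and `fake_allreduce_sum!` is only a stand-in for a real-only collective so that the example runs on the CPU.

```julia
# Hypothetical stand-in for a real-only collective. It mimics a sum
# all-reduce across two ranks holding identical data by doubling the
# buffer in place. A real implementation would call into NCCL instead.
fake_allreduce_sum!(buf::AbstractArray{<:Real}) = (buf .*= 2; buf)

# Hypothetical wrapper: run a real-only, in-place collective `op!` on the
# real and imaginary parts separately, then write the result back.
function complex_collective!(op!, buf::AbstractArray{<:Complex})
    re_part = real.(buf)   # freshly allocated real array
    im_part = imag.(buf)   # freshly allocated real array
    op!(re_part)
    op!(im_part)
    buf .= complex.(re_part, im_part)   # recombine in place
    return buf
end

x = ComplexF32[1 + 2im, 3 - 4im]
complex_collective!(fake_allreduce_sum!, x)
```

Treating the two parts independently is only valid for operations that act on them independently, such as broadcast and sum reductions. A product reduction mixes real and imaginary parts, and max or min have no natural meaning for complex values, so those would need different handling. The sketch also pays for two temporary arrays and two collective calls per buffer.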

avik-pal added the enhancement label Jul 3, 2024
Member

avik-pal commented Aug 2, 2024

Closing, since this will be implemented directly in NCCL.jl.

avik-pal closed this as completed Aug 2, 2024