This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

sequence not modifying device array of cuComplex #1983

Closed
TysonRayJones opened this issue Aug 24, 2023 · 1 comment

Comments

@TysonRayJones

TysonRayJones commented Aug 24, 2023

Hi there,

The goal

I'm using Thrust 2.1.1 with CUDA 12.2 on Ubuntu 20, deploying to a GPU with compute-capability 7.5.
My goal is to initialise a device array of cuDoubleComplex (provided by cuComplex.h), using Thrust's sequence() which accepts a device_ptr (and offset) and init and step scalars. I must use a cudaMalloc'd raw pointer (rather than a Thrust device_vector), and I must use cuDoubleComplex for the scalar type (as opposed to thrust::complex<double>) due to compatibility constraints.

The correctly initialised device array will have contents:

0 + i(0.1)
0.2 + i(0.3)
0.4 + i(0.5)
0.6 + i(0.7)
0.8 + i(0.9)
...

That is, we start at .1i and step by .2+.2i.

The problem

Attempting to do so with cuComplex directly...

// create a cuComplex device array
int size = (1 << 5);
cuDoubleComplex *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence cuComplex scalars
cuDoubleComplex init = {0, .1};
cuDoubleComplex step = {.2, .2};
thrust::device_ptr<cuDoubleComplex> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

will not compile, since sequence requires an operator* between cuDoubleComplex and std::size_t:

sequence.inl(67): error: no operator "*" matches these operands
            operand types are: const cuDoubleComplex * std::size_t
      return init + step * i;

If we define the needed operators...

cuDoubleComplex operator * (const cuDoubleComplex& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
cuDoubleComplex operator + (const cuDoubleComplex& a, const cuDoubleComplex& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}

and compile, this code runs but produces an incorrect (unmodified) all-zero device array. I check the resulting device array via:

// copy device array to RAM and print
cudaDeviceSynchronize();
cuDoubleComplex* h_sv = (cuDoubleComplex*) malloc(size * sizeof *h_sv);
cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
for (int i=0; i<size; i++)
    printf("%d: %g + i(%g)\n", i, cuCreal(h_sv[i]), cuCimag(h_sv[i]));

This is not an issue with our operator definitions, since replacing their return values with debug stand-ins

cuDoubleComplex operator * (const cuDoubleComplex& a, const std::size_t n) {
    //return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
    //return make_cuDoubleComplex(.1, .1);
    return {.1, .1}; // still na-da!
}

yields the same erroneous result.

Works fine with thrust::complex, but not allowed

I have checked that sequence works fine when given thrust::complex<double> as its type instead:

// create a thrust::complex<double> device array
int size = (1 << 5);
thrust::complex<double> *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence thrust::complex scalars
thrust::complex<double> init = {0, .1};
thrust::complex<double> step = {.2, .2};
thrust::device_ptr<thrust::complex<double>> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

// check output
thrust::complex<double>* h_sv = (thrust::complex<double>*) malloc(size * sizeof *h_sv);
cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
for (int i=0; i<size; i++)
    printf("%d: %g + i(%g)\n", i, h_sv[i].real(), h_sv[i].imag());

The output then is correct.
Unfortunately this is not a solution for me, since I am bound to maintaining the device array d_sv with type cuDoubleComplex.

Unable to hybridise cuComplex and thrust::complex

A cursory Google search suggested Thrust is compatible with cuComplex types (despite my results above), so I attempted to hybridise the two scenarios above: use cuDoubleComplex for the device array, and thrust::complex for the sequence arguments, like so:

using arrType = cuDoubleComplex;
using seqType = thrust::complex<double>;

// create a cuComplex device array
int size = (1 << 5);
arrType *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence thrust::complex scalars
seqType init = {0, .1};
seqType step = {.2, .2};
thrust::device_ptr<arrType> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

Alas, this produces a compiler error:

tabulate.h(55): error: no operator "=" matches these operands
            operand types are: thrust::device_reference<arrType> = seqType
        items[idx] = op(idx);

Am I making an obvious mistake?
Is there any way to use sequence for my purpose without changing the type of d_sv away from cuDoubleComplex?

Kernel is trivial, of course

I am aware that writing an unoptimised custom kernel for this case is trivial, like so:

__global__ void set_comp_seq_kernel(cuDoubleComplex* vec, int size){
    int index = blockIdx.x*blockDim.x + threadIdx.x;
    if (index>=size) return;
    vec[index] = make_cuDoubleComplex(.2*index, .2*index+.1);
}

int threadsPerCUDABlock = 128;
int CUDABlocks = ceil(size / (double) threadsPerCUDABlock);
set_comp_seq_kernel<<<CUDABlocks, threadsPerCUDABlock>>>(d_sv, size);

however I need to perform more advanced operations with Thrust and cuDoubleComplex arrays; this is the simplest MWE I could make.


Full MWEs

working thrust::complex code:

/*
 *  WORKS
 */

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/complex.h>

using comp = thrust::complex<double>;


int main(void) {

    int size = (1 << 5);
    comp *d_sv;
    cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

    comp init = {0, .1};
    comp step = {.2, .2};
    thrust::device_ptr<comp> ptr = thrust::device_pointer_cast(d_sv);
    thrust::sequence(ptr, ptr + size, init, step);
    
    comp* h_sv = (comp*) malloc(size * sizeof *h_sv);
    cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
    for (int i=0; i<size; i++)
        printf("%d: %g + i(%g)\n", i, h_sv[i].real(), h_sv[i].imag());
    
    free(h_sv);
    cudaFree(d_sv);
    return 0;
}

failing cuComplex code:

/*
 *  FAILS
 */

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <cuComplex.h>

using comp = cuDoubleComplex;


comp operator * (const comp& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
comp operator + (const comp& a, const comp& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}


int main(void) {

    int size = (1 << 5);
    comp *d_sv;
    cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

    comp init = {0, .1};
    comp step = {.2, .2};
    thrust::device_ptr<comp> ptr = thrust::device_pointer_cast(d_sv);
    thrust::sequence(ptr, ptr + size, init, step);
    
    comp* h_sv = (comp*) malloc(size * sizeof *h_sv);
    cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
    for (int i=0; i<size; i++)
        printf("%d: %g + i(%g)\n", i, cuCreal(h_sv[i]), cuCimag(h_sv[i]));
    
    free(h_sv);
    cudaFree(d_sv);
    return 0;
}
@TysonRayJones
Author

Whoops, I had a brainwave and solved it! My cuDoubleComplex operator overloads were implicitly host-only. Changing them to include __device__, i.e.

__device__ inline comp operator * (const comp& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
__device__ inline comp operator + (const comp& a, const comp& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}

solved the issue, and makes the failing cuComplex code above run fine.

github-project-automation bot moved this from Todo to Done in CCCL Aug 24, 2023