This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

sequence not modifying device array of cuComplex #1983

Closed
TysonRayJones opened this issue Aug 24, 2023 · 1 comment

Comments

@TysonRayJones

TysonRayJones commented Aug 24, 2023

Hi there,

The goal

I'm using Thrust 2.1.1 with CUDA 12.2 on Ubuntu 20, deploying to a GPU with compute-capability 7.5.
My goal is to initialise a device array of cuDoubleComplex (provided by cuComplex.h), using Thrust's sequence() which accepts a device_ptr (and offset) and init and step scalars. I must use a cudaMalloc'd raw pointer (rather than a Thrust device_vector), and I must use cuDoubleComplex for the scalar type (as opposed to thrust::complex<double>) due to compatibility constraints.

The correctly initialised device array will have contents:

0 + i(0.1)
0.2 + i(0.3)
0.4 + i(0.5)
0.6 + i(0.7)
0.8 + i(0.9)
...

That is, we start at .1i and step by .2+.2i.

The problem

Attempting to do so with cuComplex directly...

// create a cuComplex device array
int size = (1 << 5);
cuDoubleComplex *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence cuComplex scalars
cuDoubleComplex init = {0, .1};
cuDoubleComplex step = {.2, .2};
thrust::device_ptr<cuDoubleComplex> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

will not compile, since sequence requires an operator* between cuDoubleComplex and std::size_t:

sequence.inl(67): error: no operator "*" matches these operands
            operand types are: const cuDoubleComplex * std::size_t
      return init + step * i;

If we define the needed operators...

cuDoubleComplex operator * (const cuDoubleComplex& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
cuDoubleComplex operator + (const cuDoubleComplex& a, const cuDoubleComplex& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}

and compile, this code runs but produces an incorrect (unmodified) all-zero device array. I check the resulting device array via:

// copy device array to RAM and print
cudaDeviceSynchronize();
cuDoubleComplex* h_sv = (cuDoubleComplex*) malloc(size * sizeof *h_sv);
cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
for (int i=0; i<size; i++)
    printf("%d: %g + i(%g)\n", i, cuCreal(h_sv[i]), cuCimag(h_sv[i]));

This is not an issue with our operator definitions, since replacing their return values with debug stand-ins

cuDoubleComplex operator * (const cuDoubleComplex& a, const std::size_t n) {
    //return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
    //return make_cuDoubleComplex(.1, .1);
    return {.1, .1}; // still na-da!
}

yields the same erroneous result.

Works fine with thrust::complex, but not allowed

I have checked that sequence works fine when given thrust::complex<double> as its type instead:

// create a thrust::complex<double> device array
int size = (1 << 5);
thrust::complex<double> *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence thrust::complex scalars
thrust::complex<double> init = {0, .1};
thrust::complex<double> step = {.2, .2};
thrust::device_ptr<thrust::complex<double>> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

// check output
thrust::complex<double>* h_sv = (thrust::complex<double>*) malloc(size * sizeof *h_sv);
cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
for (int i=0; i<size; i++)
    printf("%d: %g + i(%g)\n", i, h_sv[i].real(), h_sv[i].imag());

The output then is correct.
Unfortunately this is not a solution for me, since I am bound to maintaining the device array d_sv with type cuDoubleComplex.

Unable to hybridise cuComplex and thrust::complex

A cursory Google search suggested Thrust is compatible with cuComplex types (despite my results above), so I attempted to hybridise the two scenarios above: use cuDoubleComplex for the device array, and thrust::complex for the sequence arguments, like so:

using arrType = cuDoubleComplex;
using seqType = thrust::complex<double>;

// create a cuComplex device array
int size = (1 << 5);
arrType *d_sv;
cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

// give thrust::sequence thrust::complex scalars
seqType init = {0, .1};
seqType step = {.2, .2};
thrust::device_ptr<arrType> ptr = thrust::device_pointer_cast(d_sv);
thrust::sequence(ptr, ptr + size, init, step);

Alas, this produces a compiler error:

tabulate.h(55): error: no operator "=" matches these operands
            operand types are: thrust::device_reference<arrType> = seqType
        items[idx] = op(idx);

Am I making an obvious mistake?
Is there any way to use sequence for my purpose without changing the type of d_sv away from cuDoubleComplex?

Kernel is trivial, of course

I am aware that writing an unoptimised custom kernel for this case is trivial, like so:

__global__ void set_comp_seq_kernel(cuDoubleComplex* vec, int size){
    int index = blockIdx.x*blockDim.x + threadIdx.x;
    if (index>=size) return;
    vec[index] = make_cuDoubleComplex(.2*index, .2*index+.1);
}

int threadsPerCUDABlock = 128;
int CUDABlocks = ceil(size / (double) threadsPerCUDABlock);
set_comp_seq_kernel<<<CUDABlocks, threadsPerCUDABlock>>>(d_sv, size);

however I need to perform more advanced operations with Thrust and cuDoubleComplex arrays; this is the simplest MWE I could make.


Full MWEs

working thrust::complex code:

/*
 *  WORKS
 */

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/complex.h>

using comp = thrust::complex<double>;


int main(void) {

    int size = (1 << 5);
    comp *d_sv;
    cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

    comp init = {0, .1};
    comp step = {.2, .2};
    thrust::device_ptr<comp> ptr = thrust::device_pointer_cast(d_sv);
    thrust::sequence(ptr, ptr + size, init, step);
    
    comp* h_sv = (comp*) malloc(size * sizeof *h_sv);
    cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
    for (int i=0; i<size; i++)
        printf("%d: %g + i(%g)\n", i, h_sv[i].real(), h_sv[i].imag());
    
    free(h_sv);
    cudaFree(d_sv);
    return 0;
}

failing cuComplex code:

/*
 *  FAILS
 */

#include <stdio.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <cuComplex.h>

using comp = cuDoubleComplex;


comp operator * (const comp& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
comp operator + (const comp& a, const comp& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}


int main(void) {

    int size = (1 << 5);
    comp *d_sv;
    cudaMalloc((void**)&d_sv, size * sizeof *d_sv);

    comp init = {0, .1};
    comp step = {.2, .2};
    thrust::device_ptr<comp> ptr = thrust::device_pointer_cast(d_sv);
    thrust::sequence(ptr, ptr + size, init, step);
    
    comp* h_sv = (comp*) malloc(size * sizeof *h_sv);
    cudaMemcpy(h_sv, d_sv, size * sizeof *h_sv, cudaMemcpyDeviceToHost);
    for (int i=0; i<size; i++)
        printf("%d: %g + i(%g)\n", i, cuCreal(h_sv[i]), cuCimag(h_sv[i]));
    
    free(h_sv);
    cudaFree(d_sv);
    return 0;
}
@TysonRayJones
Author

Whoops, I had a brainwave and solved it! My cuDoubleComplex operator overloads were implicitly host-only. Changing them to include __device__, i.e.

__device__ inline comp operator * (const comp& a, const std::size_t n) {
    return make_cuDoubleComplex(n*cuCreal(a), n*cuCimag(a));
}
__device__ inline comp operator + (const comp& a, const comp& b) {
    return make_cuDoubleComplex(cuCreal(a) + cuCreal(b), cuCimag(a) + cuCimag(b));
}

solved the issue, and makes the failing cuComplex code above run fine.

github-project-automation bot moved this from Todo to Done in CCCL Aug 24, 2023