An example using cuBLAS library in alpaka #2430

@mehmetyusufoglu mehmetyusufoglu commented Nov 22, 2024

This example uses the cuBLAS library for matrix multiplication, operating on buffers allocated with alpaka and on the native stream of an alpaka queue. Another example uses the rocBLAS library.

The CMake file still needs to be changed so that CI does not fail for other backends.

    auto alpakaStream = alpaka::getNativeHandle(queue);

    // cuBLAS setup from alpaka stream
    cublasHandle_t cublasHandle;
    cublasCreate(&cublasHandle);
    cublasSetStream(cublasHandle, alpakaStream);

    // Perform matrix multiplication: C = A * B
    float alpha = 1.0f, beta = 0.0f; // Set beta to 0.0f to overwrite C
    cublasSgemm(
        cublasHandle,
        CUBLAS_OP_N,
        CUBLAS_OP_N, // No transpose for A and B
        M,
        N,
        K, // Dimensions: C = A * B
        &alpha,
        alpaka::getPtrNative(bufDevA), M ...
    );
    alpaka::wait(queue); // Wait for multiplication to complete
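
To check the device result on the host, a small reference implementation can be useful. The sketch below is not part of the PR. It assumes what the snippet above suggests: cuBLAS stores matrices in column-major order, no operand is transposed, and A is tightly packed with leading dimension M. The leading dimensions of B and C are elided in the snippet, so K and M are assumed here. The function name `gemmColumnMajor` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference for C = alpha * A * B + beta * C with column-major storage
// and no transposes. Assumed layout (tightly packed):
//   A is M x K with leading dimension M,
//   B is K x N with leading dimension K,
//   C is M x N with leading dimension M.
// Hypothetical helper for illustration; not taken from the PR.
std::vector<float> gemmColumnMajor(
    std::size_t M,
    std::size_t N,
    std::size_t K,
    float alpha,
    std::vector<float> const& A,
    std::vector<float> const& B,
    float beta,
    std::vector<float> C)
{
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for(std::size_t col = 0; col < N; ++col)
    {
        for(std::size_t row = 0; row < M; ++row)
        {
            float sum = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
            {
                // Element (row, k) of A and element (k, col) of B.
                sum += A[row + k * M] * B[k + col * K];
            }
            // With beta == 0 the old content of C is ignored, so C is overwritten.
            C[row + col * M] = (beta == 0.0f) ? alpha * sum : alpha * sum + beta * C[row + col * M];
        }
    }
    return C;
}
```

Copying the device buffer for C back to the host and comparing it element by element with the returned vector (using a small tolerance for larger inputs) would confirm that the dimensions and leading dimensions passed to cublasSgemm are consistent.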

@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft November 22, 2024 11:08
@mehmetyusufoglu mehmetyusufoglu changed the title from "[Wip] An example using BLAS library in alpaka" to "An example using BLAS library in alpaka" Nov 22, 2024
@mehmetyusufoglu mehmetyusufoglu force-pushed the exampleUsingCuBlas branch 2 times, most recently from ae8e4ed to a0dee21 November 22, 2024 11:47
N,
K, // Dimensions: C = A * B
&alpha,
alpaka::getPtrNative(bufDevA),
Member

Better to use std::data() instead of alpaka::getPtrNative().

Contributor Author

ok, done, thanks.
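
As background for this suggestion: std::data() (C++17, declared in <iterator>) returns the raw pointer of any object that has a data() member, and also works on built-in arrays. The sketch below illustrates the mechanism with a stand-in type. `FakeBuffer` and `fill` are hypothetical names and are not alpaka types; the sketch only assumes, as the suggestion implies, that the buffer in question exposes a data() member.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator> // std::data
#include <vector>

// Hypothetical stand-in for a buffer type with a data() member.
// This is NOT an alpaka type; it only mimics the interface std::data() relies on.
template<typename T>
class FakeBuffer
{
public:
    explicit FakeBuffer(std::size_t n) : m_storage(n)
    {
    }

    T* data()
    {
        return m_storage.data();
    }

    T const* data() const
    {
        return m_storage.data();
    }

private:
    std::vector<T> m_storage;
};

// Generic helper: works with anything std::data() accepts
// (types with a data() member, built-in arrays).
template<typename TBuf, typename T>
void fill(TBuf& buf, std::size_t count, T value)
{
    auto* ptr = std::data(buf);
    assert(count == 0 || ptr != nullptr);
    for(std::size_t i = 0; i < count; ++i)
    {
        ptr[i] = value;
    }
}
```

Because the helper only depends on std::data(), the same code accepts a std::vector, a built-in array, or the stand-in buffer without a library-specific accessor.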

Idx const K = 3; // Columns in A and rows in B

// Define device and queue
using Acc = alpaka::AccGpuCudaRt<Dim1D, Idx>;
Member

Could you please use the CUDA tag and derive the Acc from the tag? This will reduce the work once we refactor the accelerators.

Contributor Author

I used the standard tags, as in the other examples, but CMake now skips configuring this example unless the ACC_CUDA_ONLY CMake variable is set.

Contributor Author

Anyway, I used the CUDA tag as you suggested. This example could have a plain main() rather than using ExampleTags, since it will only run with a single backend.

using Acc = alpaka::TagToAcc<alpaka::TagGpuCudaRt, Dim1D, Idx>;

Member

@psychocoderHPC I agree with @mehmetyusufoglu. It does not make sense to use the same template as in the other examples. The code can only be used with the CUDA backend, so we do not need the complicated iteration over the enabled tags.
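
The idea of deriving the accelerator type from a tag can be pictured with a small type trait. The sketch below is a simplified illustration and not alpaka's implementation: every name in it (`TagCpuSerial`, `TagGpuCuda`, `AccCpuSerial`, `AccGpuCuda`, `TagToAccImpl`, `TagToAcc`) is a hypothetical stand-in defined locally. It shows why user code that names only the tag keeps compiling if the accelerator types behind the mapping are later renamed or restructured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical tags and accelerator types (stand-ins, not alpaka's).
struct TagCpuSerial
{
};

struct TagGpuCuda
{
};

template<typename TDim, typename TIdx>
struct AccCpuSerial
{
    static std::string name()
    {
        return "AccCpuSerial";
    }
};

template<typename TDim, typename TIdx>
struct AccGpuCuda
{
    static std::string name()
    {
        return "AccGpuCuda";
    }
};

// Mapping from tag to accelerator type. The primary template is left
// undefined, so an unknown tag fails at compile time.
template<typename TTag, typename TDim, typename TIdx>
struct TagToAccImpl;

template<typename TDim, typename TIdx>
struct TagToAccImpl<TagCpuSerial, TDim, TIdx>
{
    using type = AccCpuSerial<TDim, TIdx>;
};

template<typename TDim, typename TIdx>
struct TagToAccImpl<TagGpuCuda, TDim, TIdx>
{
    using type = AccGpuCuda<TDim, TIdx>;
};

template<typename TTag, typename TDim, typename TIdx>
using TagToAcc = typename TagToAccImpl<TTag, TDim, TIdx>::type;

// User code names only the tag; the concrete accelerator type is derived.
using Dim1D = std::integral_constant<std::size_t, 1>;
using Idx = std::size_t;
using Acc = TagToAcc<TagGpuCuda, Dim1D, Idx>;

static_assert(std::is_same_v<Acc, AccGpuCuda<Dim1D, Idx>>, "tag maps to the CUDA stand-in");
```

If the accelerator templates were refactored, only the specializations of the mapping would need to change, while the `using Acc = ...` line in user code would stay the same.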

@psychocoderHPC psychocoderHPC added this to the 2.0.0 milestone Nov 25, 2024
@mehmetyusufoglu mehmetyusufoglu force-pushed the exampleUsingCuBlas branch 3 times, most recently from 07acce5 to 2b3ef4f November 26, 2024 14:53
@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review November 26, 2024 14:53
@mehmetyusufoglu mehmetyusufoglu changed the title from "An example using BLAS library in alpaka" to "An example using cuBLAS library in alpaka" Nov 28, 2024
@mehmetyusufoglu mehmetyusufoglu force-pushed the exampleUsingCuBlas branch 2 times, most recently from 4f98208 to 3194bb3 November 29, 2024 12:20
@mehmetyusufoglu
Contributor Author

This PR is closed because its changes were added to #2433: the two PRs will share one directory under examples rather than using two separate directories.
