parallel_for equivalent in Ginkgo #1190

Timo-Schrader · 2022-11-07T16:55:39Z

Hello,

we have the following (mathematical) problem:

We want to set up an RBF system matrix in order to solve for the coefficients. We also want to accelerate this with CUDA (via Ginkgo), not only by solving the linear system of equations on the GPU, but also by assembling the kernel matrix there.

Now our question is if there is a way in Ginkgo (similar to parallel_for in Kokkos) to fill such a matrix on the GPU, i.e., evaluate each RBF entry on the GPU side without having to write custom CUDA kernels ourselves and pass an array pointer to it. It might be the case that we just didn't find this feature yet, so please excuse me if this question is already answered.
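To make the request concrete, here is a minimal sketch of the per-entry evaluation being asked about, A(i, j) = phi(|x_i - x_j|). It uses plain C++ rather than any Ginkgo or Kokkos API: `parallel_for_2d`, `gaussian_rbf`, and `assemble_rbf_matrix` are hypothetical names, the Gaussian basis function is an assumption (the post does not say which RBF is used), and the serial loop only stands in for a GPU dispatch in which each (i, j) pair would map to one thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU kernel dispatch: calls fn(i, j) for every
// entry of a rows x cols index space. On a GPU each (i, j) pair would map to
// one thread; here the loops run serially so the sketch has no dependencies.
template <typename Fn>
void parallel_for_2d(std::size_t rows, std::size_t cols, Fn fn)
{
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            fn(i, j);
        }
    }
}

// Gaussian basis function phi(r) = exp(-(shape * r)^2). Both the choice of
// basis function and the shape parameter are assumptions for illustration.
inline double gaussian_rbf(double r, double shape)
{
    return std::exp(-(shape * r) * (shape * r));
}

// Fills the dense, row-major n x n matrix A with A(i, j) = phi(|x_i - x_j|).
// `points` stores n points of dimension `dim` contiguously.
std::vector<double> assemble_rbf_matrix(const std::vector<double>& points,
                                        std::size_t dim, double shape)
{
    assert(dim > 0 && points.size() % dim == 0);
    const std::size_t n = points.size() / dim;
    std::vector<double> matrix(n * n);
    const double* pts = points.data();
    double* values = matrix.data();
    // The lambda only captures raw pointers and sizes by value, which is the
    // shape a device lambda would need as well.
    parallel_for_2d(n, n, [=](std::size_t i, std::size_t j) {
        double dist_sq = 0.0;
        for (std::size_t d = 0; d < dim; ++d) {
            const double diff = pts[i * dim + d] - pts[j * dim + d];
            dist_sq += diff * diff;
        }
        values[i * n + j] = gaussian_rbf(std::sqrt(dist_sq), shape);
    });
    return matrix;
}
```

Every entry is computed independently of all others, which is why a generic index-space dispatch is enough here and no hand-written reduction or synchronization is needed.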

Thank you and have a great day!

Best regards

upsj · 2022-11-07T17:13:12Z

Maintainer

We already have such a framework in place; it is just not exposed to users yet because we are not certain it is final. For an example, see #938, or more specifically stencil_kernel.cpp. I'd be happy to pick this PR up again if it's important to you.

11 replies

davidscn Nov 7, 2022

Got it, thanks a lot for your quick replies!

Timo-Schrader Nov 19, 2022
Author

Hello again,

so I tried to implement a basic workflow in both Kokkos and Ginkgo and it seems to me that Ginkgo is much easier to use as a math library than Kokkos. I was also able to launch a self-written CUDA kernel with the help of Ginkgo using the experimental feature branch.

In summary, I really like this feature and would be delighted if it is possible to pick up this PR again (as suggested).
Thank you very much for your efforts!

upsj Nov 21, 2022
Maintainer

This is great to hear - we will discuss how to move this forward again. If you have a minute, could I get some input on #1209 from the user side? The duplication of arguments/parameters is currently necessary because we need to transform host objects to device representations, but I was wondering how much of an issue potential parameter order/name mismatches actually are.
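The duplication mentioned here can be illustrated with a small, dependency-free sketch. It is not Ginkgo's implementation: `host_matrix`, `device_matrix_view`, `map_to_device`, `run_kernel`, and `scale` as written below are hypothetical stand-ins, and the serial loops replace an actual GPU launch. The point is only that each host argument is converted to a device representation and passed on positionally, so the kernel's parameter list has to repeat the argument list in the same order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical device-side view of a dense matrix: raw pointer plus stride.
struct device_matrix_view {
    double* data;
    std::size_t stride;
    double& operator()(std::size_t row, std::size_t col) const
    {
        return data[row * stride + col];
    }
};

// Hypothetical host-side matrix that owns its storage (row-major).
struct host_matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;
    host_matrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), values(r * c)
    {}
};

// Host-to-device mapping: owning objects become lightweight views ...
inline device_matrix_view map_to_device(host_matrix& m)
{
    return {m.values.data(), m.cols};
}

// ... while plain values such as scalars are passed through unchanged.
template <typename T>
T map_to_device(T value)
{
    return value;
}

// Sketch of a run_kernel-style dispatcher: the trailing host arguments are
// converted one by one and handed to the kernel in the same order.
template <typename Kernel, typename... Args>
void run_kernel(std::size_t rows, std::size_t cols, Kernel kernel,
                Args&&... args)
{
    // Serial stand-in for a GPU launch over a rows x cols index space.
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            kernel(i, j, map_to_device(args)...);
        }
    }
}

// Example use: scale every entry of a matrix by alpha. The lambda parameters
// (mat, a) mirror the trailing arguments (m, alpha) position by position.
inline void scale(host_matrix& m, double alpha)
{
    assert(m.values.size() == m.rows * m.cols);
    run_kernel(
        m.rows, m.cols,
        [](std::size_t i, std::size_t j, device_matrix_view mat, double a) {
            mat(i, j) *= a;
        },
        m, alpha);
}
```

In this sketch a mismatch between arguments of different types fails to compile, but swapping two parameters of the same type (for example two scalars) would compile silently, which is the kind of order/name mismatch the question is about.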

Timo-Schrader Nov 22, 2022
Author

After getting used to it and understanding how it works, I think the current approach is absolutely feasible for the user as long as it is well documented, since "all the magic" happens in run_kernel.
However, looking at #1209, I am not 100% sure what row_vector(x) and default_stride refer to.

Thank you very much for your efforts, much appreciated!

Timo-Schrader Jan 13, 2023
Author

Good afternoon,

we just wanted to report that we are getting very promising first results using Ginkgo + CUDA in preCICE, where we need to solve a very large RBF system for data mapping. If you need further input from our side to get this really great kernel dispatch feature into the main branch of Ginkgo, just let us know!
