Unify and simplify batch functionality: Multivector #1651

pratikvn · 2024-07-23T11:41:31Z

This PR is the first part of the batched kernel refactoring. It removes the .inc-style files in favor of .hpp headers, which makes the functions available in a common namespace. It also uses a single source for both HIP and CUDA.
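A minimal sketch of the single-source idea, assuming one header that is compiled once per backend and a macro that selects the backend namespace. The macro names BACKEND_IS_HIP and DEVICE_NAMESPACE and the host-only scale function are hypothetical stand-ins (a real kernel would be a __global__ function launched on the device). They only show how one .hpp can replace per-backend .inc copies:

```cpp
// batch_multi_vector_sketch.hpp: one header shared by both backends.
// Hypothetical: BACKEND_IS_HIP stands in for the compiler-provided HIP macro,
// DEVICE_NAMESPACE for a project-specific namespace selector.
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef BACKEND_IS_HIP
#define DEVICE_NAMESPACE hip
#else
#define DEVICE_NAMESPACE cuda
#endif

namespace kernels {
namespace DEVICE_NAMESPACE {
namespace batch_multi_vector {

// Scales every entry of batch item b by its own scalar alpha[b].
// Host-only stand-in for a device kernel; items are stored one after another.
template <typename ValueType>
void scale(const std::vector<ValueType>& alpha, std::vector<ValueType>& x,
           std::size_t item_size)
{
    assert(x.size() == alpha.size() * item_size);
    for (std::size_t b = 0; b < alpha.size(); ++b) {
        for (std::size_t i = 0; i < item_size; ++i) {
            x[b * item_size + i] *= alpha[b];
        }
    }
}

}  // namespace batch_multi_vector
}  // namespace DEVICE_NAMESPACE
}  // namespace kernels
```

Compiled without BACKEND_IS_HIP, the function is reachable as kernels::cuda::batch_multi_vector::scale; compiled with it, as kernels::hip::batch_multi_vector::scale, without duplicating the source.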

TODO

Check if there are any issues with solver includes.
Remove existing redundant .inc files
Differences in __launch_bounds__ between CUDA and HIP need to be resolved

MarcelKoch · 2024-08-16T09:18:48Z

In case you haven't noticed already, there is a naming inconsistency in the single-batch matrix-apply kernels between the reference and the cuda/hip backends.
In the reference backend they are called simple|advanced_apply_kernel, while for cuda/hip they are called simple|advanced_apply.
I think that should also be unified.
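To illustrate why matching names help, here is a hypothetical sketch in which a reference backend and a device-style backend both expose simple_apply and advanced_apply, so generic code can call either through one template parameter. The dense row-major storage and the helper run_advanced are assumptions for illustration; the semantics follow the usual convention that the simple apply computes y = A x and the advanced apply computes y = alpha A x + beta y:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backends sharing one naming scheme.
struct reference_backend {
    // y = A * x for a dense row-major n x n matrix A.
    static void simple_apply(const std::vector<double>& a,
                             const std::vector<double>& x,
                             std::vector<double>& y)
    {
        const std::size_t n = x.size();
        for (std::size_t r = 0; r < n; ++r) {
            double sum = 0.0;
            for (std::size_t c = 0; c < n; ++c) {
                sum += a[r * n + c] * x[c];
            }
            y[r] = sum;
        }
    }

    // y = alpha * A * x + beta * y.
    static void advanced_apply(double alpha, const std::vector<double>& a,
                               const std::vector<double>& x, double beta,
                               std::vector<double>& y)
    {
        std::vector<double> ax(y.size());
        simple_apply(a, x, ax);
        for (std::size_t r = 0; r < y.size(); ++r) {
            y[r] = alpha * ax[r] + beta * y[r];
        }
    }
};

// A device backend would expose the same two names; here it reuses the
// reference implementation so the sketch stays host-only.
struct device_backend : reference_backend {};

// Generic code that only compiles if every backend uses identical names.
template <typename Backend>
std::vector<double> run_advanced(const std::vector<double>& a,
                                 const std::vector<double>& x)
{
    std::vector<double> y(x.size(), 1.0);
    Backend::advanced_apply(2.0, a, x, 0.5, y);
    return y;
}
```

Structs are used instead of namespaces only because a namespace cannot be passed as a template argument.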

upsj

LGTM. I'm wondering whether we can get rid of some of the GKO_DEVICE_NAMESPACE:: annotations, though. Is there an ambiguity issue when removing them?
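One situation in which such qualifiers become necessary is when two namespaces declaring a function of the same name are both visible in one scope. A minimal sketch with hypothetical namespace and function names (not taken from the PR) showing how an explicit qualifier resolves the ambiguity:

```cpp
#include <cassert>

namespace kernels {
namespace common {
// Hypothetical helper available to every backend.
inline int which() { return 1; }
}  // namespace common

namespace device {
// Hypothetical backend function that happens to share the same name.
inline int which() { return 2; }
}  // namespace device

inline int call_device_version()
{
    using namespace common;
    using namespace device;
    // return which();  // error: ambiguous, both declarations are visible
    return device::which();  // explicit qualification selects one
}
}  // namespace kernels
```

If no second declaration with the same name is visible at the call site, the unqualified call compiles and the qualifier is redundant.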

pratikvn · 2024-08-19T13:33:51Z

@MarcelKoch, yes, I plan to unify those when refactoring the matrix format kernels.

yhmtsai · 2024-08-19T15:46:46Z

common/cuda_hip/base/batch_multi_vector_kernels.hpp

@@ -20,8 +58,7 @@ __device__ __forceinline__ void scale(


 template <typename ValueType, typename Mapping>
-__global__
-__launch_bounds__(default_block_size, sm_oversubscription) void scale_kernel(
+__global__ __launch_bounds__(default_block_size) void scale_kernel(


You have used 4 for sm_oversubscription on both CUDA and HIP.
I assume the CUDA value is correct and HIP just reuses it.
For a more accurate mapping, HIP should use (min_blocks_multiprocessor (4) * max_threads_per_block (256)) / 32 = 32.
You will need to distinguish the two with a macro.

I haven't fully benchmarked that yet. I agree that this will have to be a macro specialized for CUDA and HIP, but that will be done in a future PR. It has already been noted in #1376.
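The conversion from the review can be written as a compile-time helper selected by a preprocessor switch. A sketch, assuming default_block_size is 256 and sm_oversubscription is 4 as quoted above; the names hip_min_warps_per_eu, hip_min_warps, launch_bounds_min_arg and the BACKEND_IS_HIP switch are hypothetical:

```cpp
#include <cassert>

// Values quoted in the review, assumed here to be compile-time constants.
constexpr int default_block_size = 256;
constexpr int sm_oversubscription = 4;

// HIP interprets the second __launch_bounds__ argument as a minimum number
// of warps per execution unit, while CUDA interprets it as a minimum number
// of resident blocks per multiprocessor. Conversion quoted in the review:
// (min blocks per multiprocessor * max threads per block) / 32.
constexpr int hip_min_warps_per_eu(int min_blocks, int max_threads_per_block)
{
    return (min_blocks * max_threads_per_block) / 32;
}

constexpr int hip_min_warps =
    hip_min_warps_per_eu(sm_oversubscription, default_block_size);

// Hypothetical backend switch; a real build would key off the HIP compiler.
#ifdef BACKEND_IS_HIP
constexpr int launch_bounds_min_arg = hip_min_warps;
#else
constexpr int launch_bounds_min_arg = sm_oversubscription;
#endif

static_assert(hip_min_warps == 32, "(4 * 256) / 32, as computed in the review");
```

A kernel declaration could then use __launch_bounds__(default_block_size, launch_bounds_min_arg) on both backends, keeping the per-backend difference in one place.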

pratikvn · 2024-08-20T12:04:22Z

format!

Co-authored-by: Pratik Nayak <pratikvn@pm.me>

pratikvn added the type:batched-functionality and 1:ST:need-feedback labels Jul 23, 2024

pratikvn self-assigned this Jul 23, 2024

pratikvn marked this pull request as draft July 23, 2024 11:41

ginkgo-bot added the reg:build, mod:cuda and mod:hip labels Jul 23, 2024

pratikvn force-pushed the batch-unif branch from 546492f to 6b6e8e2 August 19, 2024 09:04

pratikvn marked this pull request as ready for review August 19, 2024 09:47

pratikvn changed the title ~~WIP: Unify and simplify batch functionality~~ Unify and simplify batch functionality: Multivector Aug 19, 2024

pratikvn added 1:ST:ready-for-review and removed 1:ST:need-feedback labels Aug 19, 2024

pratikvn requested review from a team August 19, 2024 09:48

upsj approved these changes Aug 19, 2024

yhmtsai reviewed Aug 19, 2024

pratikvn force-pushed the batch-unif branch from 43e788f to ad5f7cd August 20, 2024 10:19

yhmtsai approved these changes Aug 20, 2024

pratikvn added 1:ST:ready-to-merge and removed 1:ST:ready-for-review labels Aug 20, 2024

pratikvn added the 1:ST:no-changelog-entry label Aug 20, 2024

pratikvn and others added 6 commits August 21, 2024 10:32

unify cuda/hip batch_mvec

c7336ca

[cuda,hip] update namespaces and includes

69fbc2c

[ref, omp] move kernels to headers

af240e0

[kernels] remove GKO_DEVICE_NAMESPACE

56f38c4

[dpcpp] move to proper headers

fa7c43f

[format] Format files

a4b4e22

Co-authored-by: Pratik Nayak <pratikvn@pm.me>

pratikvn force-pushed the batch-unif branch from f44ef50 to a4b4e22 August 21, 2024 08:33

pratikvn merged commit 83a577c into develop Aug 21, 2024
12 of 14 checks passed

pratikvn deleted the batch-unif branch August 21, 2024 11:01
