Skip to content

Comments

[rocfft][cufft] DFT update host task to use native command#578

Merged
Rbiessy merged 11 commits intouxlfoundation:developfrom
JackAKirk:cufft-update-host-task
Oct 14, 2024
Merged

[rocfft][cufft] DFT update host task to use native command#578
Rbiessy merged 11 commits intouxlfoundation:developfrom
JackAKirk:cufft-update-host-task

Conversation

@JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Oct 2, 2024

Description

Similar to #572 (see the discussions in that PR for technical details) except this covers fft backends for both amd and nvidia cases.

Update host task impl to use enqueue_native_command for DFT using the cuda/hip backends.

tests:

test_main_dft_ct_amd.txt
test_main_dft_rt_amd.txt
test_main_dft_rt_nvidia.txt
test_main_dft_ct_nvidia.txt

author: @hjabird

@JackAKirk
Copy link
Contributor Author

@oneapi-src/onemkl-dft-write could you please review this?

Thanks

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Note we have used this in 2 GROMACS benchmarks to get between 2% to 7% improvements on MI210.

remove whitespace

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
@JackAKirk
Copy link
Contributor Author

@anantsrivastava30 Is this OK to merge?

Thanks

Comment on lines 162 to 167
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND
cgh.ext_codeplay_enqueue_native_command([=](sycl::interop_handle ih){
#else
cgh.host_task([=](sycl::interop_handle ih){
#endif
f(std::move(ih));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed to duplicate this in both cuFFT and rocFFT backends and in various domains (BLAS, LAPACK, FFT)? Can't we have one wrapper used across all domains and backends?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the domains have always been separated on purpose. This makes review much easier as any change affecting common code would technically require an approval from every domain owners.
I agree this could be discussed in an issue. To my knowledge there is very little code that could be common across domains, other than the types and exceptions which are already common.

@JackAKirk
Copy link
Contributor Author

Hi @lhuot

I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

@lhuot
Copy link
Contributor

lhuot commented Oct 11, 2024

Hi @lhuot

I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

@JackAKirk
Copy link
Contributor Author

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends?
I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

@lhuot
Copy link
Contributor

lhuot commented Oct 11, 2024

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends? I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

Sounds like a reasonable approach to me. Thanks!

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
@JackAKirk
Copy link
Contributor Author

Hi @lhuot
I don't know, I am following the existing design. I think that it is a good question and it would be good to open this as an issue. However I don't think it is a good idea to block this PR for this, since this is a critical patch for Gromacs performance, which does not introduce this duplication that already exists in all other backends.

Then, let's fix the code duplication in the DFT domain please.

OK sure, do you want me to add a new header e.g. "execute_helper_generic.hpp" in oneMKL/src/dft/ with the fft_host_task implementation that is portable across hip and cuda backends, and then only include this header in the rocfft and cufft backends? I can't include such a generic header in include/oneapi/mkl/detail/ because the enqueue_native_command extension is not available in intel backends and trying to link this header will break intel builds.

Sounds like a reasonable approach to me. Thanks!

I've updated this now as requested. Tests still pass as posted in PR summary on both hip and cuda backends.

Copy link
Contributor

@lhuot lhuot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Rbiessy Rbiessy merged commit 058ee95 into uxlfoundation:develop Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants