Add generic device and initial support in portBLAS #566

s-Nick · 2024-09-05T09:00:20Z

Description

oneMKL Interfaces currently only supports known targets: Intel CPU/GPU, AMD GPU, Nvidia GPU
This PR:
- Enables a new generic target
- Enables the generic target to use the portBLAS backend
- Adds documentation
- Relates to #542: "When trying to use oneMKL with the portBLAS backend there is a check for Intel, AMD or Nvidia GPU."
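
To make the idea of a generic target concrete, here is a minimal sketch of classifying a device with a fallback instead of a rejection. It is not the code from this PR: `device_props`, `classify` and the constant names are hypothetical stand-ins for what a SYCL device query would provide, and mapping every CPU to `x86cpu` is an assumption. Only the PCI vendor ID values are standard.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the device categories discussed in this PR.
enum class device { x86cpu, intelgpu, nvidiagpu, amdgpu, generic_device };

// Standard PCI vendor IDs; the constant names are made up for this sketch.
constexpr std::uint32_t intel_id = 0x8086;
constexpr std::uint32_t nvidia_id = 0x10DE;
constexpr std::uint32_t amd_id = 0x1002;

// Stand-in for the properties a SYCL device query would report.
struct device_props {
    bool is_cpu;
    bool is_gpu;
    std::uint32_t vendor_id;
};

// Classify a device. Anything that is not a known target falls back to
// generic_device instead of being rejected as unsupported.
inline device classify(const device_props& d) {
    if (d.is_cpu) {
        return device::x86cpu;  // assumption: every CPU takes this path
    }
    if (d.is_gpu) {
        if (d.vendor_id == intel_id) return device::intelgpu;
        if (d.vendor_id == nvidia_id) return device::nvidiagpu;
        if (d.vendor_id == amd_id) return device::amdgpu;
    }
    return device::generic_device;
}

// Usage example: a GPU from an unknown vendor lands on the generic target.
inline void classify_examples() {
    assert(classify({false, true, intel_id}) == device::intelgpu);
    assert(classify({false, true, 0x1234}) == device::generic_device);
}
```

Whether the generic target then finds a usable backend depends on which libraries were built, which is what the portBLAS part of this PR addresses.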

Checklist

All Submissions

Do all unit tests pass locally? Attach a log.
- I don't have a non-Intel/NVIDIA/AMD device to test with. Tests pass when Intel devices are forced through the generic_device path.

The following logs show that all domains still work properly:
arc_mkl_BLAS_log.txt
arc_mkl_DFT_log.txt
arc_mkl_lapack_log.txt
arc_mkl_rng_log.txt

Have you formatted the code using clang-format?

New features

Have you provided motivation for adding a new feature?
Have you added relevant tests?
The generic device doesn't require new tests.

possible devices. This commit removes the option ENABLE_GENERIC_DEVICE and instead adds generic_device to the backends_table. The check for the unsupported_device exception is moved to table_initializer; to keep the exception as informative as before, the function_tables operator[] has to change. This change allows portBLAS (and possibly, in the future, all "port" libraries) to be used with any supported device.

This patch enables the possibility to run tests with generic_device for devices that have an OpenCL backend.

src/include/function_table_initializer.hpp

Moved pragma and simplified if-statement to increase code readability

Rbiessy

That looks good to me, thanks!

s-Nick · 2024-09-06T12:44:31Z

@oneapi-src/onemkl-maintain Hi all, in this PR I am adding a new generic device to allow portable libraries such as portBLAS and portFFT to be compiled and run on devices that are not known to us right now, but that have an OpenCL backend available.

The changes are few and simple, but since one of them modifies table_initializer and touches all domains, I would like to have your approval on it. These changes don't cause any issues for the current backends, as you can check from the attached logs.

andrewtbarker

This is a useful change, I have just a couple questions.

andrewtbarker · 2024-09-06T15:25:53Z

tests/unit_tests/main_test.cpp

-                                            "OpenCL") != std::string::npos)
+                    unsigned int vendor_id =
+                        static_cast<unsigned int>(dev.get_info<sycl::info::device::vendor_id>());
+                    /* Do not test for OpenCL backend on Intel GPU */


Does this logic continue to do what we want on AMD and nVidia GPUs?

s-Nick · 2024-09-09

Thank you for your review, @andrewtbarker.
As far as I know, NVIDIA and AMD GPUs don't use OpenCL backends, so I expect them to keep working as intended. In any case, I am happy to add a condition that checks whether the vendor is one of the currently known ones and skips the OpenCL backend.

Rbiessy · 2024-09-09

"Does this logic continue to do what we want on AMD and nVidia GPUs?"

What is the behavior that we want on AMD and Nvidia exactly? We've been wondering whether we should remove this check entirely, since it's not clear why we would want to skip OpenCL backends. To me, users should use ONEAPI_DEVICE_SELECTOR instead.
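
The filter under discussion can be sketched without SYCL as follows. This is an illustration under assumptions, not the code in main_test.cpp: `device_info`, `skip_device` and the idea that the "OpenCL" substring is searched for in the platform name are hypothetical, and the vendor ID is the standard PCI value for Intel.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the few properties the test filter reads.
// In the real test these would come from SYCL device and platform queries.
struct device_info {
    std::uint32_t vendor_id;
    bool is_gpu;
    std::string platform_name;  // assumption: "OpenCL" is looked up here
};

constexpr std::uint32_t intel_vendor_id = 0x8086;  // PCI vendor ID of Intel

// Returns true when the device should be left out of the test run.
// Only Intel GPUs exposed through an OpenCL platform are skipped, so an
// OpenCL GPU from any other vendor still gets tested.
inline bool skip_device(const device_info& dev) {
    const bool is_opencl = dev.platform_name.find("OpenCL") != std::string::npos;
    return dev.is_gpu && is_opencl && dev.vendor_id == intel_vendor_id;
}

// Usage example with made-up platform names.
inline void skip_device_examples() {
    assert(skip_device({intel_vendor_id, true, "Example OpenCL platform"}));
    assert(!skip_device({0x1234, true, "Example OpenCL platform"}));
}
```

Under this sketch NVIDIA and AMD GPUs are never skipped, which matches the expectation stated in the thread. Dropping the check and relying on ONEAPI_DEVICE_SELECTOR, as suggested above, would remove the function entirely.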

andrewtbarker · 2024-09-06T15:32:42Z

src/include/function_table_initializer.hpp

@@ -59,14 +59,20 @@ class table_initializer {
    using dlhandle = std::unique_ptr<LIB_TYPE, handle_deleter>;

 public:
-    function_table_t &operator[](oneapi::mkl::device key) {
-        auto lib = tables.find(key);
+    function_table_t &operator[](std::pair<oneapi::mkl::device, sycl::queue> device_queue_pair) {


It's not clear to me why this change is necessary. Where do we need the sycl::queue in the lookup tables?

s-Nick · 2024-09-09

Since I moved the unsupported_device exception into the table initializer, I need the queue to query the device name. This is the only reason why I need the sycl::queue here.

andrewtbarker · 2024-09-09

That makes sense. It seems like a large change for just the exception - I would support reverting the function table and throwing a more generic exception (or maybe implementing mkl::unsupported_device with no device argument?). But it is also okay as is.
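
The two options weighed in this thread can be compared in a small sketch. Everything here is hypothetical: `mock_queue` stands in for sycl::queue, `function_table` for the real table type, and std::runtime_error for the library's unsupported_device exception. The first lookup carries the queue only so that the error message can name the device; the second keeps the device-only key and throws a generic error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

enum class device { x86cpu, intelgpu, nvidiagpu, amdgpu, generic_device };

// Stand-in for sycl::queue: only the device name matters for this sketch.
struct mock_queue {
    std::string device_name;
};

// Stand-in for a table of backend function pointers.
struct function_table {
    int id;
};

class table_lookup {
public:
    explicit table_lookup(std::map<device, function_table> tables)
        : tables_(std::move(tables)) {}

    // Option 1: key plus queue, so a failed lookup can name the device.
    function_table& operator[](const std::pair<device, mock_queue>& key) {
        auto it = tables_.find(key.first);
        if (it == tables_.end()) {
            throw std::runtime_error("unsupported device: " + key.second.device_name);
        }
        return it->second;
    }

    // Option 2: device-only key with a generic error message.
    function_table& lookup(device key) {
        auto it = tables_.find(key);
        if (it == tables_.end()) {
            throw std::runtime_error("unsupported device");
        }
        return it->second;
    }

private:
    std::map<device, function_table> tables_;
};

// Usage example: both lookups return the same table when the key exists.
inline void lookup_example() {
    std::map<device, function_table> tables;
    tables[device::generic_device] = function_table{7};
    table_lookup tl(tables);
    assert(tl.lookup(device::generic_device).id == 7);
    assert(tl[std::make_pair(device::generic_device, mock_queue{"any"})].id == 7);
}
```

The trade-off is the one raised above: option 1 gives a more informative message at the cost of widening the lookup interface in every domain, while option 2 leaves the interface untouched.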

Previously, if certain backends were enabled, the test suite always added a CPU to the list of devices to run tests on, even if another if-condition had already added it. This behaviour caused a link-time failure when no CPU device was available. This commit removes that unconditional addition and adds the missing pragma to the other device selection, fixing the linking issue.
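
A minimal sketch of the guarded device selection described in this commit message, under stated assumptions: the macro names, the `machine` struct and `devices_to_test` are hypothetical and do not appear in the test suite. The point is that a CPU is only requested when a CPU backend was built and a CPU is present, rather than being added unconditionally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical build switches standing in for the macros that tell the
// test suite which backends were compiled in. Names and defaults are
// assumptions made for this sketch.
#ifndef EXAMPLE_CPU_BACKEND_ENABLED
#define EXAMPLE_CPU_BACKEND_ENABLED 0
#endif
#ifndef EXAMPLE_GENERIC_BACKEND_ENABLED
#define EXAMPLE_GENERIC_BACKEND_ENABLED 1
#endif

// Stand-in for what the SYCL runtime would report on the machine.
struct machine {
    bool has_cpu;
    std::vector<std::string> gpus;
};

// Build the list of devices to run tests on. Each device class is added
// only inside its own guard, so no CPU is requested unless a CPU backend
// was built and a CPU device exists.
inline std::vector<std::string> devices_to_test(const machine& m) {
    std::vector<std::string> out;
#if EXAMPLE_CPU_BACKEND_ENABLED
    if (m.has_cpu) {
        out.push_back("cpu");
    }
#endif
#if EXAMPLE_GENERIC_BACKEND_ENABLED
    out.insert(out.end(), m.gpus.begin(), m.gpus.end());
#endif
    return out;
}

// Usage example: a machine without a CPU device still yields its GPU.
inline void devices_example() {
    machine no_cpu{false, {"gpu0"}};
    assert(devices_to_test(no_cpu).size() == 1);
}
```

Because test discovery runs the test executable during the build, an unguarded CPU request on a machine without a CPU device can surface as a failure at link time.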

al3x-jp · 2024-09-16T10:47:09Z

Hi, I got the following error when trying to build this checkout:

[ 51%] Linking CXX executable ../../bin/test_main_blas_ct
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): No device of requested type 'info::device_type::cpu' available. -1 (PI_ERROR_DEVICE_NOT_FOUND)
CMake Error at /usr/share/cmake-3.22/Modules/GoogleTestAddTests.cmake:83 (message):
Error running test executable.

Path: '/<path>/oneMKL/build/bin/test_main_blas_ct'
Result: Subprocess aborted
Output:

Call Stack (most recent call first):
/usr/share/cmake-3.22/Modules/GoogleTestAddTests.cmake:179 (gtest_discover_tests_impl)

gmake[2]: *** [tests/unit_tests/CMakeFiles/test_main_blas_ct.dir/build.make:388: bin/test_main_blas_ct] Error 1

Does the build require a SYCL CPU target being available?
Thanks

s-Nick · 2024-09-16T13:53:06Z

Hi @al3x-jp, thank you for highlighting this issue here.
We were able to reproduce the same linking error using the develop branch and an open source DPC++ build.
In any case, commit 29e94fc addresses what we think caused it, and on our machine everything works fine. Could you please test this PR again, including this last commit? Your feedback would be very helpful for us.

lhuot

LGTM, thanks!

hjabird and others added 6 commits August 30, 2024 14:45

Add generic device; Initial support in portBLAS

c06a606

* oneMKL Interfaces currently only supports known targets: Intel CPU/GPU, AMD GPU, Nvidia GPU
* This PR:
  * Enables a new generic target
  * Enables the generic target to use the portBLAS backend
  * Adds documentation

CMake typo; Add -fno-sycl-instrument-device-code

06c278c

Enable test for OpenCL GPUs except for Intel's one

d21ac7b

This patch enables the possibility to run tests with generic_device for devices that have an OpenCL backend.

Add pragma to guard generic device usage and exception

ee6569c

Fix typo

922a654

s-Nick requested a review from Rbiessy September 5, 2024 09:00

Rbiessy reviewed Sep 5, 2024

src/include/function_table_initializer.hpp (outdated, resolved)

Rbiessy mentioned this pull request Sep 5, 2024

Add generic device; Initial support in portBLAS #552

Closed

2 tasks

s-Nick added 2 commits September 6, 2024 09:54

Move generic_device support pragma

3466d6f

Moved pragma and simplified if-statement to increase code readability

Fix broken if-statement

5f743b2

Rbiessy approved these changes Sep 6, 2024

s-Nick marked this pull request as ready for review September 6, 2024 12:35

Rbiessy requested a review from a team September 6, 2024 12:55

andrewtbarker reviewed Sep 6, 2024

s-Nick mentioned this pull request Sep 9, 2024

Add generic device support to dft domain through portFFT #570

Merged

4 tasks

andrewtbarker approved these changes Sep 9, 2024

mkrainiuk reviewed Sep 17, 2024

docs/building_the_project_with_dpcpp.rst (outdated, resolved)

tests/unit_tests/main_test.cpp (resolved)

Update docs/building_the_project_with_dpcpp.rst

70f8835

Co-authored-by: Maria Kraynyuk <maria.kraynyuk@intel.com>

mkrainiuk approved these changes Sep 24, 2024

lhuot approved these changes Sep 25, 2024

s-Nick merged commit afb9d5c into oneapi-src:develop Sep 30, 2024
8 checks passed

Rbiessy mentioned this pull request Oct 2, 2024

When trying to use oneMKL with the portBLAS backend there is a check for Intel, AMD or Nvidia GPU. #542

Closed
