Skip to content

Comments

Add rocBLAS backend#144

Merged
mmeterel merged 11 commits intouxlfoundation:developfrom
sbalint98:ustream-add-rocm
Feb 15, 2022
Merged

Add rocBLAS backend#144
mmeterel merged 11 commits intouxlfoundation:developfrom
sbalint98:ustream-add-rocm

Conversation

@sbalint98
Copy link
Contributor

@sbalint98 sbalint98 commented Nov 20, 2021

Description

As discussed in #99 with the newly added hipSYCL support oneMKL is able to target AMD GPUs. This PR adds the rocBLAS backend to the BLAS domain. The changes were based on the Cuda backend, and the set of supported BLAS functions matches the functionality of the Cuda backend.

Testing was carried out using rocm 4.5 and rocBLAS 4.5, and hipSYCL built from develop cc794e3, on a system with AMD MI Instinct 100 GPUs.

rocm_tests_out.log

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?
  • Update Readme

@mmeterel
Copy link
Contributor

mmeterel commented Nov 22, 2021

@sbalint98 Thanks for the PR. Here are a few general comments before reviewing the PR in detail.

  1. Please make sure all files are formatted properly with clang-format and use the check mark.
    Have you formatted the code using clang-format?`

  2. Can we add the device info in the gtest/ctest report? You can look at other backends as an example.
    Start 9: BLAS/RT/Nrm2TestSuite/Nrm2Tests.ComplexSinglePrecision/Column_Major_
    9/3336 Test : BLAS/RT/Nrm2TestSuite/Nrm2Tests.ComplexSinglePrecision/Column_Major_ ......................................................... Passed 1.61 sec

  3. AFAIK, rocBLAS backend will be supported by only hipSYCL implementation. (Please correct me if I am wrong). This information needs to be added into README along with how to download rocBLAS, hipSYCL, their supported version numbers, build directions, etc.

Copy link
Contributor

@andrewtbarker andrewtbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking forward to seeing this capability added!

@sbalint98
Copy link
Contributor Author

Hi, thank you for the quick feedback @mmeterel and @andrewtbarker !

Please make sure all files are formatted properly with clang-format and use the check mark.
Have you formatted the code using clang-format?

Formatted, and checked.

Can we add the device info in the gtest/ctest report? You can look at other backends as an example.
Start 9: BLAS/RT/Nrm2TestSuite/Nrm2Tests.ComplexSinglePrecision/Column_Major_
9/3336 Test : BLAS/RT/Nrm2TestSuite/Nrm2Tests.ComplexSinglePrecision/Column_Major_ ......................................................... Passed 1.61 sec

This seems to be a rocm issue I opened ROCm/ROCm#1625 , hipSYCL just uses the rocm interface that does not seem to work on the test system.

AFAIK, rocBLAS backend will be supported by only hipSYCL implementation. (Please correct me if I am wrong).

Yes I agree

This information needs to be added into README along with how to download rocBLAS, hipSYCL, their supported version numbers, build directions, etc.

I'll look into this I will ping you in the next 1-2 days when I am finished with the changes to the README.

@mmeterel
Copy link
Contributor

@sbalint98 Could you please comment on the minimum ROCm version and the hardware support needed for ROCm support in general with hipSYCL? For example, do we need ROCm 4.5? Can we use Vega RX, or do we absolutely need MI100?

@sbalint98
Copy link
Contributor Author

do we need ROCm 4.5?

I tested this PR with rocm 4.5 exclusively but in my opinion, it should work with rocm+rocblas 4.2 and above. In terms of the version of rocBLAS the presence of the rocblas_status_check_numerics_fail type is necessary, which should be present since rocBLAS 4.2. In the case of building hipSYCL against rocm 4.2+, hipSYCL needs to be built against the LLVM compiler shipped with rocm.

Could you please comment on the minimum ROCm version and the hardware support needed for ROCm support in general with hipSYCL?

In terms of hipSYCL rocm backend hardware support, all cards supported by rocm are available under hipSYCL as well. see here . Pinging @illuhad for more details if needed.

Can we use Vega RX, or do we absolutely need MI100?

I think Vega cards are sufficient for hipSYCL. We are also using a FirePro card for some of our testings. Pinging @illuhad for more details

@vrpascuzzi
Copy link
Contributor

vrpascuzzi commented Nov 23, 2021

FWIW: hipSYCL/hipSYCL-rocm 0.9.0 (can't remember which version of ROCm was included) worked for me back in April. Vega RX and MI-50 were supported.

@illuhad
Copy link

illuhad commented Nov 23, 2021

In terms of hipSYCL rocm backend hardware support, all cards supported by rocm are available under hipSYCL as well. see here . Pinging @illuhad for more details if needed.

That's right, there's nothing special in hipSYCL that would restrict hardware support - it should "just work" on all devices supported by ROCm. Naturally, we can only test on a selection of GPUs due to not having all conceivable ROCm GPUs available. We mainly test on Radeon VII, Radeon Pro VII workstation cards and MI50 and MI100 data center cards.

The main limiting factor for oneMKL is the availability of some functionality in earlier rocBLAS versions as @sbalint98 points out.

@illuhad
Copy link

illuhad commented Nov 23, 2021

In the case of building hipSYCL against rocm 4.2+, hipSYCL needs to be built against the LLVM compiler shipped with rocm.

This is the case with some clang versions (I think clang 12), because not all changes required for ROCm 4.2+ support had reached upstream clang by that point. When building hipSYCL against newer clang versions, those might work without coming from the ROCm release.

@mmeterel
Copy link
Contributor

@sbalint98 @illuhad Thank you for your responses. One more question :) Is there a specific linux kernel we need?

@sbalint98
Copy link
Contributor Author

I think the answer is similar to the hardware support one. We are supposed to be supporting everything that rocm supports as well :https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#prerequisites

@sbalint98
Copy link
Contributor Author

@illuhad Can you have a look at the changes to the README?

@sbalint98
Copy link
Contributor Author

@mmeterel test_helper.hpp includes oneapi/mkl.hpp that in turn includes types.hpp. However, let me update this solution. Defining half conflicts with rocrand.h since half is also defined there. I also noticed that in the rocblas backend I use half instead of sycl::half after fixing that there is no need for aliasing half.

@sbalint98
Copy link
Contributor Author

@mmeterel, I added the fix

Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR:
General comments:

  1. Please review the entire PR for copyright banner
  2. Please review the entire PR for std::vector usage instead of cl::sycl::vector_class
  3. Please review entire PR for wrong "CUDA" usage.
  4. Please use rocBLAS instead of rocblas as appropriate, except file names.
  5. Please add rocBLAS to the top table in README.
  6. Please provide test results for mklcpu and netlib backends.

@sbalint98
Copy link
Contributor Author

Results of the mklCPU and netlib backend tests:

mklcpu_rocblas_hipsycl.txt
netlib_rocblas_hipsycl.txt

@sbalint98 sbalint98 requested a review from mmeterel February 4, 2022 14:47
Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more reviews, questions. A few notes:

  1. Please apply clang-format to all edited files. I see some of them are not formatted.
  2. Please rebase to take latest merges.
  3. I will run CI again after the changes.

@mmeterel
Copy link
Contributor

mmeterel commented Feb 7, 2022

@mkrainiuk Could you please review this PR? Especially for the changes in Copyright banner, build process and README? I provided feedback on README as much as possible (it is WIP)

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, adding several comments for README.

@mmeterel
Copy link
Contributor

mmeterel commented Feb 8, 2022

Results of the mklCPU and netlib backend tests:

mklcpu_rocblas_hipsycl.txt netlib_rocblas_hipsycl.txt

Thanks for confirming.

@mmeterel
Copy link
Contributor

@sbalint98 Could you please resolve conflicts?

@sbalint98
Copy link
Contributor Author

Sure, rebased onto develop

Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbalint98 Thanks a lot for your efforts in adding hipSYCL compiler support and enabling rocBLAS backend for AMD GPUs.

Co-authored-by: Mesut Meterelliyoz <mesut.meterelliyoz@intel.com>
@mmeterel mmeterel merged commit f516ad1 into uxlfoundation:develop Feb 15, 2022
@sbalint98
Copy link
Contributor Author

Thanks for your time, helpful comments and contributions @mmeterel @mkrainiuk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants