Skip to content

[BLAS][rocBLAS] Update rocBLAS backend#428

Merged
mmeterel merged 16 commits intouxlfoundation:developfrom
jle-quel:jle-quel/update-rocblas-backend
Jan 18, 2024
Merged

[BLAS][rocBLAS] Update rocBLAS backend#428
mmeterel merged 16 commits intouxlfoundation:developfrom
jle-quel:jle-quel/update-rocblas-backend

Conversation

@jle-quel
Copy link
Contributor

@jle-quel jle-quel commented Dec 14, 2023

Description

This PR updates the rocBLAS backend's functions for LEVEL1, LEVEL2, LEVEL3, EXTENSION and BATCH which used to be unimplemented.

Most ColumnMajor, RowMajor, Buffer and USM functions which used to be marked as throw unimplemented are now implemented and does not result in skipped test anymore.

The functions that are still marked as implemented (due to missing a mapping to rocBLAS) are:

  • LEVEL1
    • axpby
    • dot(float, float, double)
  • LEVEL2
  • LEVEL3
  • EXTENSIONS
    • gemm_bias
    • gemmt
    • imatcopy
  • BATCH
    • imatcopy_batch

With the exception of row_major::gemv_batch from BATCH which will soon be implemented.
Implemented with the latest commit Update unimplemented function 'row_major::gemv_batch'

This PR also updated some minor parts of already implemented functions:

  • Modify the call to depends_on. This method can take a vector, so there is no need to loop around it.
  • Modify the function argument names so that they are all coherent.
  • Switch the call to wait to wait_and_throw.
  • Modify the throw unimplemented messages to be more accurate and up to date. (Restore during PR's review)

Checklist

All Submissions

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?

example_blas_gemm_usm.log.txt
test_main_blas_ct.log.txt
test_main_blas_rt.log.txt

@jle-quel jle-quel marked this pull request as ready for review December 14, 2023 18:09
@jinz2014
Copy link
Contributor

I submitted a pull request for bf16 gemm and then found your pull requests for supporting many functions.

Is it right rocblas supports bf16 gemm ?

https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#rocblas-gemm-ex-batched-strided-batched

@jle-quel
Copy link
Contributor Author

jle-quel commented Dec 19, 2023

Nice catch @jinz2014

level3::gemm does not support bf16; However, the extension::gemm_ex does support it.

It also made me realize that gemm_ex should be moved out of the rocblas_level3.cpp and into the rocblas_extensions.cpp.

I'll commit this ASAP.

Would you mind also adding the support for row_major gemm_ex? The code is already present for it, within this PR.

Do you prefer to merge your PR on its own, or merge it as part of this rocBLAS update PR?

@jinz2014
Copy link
Contributor

Please merge my PR as part of your rocBLAS update. I will close my PR.

@jinz2014
Copy link
Contributor

jinz2014 commented Jan 2, 2024

@andrewtbarker Could you please add reviewers for the PR ? Thanks.

Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jle-quel Thanks a lot for this high quality PR. It is adding features that do not even exist in vendor libraries.
I have a few cosmetic comments and some concerns about the structure and consistency across entire project.

@muhammad-tanvir-1211
Copy link
Contributor

I see all the buffer tests failing for the rocblas backend with PI_ERROR_INVALID_OPERATION.

Logs:
PR_428.txt

This PR is blocked on oneapi-src/unified-runtime#1226 and intel/llvm#12297

@jle-quel
Copy link
Contributor Author

jle-quel commented Jan 12, 2024

I see all the buffer tests failing for the rocblas backend with PI_ERROR_INVALID_OPERATION.

Logs: PR_428.txt

This PR is blocked on oneapi-src/unified-runtime#1226 and intel/llvm#12297

The reason that the buffer tests fails (PI_ERROR_INVALID_OPERATION ) is not because of this PR.

It is because the "Accessor interop requires urMemGetNativeHandle to return a valid mem handle. Multi dev ctx breaks this ..."

This means that if you go to a previous commit, the results will be the same.
Defacto this PR is not blocked; this PR lives independently from the PRs that you provided.
However, to test it, you either have to pull the modifications made from the PRs that you provided or wait for them to be merged (and this logic does not only apply to this PR, but to any "buffers function"-"accessor interop" from oneMKL)

@muhammad-tanvir-1211
Copy link
Contributor

I see all the buffer tests failing for the rocblas backend with PI_ERROR_INVALID_OPERATION.
Logs: PR_428.txt
This PR is blocked on oneapi-src/unified-runtime#1226 and intel/llvm#12297

The reason that the buffer tests fails (PI_ERROR_INVALID_OPERATION ) is not because of this PR.

It is because the "Accessor interop requires urMemGetNativeHandle to return a valid mem handle. Multi dev ctx breaks this ..."

This means that if you go to a previous commit, the results will be the same. Defacto this PR is not blocked; this PR lives independently from the PRs that you provided. However, to test it, you either have to pull the modifications made from the PRs that you provided or wait for them to be merged (and this logic does not only apply to this PR, but to any "buffers function"-"accessor interop" from oneMKL)

Yes, that is what I meant actually, but I should have worded my comment better. Thank you for the correction.
The two PRs mentioned in my comment will hopefully help resolve the failing tests, and the failures are not directly because of the changes made in this PR.

Copy link
Contributor

@andrewtbarker andrewtbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice PR. I have a suggestion to reduce code duplication in the level 2 BLAS interfaces, but generally I'm very happy with this.

Copy link
Contributor

@andrewtbarker andrewtbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice, approved!

Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates. Looks good.

@mmeterel mmeterel merged commit 55a8b40 into uxlfoundation:develop Jan 18, 2024
@jle-quel jle-quel deleted the jle-quel/update-rocblas-backend branch January 18, 2024 17:05
zettai-reido pushed a commit to zettai-reido/oneMKL-for-VM that referenced this pull request Jan 24, 2024
Co-authored-by: Jin Z <5zj@cousteau.ftpn.ornl.gov>
zettai-reido pushed a commit to zettai-reido/oneMKL-for-VM that referenced this pull request Jan 24, 2024
Co-authored-by: Jin Z <5zj@cousteau.ftpn.ornl.gov>
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
Co-authored-by: Jin Z <5zj@cousteau.ftpn.ornl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants