[BLAS::SYCL-BLAS backend] New BLAS backend: SYCL-BLAS #262
mkrainiuk merged 20 commits into uxlfoundation:develop from codeplaysoftware:sycl-blas-backend-staging
Conversation
* Adds SYCL-BLAS backend to oneMKL
* SYCL-BLAS is a pure-SYCL open-source BLAS implementation
* SYCL-BLAS is added as a header-only library

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
Co-authored-by: Muhammad Tanvir <muhammad.tanvir@codeplay.com>
Co-authored-by: Kumudha Narasimhan <kumudha.narasimhan@codeplay.com>
Co-authored-by: Fabio Mestre <fabio.mestre@codeplay.com>
Co-authored-by: Paolo Gorlani <paolo.gorlani@codeplay.com>
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
Thank you for the PR! Can you provide more details on the segfaults? There are several other routines that are not implemented in SYCL-BLAS but they do not segfault (omatcopy, omatadd).
The only segfaulting test is ImatcopyBatchStrideTestSuite, for complex single precision with row-major layout. I'm compiling DPC++ with the 2022-06-10 nightly on Ubuntu. If I comment out the call to
If I were guessing at the location of the bug, I'd look into

    lda = std::max(m, n);
    ldb = std::max(m, n);

which I don't understand at the moment: the imatcopy docs and imatcopy batch strided docs make these parameters look rather more complicated.
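For readers following the lda/ldb question, here is a minimal C++ sketch of the leading-dimension rules, assuming they match the oneMKL imatcopy specification as summarized in the code comments (check the current docs before relying on it). It shows why setting both to std::max(m, n) is always valid, whatever the layout and transpose, at the cost of some padding; it does not explain the segfault. Layout, Transpose, LeadingDims and both functions are hypothetical stand-ins, not oneMKL types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Local stand-ins for the oneMKL layout/transpose enums (hypothetical, sketch only).
enum class Layout { col_major, row_major };
enum class Transpose { nontrans, trans, conjtrans };

struct LeadingDims {
    std::int64_t lda;
    std::int64_t ldb;
};

// Smallest valid lda/ldb for an in-place copy B = op(A) of an m x n matrix A,
// assuming these rules (B is m x n without transpose, n x m with one):
//   column-major: lda >= m; ldb >= m (nontrans) or n (trans/conjtrans)
//   row-major:    lda >= n; ldb >= n (nontrans) or m (trans/conjtrans)
LeadingDims min_imatcopy_leading_dims(Layout layout, Transpose trans,
                                      std::int64_t m, std::int64_t n) {
    assert(m > 0 && n > 0);
    const bool transposed = (trans != Transpose::nontrans);
    if (layout == Layout::col_major) {
        return {m, transposed ? n : m};
    }
    return {n, transposed ? m : n};
}

// True when using std::max(m, n) for both lda and ldb (as the test does)
// satisfies the minimums above for this layout/transpose combination.
bool max_mn_is_valid(Layout layout, Transpose trans, std::int64_t m, std::int64_t n) {
    const LeadingDims min_ld = min_imatcopy_leading_dims(layout, trans, m, n);
    const std::int64_t ld = std::max(m, n);
    return ld >= min_ld.lda && ld >= min_ld.ldb;
}
```

For example, a 3 x 5 column-major matrix copied with Transpose::trans needs lda >= 3 and ldb >= 5, so max(m, n) = 5 covers both.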
andrewtbarker
left a comment
I'm still working my way through the code, so there could be more comments, but here are a few to start:
- This could be a very nice addition to the project!
- Does SYCL-BLAS not support USM memory at all? I am not sure of the long-term future of the buffer interfaces so this could be a concern.
- I cannot reproduce the segfault for imatcopy batch stride.
- I am seeing quite a few failures in my local testing (integrated Intel GPU). I have not investigated these closely, but I'm attaching the log just in case the list of failures means anything.
Fix compile time dispatch testing when using sycl-blas
Typical error message from my failures:
Driver version: 22.08.22549
I'm using the icx compiler from the 2022.2 Base Toolkit and building only the SYCL-BLAS backend. This is quite possibly a bad local configuration; I'm investigating more closely.
I have the tests passing on a different machine, so never mind the failures I saw. Overall I think this PR is looking good.
@andrewtbarker Do we have any plans in place for testing this backend on different machines and compilers as part of CI? If not, I think we should make that plan and validate the PR on all configurations (if we have not done so yet).
@mmeterel The PR in its current form is only supported with the DPC++ compiler and Intel GPUs. There is some discussion about its eventual scope; we'll see. I will try to document the scripts for the install and test process so we can automate them in the future.
Enable multiple devices type with the SYCL-BLAS backend. Co-authored-by: Hugh Bird <hugh.bird@codeplay.com>
Logs of the SYCL-BLAS backend on different devices:
andrewtbarker
left a comment
This is generally looking pretty good. I have some comments below about the documentation (which I think could generally be clearer for a non-expert user) and I'm running some tests now.
Is this intended to skip all row_major tests?
Yes, SYCL-BLAS does not currently support row-major operations. We are planning to support this in the future.
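As background for the row-major question, here is a small sketch (not SYCL-BLAS code) of why a column-major-only library can often serve row-major callers: element (i, j) of a row-major matrix sits at the same offset as element (j, i) of a column-major matrix with the same leading dimension, so row-major storage reads as the column-major transpose. Whether SYCL-BLAS will use this approach for its planned row-major support is not stated in this thread; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (i, j) in a column-major matrix with leading dimension ld (ld >= rows).
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t ld) {
    return i + j * ld;
}

// Offset of element (i, j) in a row-major matrix with leading dimension ld (ld >= cols).
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t ld) {
    return i * ld + j;
}

// Checks that every element (i, j) of an m x n row-major matrix has the same offset
// as element (j, i) of the n x m column-major matrix sharing its storage.
bool row_major_is_col_major_transpose(std::size_t m, std::size_t n, std::size_t ld) {
    assert(ld >= n);  // a row-major leading dimension must cover the n columns
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (row_major_offset(i, j, ld) != col_major_offset(j, i, ld)) {
                return false;
            }
        }
    }
    return true;
}
```

One common use of this equivalence is the identity (AB)^T = B^T A^T, which lets a row-major product be computed by a column-major routine with the operands swapped.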
On NVIDIA hardware I'm seeing lots of errors of this form:
I'll dig in a little more and let you know what I find.
@andrewtbarker this looks like an error with the wrong target triple?
@andrewtbarker I am not keen on adding more CMake flags specific to SYCL-BLAS. One difference with the variables in
Would it be better if we add the At the very least this needs to be documented much better. I did not know the explicit
It's not possible to set We are documenting
I made a PR into this branch to show what I have in mind. It works fine on our NVIDIA GPU; I have not tried it with AMD. For documentation, do I understand the following correctly:
If I'm right on these points (or even if I'm wrong), some of this needs to be clarified in
I have updated the documentation accordingly, and https://github.com/codeplaysoftware/oneMKL/pull/16 will be merged.
This is much better, thank you. Sorry for the churn and confusion; I just want to make sure that if we claim to support multiple hardware targets, it's clear to users how to use them. I'm ready to approve this once the above PR (or something similar) is merged.
[BLAS] Automate -fsycl-targets argument for SYCL-BLAS backend
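As a hedged illustration of what automating -fsycl-targets involves (the PR handles this in its build configuration; the helper below is hypothetical), this C++ sketch maps a GPU vendor to the DPC++ offload flags commonly used for it. The triples spir64, nvptx64-nvidia-cuda and amdgcn-amd-amdhsa come from the DPC++ documentation, but triple spellings have changed between DPC++ releases, so check the docs for your compiler version. GpuVendor and sycl_target_flags are not oneMKL or SYCL-BLAS names.

```cpp
#include <cassert>
#include <string>

// Hypothetical vendor selector for this sketch; not a oneMKL or SYCL-BLAS type.
enum class GpuVendor { intel, nvidia, amd };

// Builds DPC++ offload flags for one vendor. AMD targets additionally need an
// explicit offload architecture such as "gfx90a", passed to the device backend.
std::string sycl_target_flags(GpuVendor vendor, const std::string& amd_arch = "") {
    switch (vendor) {
        case GpuVendor::intel:
            return "-fsycl-targets=spir64";
        case GpuVendor::nvidia:
            return "-fsycl-targets=nvptx64-nvidia-cuda";
        case GpuVendor::amd:
            assert(!amd_arch.empty() && "AMD targets need an offload architecture");
            return "-fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend --offload-arch=" +
                   amd_arch;
    }
    return {};
}
```

Deriving the flags from a single vendor choice, rather than asking users to pass them by hand, is the kind of simplification the documentation discussion above was aiming for.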
andrewtbarker
left a comment
Thanks for all your work on this!
@mkrainiuk Can you or someone from the engine team take a look at this when you get a chance?
Hi @andrewtbarker, please take a look at #281, which explains how to reproduce the
Hi @andrewtbarker @mkrainiuk, is there anything else you would like us to add in this PR?
Hi @muhammad-tanvir-1211, from my perspective this is ready to go, but we need a second review to merge. Our team is in the middle of a busy process right now; sorry for the delay.
mkrainiuk
left a comment
Thank you for the PR! I have several questions about the implementation.
I also want to clarify the long-term plan for this backend. As it's a header-based open-source solution, I think usability could improve if it were fully integrated into the oneMKL project. In that case we could skip the argument conversion needed for the different API and reduce execution time.
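To make the "argument conversion" point concrete: a wrapper backend has to translate each oneMKL argument into the form the underlying library expects. This is a minimal sketch, assuming a BLAS-style character code as the target representation; Transpose mirrors the values of oneapi::mkl::transpose, and to_blas_char is hypothetical, not the actual oneMKL or SYCL-BLAS code.

```cpp
#include <cassert>
#include <stdexcept>

// Local mirror of the values of oneapi::mkl::transpose (stand-in for this sketch).
enum class Transpose { nontrans, trans, conjtrans };

// Hypothetical translation into a BLAS-style character code. Whether SYCL-BLAS
// expects exactly this representation is an assumption of the sketch.
char to_blas_char(Transpose t) {
    switch (t) {
        case Transpose::nontrans:
            return 'n';
        case Transpose::trans:
            return 't';
        case Transpose::conjtrans:
            return 'c';
    }
    throw std::invalid_argument("unknown transpose value");
}
```

Each translation like this is cheap on its own; the point raised in the review is that a fully integrated backend would not need such a layer at all.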
    netlib,
    rocblas,
    rocrand,
    syclblas,
Can we build the new backend with a name reflecting the device architecture it targets, e.g., syclblas_intelgpu, syclblas_nvidiagpu? With the current implementation it's not possible to have syclblas backends built for different targets at the same time.
I think we had a different idea for tuning for multiple targets. We were planning to keep only one backend, syclblas, and improve SYCL-BLAS so it can be tuned for multiple targets at the same time. Maybe we could also make sure the translation units are split and SYCL-BLAS is included multiple times with different macro definitions for the tuning.
I don't see the benefit of adding multiple syclblas backends currently. In any case this needs more work to be supported; is it important to support it in the first PR?
Not really, I'm just trying to clarify the long-term support. If dispatching between different hardware happens in the backend itself, I'm fine with having only syclblas as the backend library name. But in that case I guess all the dispatching logic implemented on the oneMKL Interfaces side will be duplicated inside the syclblas backend, so we probably need to make sure we remove this duplication in the future.
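A single-file sketch of the single-backend option discussed in this thread: the same header-only code is stamped out once per target with different tuning macros, and one syclblas backend chooses between the variants at run time. In a real build each variant would more likely be a separate translation unit that includes the library after different -D definitions; the macro, namespaces, tile sizes and vendor strings here are all hypothetical. The vendor check in select_gemm_tile is the kind of device detection that, per the comment above, would duplicate logic already present in the oneMKL Interfaces layer.

```cpp
#include <cassert>
#include <string>

// Stand-in for a header-only library whose tuning is fixed at compile time by
// macros. Each expansion plays the role of one "include with different -D flags"
// translation unit, isolated in its own namespace to avoid symbol clashes.
#define DEFINE_TUNED_KERNELS(NS, TILE_SIZE)          \
    namespace NS {                                   \
    constexpr int tile_size = TILE_SIZE;             \
    inline int gemm_tile() { return tile_size; }     \
    }

// Hypothetical targets and tile sizes, for illustration only.
DEFINE_TUNED_KERNELS(tuned_intel_gpu, 64)
DEFINE_TUNED_KERNELS(tuned_nvidia_gpu, 128)
DEFINE_TUNED_KERNELS(tuned_default, 16)

// One backend dispatching internally on a detected vendor string,
// falling back to the generic tuning for anything unrecognized.
int select_gemm_tile(const std::string& vendor) {
    if (vendor == "intel") return tuned_intel_gpu::gemm_tile();
    if (vendor == "nvidia") return tuned_nvidia_gpu::gemm_tile();
    return tuned_default::gemm_tile();
}
```

The alternative proposed earlier in the thread (syclblas_intelgpu, syclblas_nvidiagpu) would move this if/else chain out of the backend and into the existing oneMKL Interfaces dispatch, at the cost of building several backend libraries.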
mkrainiuk
left a comment
I have a minor documentation request: please make sure the performance limitations of CPU support are covered. Apart from this, the initial implementation looks fine to me. Thank you!
Once the SYCL-BLAS project is mature enough, I think we need to revisit the integration for this backend: removing the duplicated device detection in the oneMKL Interfaces layer and SYCL-BLAS, and also considering making SYCL-BLAS part of the oneMKL Interfaces project.
…ycl-blas-backend-staging
* [BLAS] New BLAS backend: SYCL-BLAS
  * Adds SYCL-BLAS backend to oneMKL
  * SYCL-BLAS is a pure-SYCL open-source BLAS implementation
  * SYCL-BLAS is added as a header-only library
  Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
  Co-authored-by: Muhammad Tanvir <muhammad.tanvir@codeplay.com>
  Co-authored-by: Kumudha Narasimhan <kumudha.narasimhan@codeplay.com>
  Co-authored-by: Fabio Mestre <fabio.mestre@codeplay.com>
  Co-authored-by: Paolo Gorlani <paolo.gorlani@codeplay.com>
  Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
* Remove duplicate unsupported backend in backend map
* Fix compile time dispatch testing when using sycl-blas
* Enable rotmg
* Fix issue in previous commit
* Re-enable Trsm
* Change wording in docs and exception; fix typo;
* Sycl blas backend staging device support (uxlfoundation#13)
  Enable multiple devices type with the SYCL-BLAS backend.
  Co-authored-by: Hugh Bird <hugh.bird@codeplay.com>
* Revert change to disable OpenCL gpu devices
* Update documentation
* [BLAS] Automate -fsycl-targets argument for SYCL-BLAS backend
* Update documentation
* Fix for CMake 3.16
* Add checks for compiler

---------

Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
Co-authored-by: Muhammad Tanvir <muhammad.tanvir@codeplay.com>
Co-authored-by: Kumudha Narasimhan <kumudha.narasimhan@codeplay.com>
Co-authored-by: Fabio Mestre <fabio.mestre@codeplay.com>
Co-authored-by: Paolo Gorlani <paolo.gorlani@codeplay.com>
Co-authored-by: Mehdi Goli <mehdi.goli@codeplay.com>
Co-authored-by: Andrew T. Barker <andrew1.barker@intel.com>
Description
Checklist
All Submissions
--gtest_filter=-ImatcopyBatchStrideTestSuite* because this test suite segfaults. This function is not implemented in SYCL-BLAS.