-
Notifications
You must be signed in to change notification settings - Fork 640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature : Add SYCL runtime support #747
base: sycl-refactor
Are you sure you want to change the base?
Feature : Add SYCL runtime support #747
Conversation
@Titus-von-Koeller based on issue #894 , there are some abstractions which are specific to certain intel devices which are to be selectively implemented in Ipex but for a majority of the ops/kernels to be supported across all SYCL devices this integration is needed. This is a WIP as it requires some dependencies to be resolved on Intel LLVM to make it runnable across SYCL devices. In general this is a broader approach catering to SYCL runtime. @jianan-gu |
Dear @abhilash1910, Thanks for submitting this PR and flagging the connection with #894, really appreciated. Would you be so kind to give a bit more background and describe in detail that you're PR is aiming to solve (once it's complete) and what roadblocks you see before getting there? Beware, that contrary to Tim, my systems programming knowledge is not so good, yet, but once I understand the above better, I'm happy to coordinate + help along this contribution. Could you please elaborate specifically where you see overlap + hurdles to resolve relative to #894? |
Hi @Titus-von-Koeller |
This is great, thank you so much for your contribution! Can you run me through how a user would use/compile this for their Intel device? A step-by-step procedure would be great to get a easy overview for users. |
Hi @TimDettmers , @Titus-von-Koeller as my fellow colleagues are working on the common device abstraction properties , this is an alternative addition to support SYCL backend across vendors . Since here we are only abstracting the sycl kernels, there is a need for a dedicated cmake for running only the kernels. The compilation of this would require a standalone cmake with the following options:
|
@abhilash1910 I have a few questions.
|
@matthewdouglas Thanks for the questions. I will try to answer them in best way possible:
Also on other note, I would like to add that SYCL support specifically for kernels would eventually be one of the ways for Intel GPU upstream effort that is ongoing. And having SYCL backend opens a lot of Intel (not only) but also other devices for competitive benchmarking. |
Seeing the development update in #898 on a separate branch, I do agree branch separation of backends will be helpful . While the common device abstraction is the probable goal to merge, I think this PR would benefit from a standalone SYCL branch . Since we plan to integrate full sycl support on all kernel adaptations , this would be helpful to maintain as a separate branch. Hence proposing for a separate branch - SYCL. |
Yes these are I think the prospective answers:
|
@Titus-von-Koeller A detail on this PR: I am nearing completion of the kernels . Since the LOC is quite big for this PR , I think the pytorch aspect of linking the binaries can be extended in a separate PR. I am open to suggestions and can add here as well but it would make the review process very hectic. |
One point to note is that bitsandbytes isn't built as a PyTorch extension currently for any of the backends. Instead the shared lib exports a C API that is invoked with Python's This SYCL backend also does not need to link to libtorch or even python, and since it exposes the same API (see pythonInterface.cpp), it should be possible to just build multiple targets (e.g. spirv, ptx, amdgcn) and load the correct .so at runtime; no CUDAExtension or DPCPPExtension needed. |
Yes this is something which I noticed, and in that case the kernels can be built for multiple targets. |
Hi @Titus-von-Koeller , pinging for a review . The code is now updated with latest compiler . After some discussion internally , this PR would only consist of the SYCL kernels only so that these can be abstracted across vendors. Since the kernels are completed I would request start of the review process ; also there is a test suite which has been created for SYCL and wanted to discuss how to add that. Nvidia build feature is also added. There is also a seperate repository containing iGPU/ Cuda builds of this fork with python bindings: https://github.com/abhilash1910/bitsandbytes_sycl_samples FYI: There is another PR from my collegaue with xpu support #1257 which is using ipex for xpu (Intel GPU) support only. We are discussing better ways to formulate it on the multibackend refactor branch. |
Pinging @Titus-von-Koeller @matthewdouglas with latest updated code. |
Dear @abhilash1910, Thanks for this valuable contribution. I see you've made major progress, impressive! We (the bitsandbytes team) are under very tight bandwidth at the moment so we won't have time to deep dive into it just yet. Currently we're both trying to get out of a maintenance impasse and preparing for the Intel CPU+XPU, AMD ROCm alpha release. Additionally, we're working on infra, pkging and testing improvements. Also, the multi-backend stuff still needs a major refactor with everyone waiting for us to finalize the backend abstraction with a move to the These things currently take priority. We'll however try to take a look as soon as we can and with @matthewdouglas having just been added to the team full-time, we're in a better position going forward. |
|
Hi @Titus-von-Koeller |
In the plan to support an extensive ecosystem spanning from Huggingface, Lora/LLama families, this addition would enable this great FW to run on our GPU cards seemlessly. The discussion started from this merged PR (artidoro/qlora#219) and currently this PR would eventually enable low precision quantization. (PR is initial draft state).
Tagging @TimDettmers . Thanks for creating this repository.