Intel ARC Support #1575
Also, I wonder if this could support two GPUs so you don't have to offload anything into RAM, such as an Arc A770 and an RX 6600. |
To my knowledge, it doesn't currently support oneAPI or OpenVINO; I own an Intel Arc GPU myself. |
It doesn't, unfortunately. I really wish it did, though, as I have a dual A770 system myself (these cards have a lot of VRAM for the price, good low-precision AI accelerators, etc.). For now I'm running on the CPU, which is, of course, horribly slow. One issue is that Intel's PyTorch support for its GPUs needs a special version based on PyTorch 1.10 (see https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-extension-for-pytorch-for-gpus.html), but this system uses PyTorch 2.0.0. As soon as Intel GPU support for PyTorch 2.0.0 comes out, I'm hoping support can be extended here (if I can find time, maybe I'll even be able to contribute some patches). For the CPU, PyTorch 2.0.0 is already supported: https://intel.github.io/intel-extension-for-pytorch/latest/tutorials/releases.html. In the meantime, it would be great if the README could at least be updated to say WHICH GPUs are supported. BTW, the one-click installer also fails if you don't have an NVIDIA GPU, even if you select "None"; I had to go the git clone route. |
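The version mismatch described above can be checked mechanically before installing anything. A minimal sketch, assuming (as Intel's release naming suggests, e.g. IPEX `2.0.110+xpu` for PyTorch 2.0 and `2.1.10+xpu` for PyTorch 2.1) that an IPEX build targets the PyTorch release with the same major.minor version. `major_minor` and `ipex_matches_torch` are hypothetical helpers, not part of either library.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '2.1.0a0', '2.1.0.post0' or '2.0.110+xpu'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def ipex_matches_torch(torch_version: str, ipex_version: str) -> bool:
    """Assumption: an IPEX build targets the PyTorch release with the same major.minor."""
    return major_minor(torch_version) == major_minor(ipex_version)
```

For example, `ipex_matches_torch("2.0.0", "2.1.10+xpu")` is `False`, while `ipex_matches_torch("2.1.0a0", "2.1.10+xpu")` is `True`.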
Multi-GPU support for multiple Intel GPUs would, of course, also be nice. Multi-GPU is supported for other cards, so it should not (in theory) be a problem. I personally don't really care about mixing GPUs from different vendors, though :) A bonus would be the ability to use Intel integrated graphics; they have limited VRAM, but maybe it's good enough for some simple things. |
Would love to see this as well. With its power and amount of VRAM, the Arc is a great little card for those of us who do more compute work than gaming, especially considering the price. |
Intel has released PyTorch 2.0 support for Arc GPUs: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu |
Does the release of PyTorch 2 support move things forward for Arc support? |
I have created a pinned thread for Intel Arc discussion and welcome you to move the discussion there: #3761. To my knowledge, llama-cpp-python should work with GPU acceleration on Intel Arc as long as you compile it with CLBlast. See https://github.com/oobabooga/text-generation-webui#amd-metal-intel-arc-and-cpus-without-avcx2 |
You rock! Thank you for all the hard work on this project! |
@oobabooga Intel Arc GPU support is in the pipeline; I will start the integration work myself in 2-3 weeks' time. There are some other items in the pipeline at Intel which we are covering, and we plan to add this to our GPU soon. |
@abhilash1910, thanks for the info. For XPU inference with Transformers, is it currently enough to do
or similar, like here? Does any special PyTorch import have to be made? |
I found this while researching how this all works: https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html It looks like there shouldn't be much to change, but I'm new to LLM/AI development, so I may be missing something. |
Thanks, @itlackey. I guess it should only take a few changed lines (for the Transformers loader):
It would be nice if someone could test this. |
I'll have time in a few days and will give it a shot. We may also need to change the installer and/or Docker image to load the Intel libraries and driver, and to recompile llama.cpp, to get XPU to work. |
Good to know there's interest; thanks @oobabooga @itlackey (it helps to determine priority). I will add the changes starting tomorrow (25th Sept), and then they can be tested.
|
Awesome @abhilash1910 :) |
Hello, I just purchased an Intel Arc A770 16GB. When it arrives (in a week), I will be willing to help test stuff on Linux. |
Small update: the GPU has arrived, and I will install it in my PC when I have time. I am excited to start playing around with LLMs on my own PC. |
It doesn't change anything (yet). I'm using Intel Iris Xe Graphics (not very good, I know) on WSL2. I'll test some more things. |
Not sure if this is user error (I'm new to this) or an actual issue, but I'm getting errors mentioning CUDA while trying to load a model. I find this really odd, especially because I chose the IPEX option during the first-time install with ./start_linux.sh.
|
@Yorizuka, can you try making these changes?

```diff
diff --git a/modules/models.py b/modules/models.py
index 5bd9db74..c376c808 100644
--- a/modules/models.py
+++ b/modules/models.py
@@ -137,6 +137,8 @@ def huggingface_loader(model_name):
         if torch.backends.mps.is_available():
             device = torch.device('mps')
             model = model.to(device)
+        elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+            model = model.to('xpu')
         else:
             model = model.cuda()
diff --git a/modules/text_generation.py b/modules/text_generation.py
index 0f24dc58..295c7cdd 100644
--- a/modules/text_generation.py
+++ b/modules/text_generation.py
@@ -132,6 +132,8 @@ def encode(prompt, add_special_tokens=True, add_bos_token=True, truncation_lengt
     elif torch.backends.mps.is_available():
         device = torch.device('mps')
         return input_ids.to(device)
+    elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+        return input_ids.to('xpu')
     else:
         return input_ids.cuda()
```
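The branch order in the patch above (MPS, then XPU, then CUDA) can be factored into one small helper so the loader and `encode()` cannot drift apart. This is a hypothetical sketch, not code from the repository. It takes the `torch` module as a parameter so the logic can be exercised with stand-in objects, and it assumes, as with the IPEX `+xpu` builds discussed in this thread, that `torch.xpu` is only registered after `import intel_extension_for_pytorch`, which is why the `hasattr` guard is kept.

```python
from types import SimpleNamespace
from typing import Any


def pick_device(torch_mod: Any) -> str:
    """Mirror the patch's branch order: MPS, then XPU, then CUDA."""
    mps = getattr(getattr(torch_mod, "backends", None), "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    # Same guard as the patch: on IPEX builds, torch.xpu may not exist
    # until intel_extension_for_pytorch has been imported.
    if hasattr(torch_mod, "xpu") and torch_mod.xpu.is_available():
        return "xpu"
    return "cuda"  # the patch's fallback is model.cuda() / input_ids.cuda()


if __name__ == "__main__":
    # Stand-in objects so the branch order can be checked without any GPU.
    no_mps = SimpleNamespace(mps=SimpleNamespace(is_available=lambda: False))
    fake_arc = SimpleNamespace(backends=no_mps, xpu=SimpleNamespace(is_available=lambda: True))
    fake_nvidia = SimpleNamespace(backends=no_mps)
    print(pick_device(fake_arc))     # xpu
    print(pick_device(fake_nvidia))  # cuda
```

With a real install you would call `pick_device(torch)` after `import torch` and `import intel_extension_for_pytorch`, then pass the result to `model.to(...)` and `input_ids.to(...)`.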
|
I applied the patch; same issue.
To confirm I applied the patch correctly, here is the git status:
and my |
So I uninstalled the torch and torchvision packages installed by the one-click installer and reinstalled IPEX, which resulted in an unidentified .so error. Putting
And to add to what @Yorizuka mentioned, trying to run a GPTQ model in Transformers also gives this error: |
I think the issue described in this comment #3761 (comment) is likely related to the issue we are having here. |
@TheRealUnderscore about the transformers error, can you check if it works after this commit? |
@oobabooga These were my PyTorch settings (via …):
PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

And these are my 2.0.1a0 settings. Lots of things have changed:
PyTorch built with:
- GCC 11.2
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
Are any of these settings relevant to the GPU? I'll keep looking into it on my own; I wouldn't be surprised if it was an installation error on my part. |
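One way to answer this is to diff only the GPU-related keys of the two "Build settings" lines. In the output above, the first build reports `TORCH_VERSION=2.1.0`, `USE_CUDA=ON` and `CUDA_VERSION=12.1`, while the second reports `TORCH_VERSION=2.0.1` and `USE_CUDA=OFF`. A minimal sketch, assuming the text comes from `torch.__config__.show()` and that no value contains `", "`; the helper names and the choice of keys in `GPU_KEYS` are illustrative, not exhaustive.

```python
from typing import Dict, Optional, Tuple

# Illustrative selection of build keys that say something about GPU backends.
GPU_KEYS = ("TORCH_VERSION", "USE_CUDA", "CUDA_VERSION", "USE_CUDNN", "USE_ROCM", "USE_NCCL")


def parse_build_settings(config_text: str) -> Dict[str, str]:
    """Parse the '- Build settings: KEY=VALUE, ...' line of torch.__config__.show()."""
    for raw in config_text.splitlines():
        line = raw.strip().lstrip("- ").strip()
        if not line.startswith("Build settings:"):
            continue
        body = line[len("Build settings:"):].strip().rstrip(",")
        settings: Dict[str, str] = {}
        for item in body.split(", "):
            key, sep, value = item.partition("=")  # split on the first '=' only
            if sep:
                settings[key.strip()] = value.strip()
        return settings
    return {}


def diff_gpu_settings(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for GPU keys that differ."""
    return {k: (a.get(k), b.get(k)) for k in GPU_KEYS if a.get(k) != b.get(k)}


if __name__ == "__main__":
    first = parse_build_settings("  - Build settings: CUDA_VERSION=12.1, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_ROCM=OFF, ")
    second = parse_build_settings("  - Build settings: TORCH_VERSION=2.0.1, USE_CUDA=OFF, USE_ROCM=OFF, ")
    print(diff_gpu_settings(first, second))
```

In a live environment the input would be `torch.__config__.show()` captured from each install.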
Some updates regarding failures to build or compile on our systems (FYI):
|
Sorry in advance for the long post. Unfortunately, the above is only part of the solution: other requirements install a more recent version of PyTorch that is not compatible with Intel GPUs. So I did a manual install from scratch:
Install Intel drivers:
Install Intel® oneAPI Base Toolkit:
Install some missing libraries:
Install Miniconda 3:
Create a new conda environment:
Install the WebUI:
Install PyTorch:
Activate PyTorch:
Test it is working:
Install llama-cpp-python:
Start the server:
Download your model and enjoy :)
My tests were done on a dedicated install of Ubuntu running on Windows 11 WSL2 on a Samsung Galaxy Book2 (Intel i7 processor with integrated graphics and 16GB of RAM): Llama 2 13B (32 layers offloaded to the GPU) loads fast and runs at above 2 tokens per second, which is acceptable for personal use.
Hope it helps, |
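For the "Test it is working" step, a check along these lines can confirm that PyTorch, IPEX and at least one XPU device are visible from Python. This is a sketch, not the command used in the guide. It assumes an IPEX `+xpu` build where importing `intel_extension_for_pytorch` registers `torch.xpu`, and it treats an `OSError` during that import (like the missing `.so` errors reported earlier in this thread) as "IPEX unavailable" instead of crashing. `xpu_report` is a hypothetical helper name.

```python
import importlib
from typing import Any, Dict


def xpu_report() -> Dict[str, Any]:
    """Report PyTorch/IPEX versions and visible XPU device names without raising."""
    report: Dict[str, Any] = {"torch": None, "ipex": None, "xpu_devices": []}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report
    report["torch"] = str(torch.__version__)
    try:
        # On IPEX +xpu builds, this import is what registers torch.xpu.
        ipex = importlib.import_module("intel_extension_for_pytorch")
        report["ipex"] = ipex.__version__
    except (ImportError, OSError):  # OSError covers a missing or mismatched .so
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        report["xpu_devices"] = [xpu.get_device_name(i) for i in range(xpu.device_count())]
    return report


if __name__ == "__main__":
    print(xpu_report())
```

An empty `xpu_devices` list with a non-empty `ipex` field usually points at the driver/runtime layer rather than the Python packages.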
@cyrillebeauchamp, thanks for the guide; that's extremely valuable. I think that with your commands it may be possible to automatically compile llama-cpp-python for Intel GPUs using GitHub Actions. Then we could add these to dedicated requirements_intel.txt and requirements_intel_noavx2.txt files. If you are interested, you may be able to create a new .yml in this repository (where all the wheels here are compiled) and PR it to @jllllll: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/tree/main/.github/workflows |
@cyrillebeauchamp Thanks for your detailed instructions! But I still have trouble loading the LLM. At first, I thought the problem might be caused by the Intel iGPU, which can also be used as a Level Zero XPU device, so I exported the environment variable "ONEAPI_DEVICE_SELECTOR=level_zero:0". But the error still happened. By the way, the GPU IMC usage and the RAM usage increased quickly after I clicked the "Load" button. Any ideas? Thank you! |
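If the iGPU is the culprit, the selector has to point at the Arc card's index, which is not necessarily 0. A minimal sketch, assuming you already have the device names in Level Zero enumeration order (for instance as listed by `sycl-ls`) and that the discrete card's name contains "Arc"; the helper names and that ordering are assumptions. The variable has to be exported before the SYCL runtime initialises, i.e. before importing torch/IPEX or loading llama.cpp in that process.

```python
import os
from typing import List, Optional


def discrete_gpu_selector(device_names: List[str], keyword: str = "Arc") -> Optional[str]:
    """Return 'level_zero:<index>' for the first device whose name contains `keyword`."""
    for index, name in enumerate(device_names):
        if keyword.lower() in name.lower():
            return f"level_zero:{index}"
    return None


def apply_selector(selector: Optional[str]) -> None:
    """Export the selector; only effective if done before the SYCL runtime starts."""
    if selector is not None:
        os.environ["ONEAPI_DEVICE_SELECTOR"] = selector


if __name__ == "__main__":
    # Hypothetical device order on a machine with both an iGPU and an Arc card.
    names = ["Intel(R) Iris(R) Xe Graphics", "Intel(R) Arc(TM) A770 Graphics"]
    print(discrete_gpu_selector(names))  # level_zero:1
```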
Currently, using llama.cpp on Intel Arc results in most operations running on the CPU. The PyTorch extension works well on Intel Arc: FastChat uses it, and it's significantly faster than llama.cpp. Work is being done to support better drivers in llama.cpp, but as of right now the OpenCL implementation runs primarily on the CPU. This is why we see about 2 t/s with llama.cpp and around 20 t/s with FastChat on Arc GPUs. |
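To compare backends the way this comment does, it helps to measure throughput identically for each one. At the rates quoted above, a 256-token reply takes 256 / 2 = 128 s at 2 t/s versus 256 / 20 = 12.8 s at 20 t/s. A minimal, backend-agnostic sketch: it times any callable that yields tokens. `fake_generate` is a hypothetical stand-in for whichever streaming loader you are testing.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(generate: Callable[[], Iterable[str]]) -> float:
    """Consume every token from generate() and return the average rate."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate() -> Iterable[str]:
    # Hypothetical stand-in for a real streaming generator: ~10 ms per token.
    for token in ["Hello", ",", " world", "!"]:
        time.sleep(0.01)
        yield token


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate):.1f} t/s")
```

Timing the whole stream includes prompt processing in the first token, so for long prompts it is worth measuring the first token separately.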
A new version of IPEX has been released, adding native Windows support: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.1.10%2Bxpu |
|
I'm not sure if this is the right place to post this. I receive the below error after installing OobaBooga using the default Arc install option on Windows. The install seemed to go well but running it results in the below DLL load error. Other threads that mentioned this loading error suggested it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions? It's an Arc A770 on Windows 10.
|
I think that the new loader below (gpt-fast) should work on Intel Arc, but I have no way of testing it. It has 4-bit and 8-bit support. Any tests would be welcome. |
I really don't know much, as I am super new to AI. I am looking into getting more into AI, mainly for Home Assistant and hopefully someday an assistant similar to Iron Man's Jarvis (Jarvis isn't possible today, right?). Anyway, I have a home server that I just built (14700K) with Plex as the main use case. I stumbled upon LocalAI last week (I didn't even realize AI could be self-hosted), and I got excited about the above possibility; hosting AI would also help justify the cost of my server upgrade and running it 24/7. |
It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible? |
This is brilliant news! Could I get a ping when it's been implemented into this repo please? |
I am aware of SYCL and wanted to ask about this. There is now also a Vulkan option that may work on Intel Arc. Can someone try these and see which one works with GPU offloading and is the fastest?

```bash
pip uninstall -y llama_cpp_python llama_cpp_python_cuda llama_cpp_python_cuda_tensorcores
CMAKE_ARGS="-DLLAMA_SYCL=on" pip install llama-cpp-python # Option 1: SYCL
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python # Option 2: Vulkan
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python # Option 3: Kompute
```
|
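For anyone trying all three options in turn, the commands above can be scripted so each build starts clean. A minimal sketch: the flag table is copied from the commands above (llama.cpp option names of that time; newer llama.cpp releases renamed these options, so check before use), and `build_env`, `pip` and `install` are hypothetical helpers. `--no-cache-dir` is a design choice to stop pip from reusing a wheel previously built with a different `CMAKE_ARGS`.

```python
import os
import subprocess
import sys
from typing import Dict, List

# CMake switches copied from the commands above.
BACKEND_FLAGS: Dict[str, str] = {
    "sycl": "-DLLAMA_SYCL=on",
    "vulkan": "-DLLAMA_VULKAN=on",
    "kompute": "-DLLAMA_KOMPUTE=on",
}

# Prebuilt variants to remove first, as in the uninstall line above.
OLD_PACKAGES = ["llama_cpp_python", "llama_cpp_python_cuda", "llama_cpp_python_cuda_tensorcores"]


def build_env(backend: str) -> Dict[str, str]:
    """Copy the current environment and add CMAKE_ARGS for one backend."""
    env = dict(os.environ)
    env["CMAKE_ARGS"] = BACKEND_FLAGS[backend]
    return env


def pip(*args: str) -> List[str]:
    """Run pip through the current interpreter so the wheel lands in the active env."""
    return [sys.executable, "-m", "pip", *args]


def install(backend: str) -> None:
    """Uninstall the old wheels, then build llama-cpp-python from source for `backend`."""
    subprocess.run(pip("uninstall", "-y", *OLD_PACKAGES), check=True)
    subprocess.run(
        pip("install", "--no-cache-dir", "llama-cpp-python"),
        env=build_env(backend),
        check=True,
    )
```

For SYCL, run `source /opt/intel/oneapi/setvars.sh` in the same shell before starting Python so the oneAPI compilers are on `PATH`; `install("sycl")` does not do that for you.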
Stupid question: since there is no CMake in TGWUI's codebase, I assume you are talking about building llama.cpp. Once I build it, do I "plug in" the compiled stuff into TGWUI, or do I run a model on llama.cpp somehow? (My only understanding of llama.cpp is that it is some kind of backend.) Whichever of those I have to do, how do I do it? |
Unfortunately I have no idea how to build this myself. However, if someone wants to send me one that I can download I am more than happy to test. All I know is that the current installation procedure for Intel ARC doesn't utilise my Intel Arc A770 16GB, even when it downloads all the correct packages and says it's offloaded the layers to GPU (which I can tell it hasn't because I have no VRAM being used up). Edit for Clarification: Doesn't work unless using Transformers and no quantisation. |
There's also the following library, which should allow us to use quantised versions of Hugging Face Transformers models on Intel GPUs. It may be a way to get quantised models running on these GPUs other than through llama.cpp: |
I hope this isn't disruptive to your work, @oobabooga, but are there any updates on wheels for us to test? I've tried building them myself, without success. |
@oobabooga Thanks! |
Here is my solution using llama-cpp-python:

Test environment: Linux Mint 21.3 Cinnamon with Linux kernel 6.5.0-25
Python version: 3.10.* or 3.11.*

Before all steps, install the Intel driver (thanks @cyrillebeauchamp):

```bash
# download the key to system keyring
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
  sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
# add signed entry to apt sources and configure the APT client to use the Intel repository:
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
  sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
# update the repositories and install the base kit:
sudo apt update
sudo apt install -y \
  intel-opencl-icd intel-level-zero-gpu level-zero \
  intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
  libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
  libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
  mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo
```

Method 1: Vulkan

Step 1: Install the Vulkan SDK

```bash
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update -y
sudo apt-get install -y vulkan-sdk
# To verify the installation, use the command below:
vulkaninfo
```

Step 2: Install the necessary Python packages

```bash
pip install -r requirements_nowheels.txt
```

Step 3: Build and install the wheel

```bash
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
```

Step 4: Launch and enjoy it!

```bash
python server.py
```

Method 2: SYCL

Step 1: Install the Intel® oneAPI Base Toolkit

Step 2: Install the necessary Python packages

```bash
pip install -r requirements_nowheels.txt
```

Step 3: Install PyTorch and IPEX

```bash
pip install torch==2.1.0a0 torchvision==0.16.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

REMINDER: If you are using oneAPI 2024.1.0 or newer, please run the following command instead:

```bash
python -m pip install torch==2.1.0.post0 torchvision==0.16.0.post0 torchaudio==2.1.0.post0 intel-extension-for-pytorch==2.1.20+xpu oneccl_bind_pt==2.1.200+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

Step 4: Build and install the wheel

```bash
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL_F16=ON" pip install llama-cpp-python
```

Step 5: Launch and enjoy it!

```bash
python server.py
```
|
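The REMINDER in Method 2, Step 3 amounts to a version switch on the installed oneAPI release. A minimal sketch that picks between the two pin sets quoted in those commands; how you obtain the oneAPI version string is left to you, `parse_version` and `ipex_pip_command` are hypothetical helpers, and later comments in this thread report that further pins (such as an older oneMKL) may also be needed.

```python
import sys
from typing import List, Tuple

IPEX_INDEX = "https://pytorch-extension.intel.com/release-whl/stable/xpu/us/"

# Pins copied from the two install commands in Step 3.
PINS_BEFORE_2024_1 = [
    "torch==2.1.0a0",
    "torchvision==0.16.0a0",
    "intel-extension-for-pytorch==2.1.10+xpu",
]
PINS_2024_1_OR_NEWER = [
    "torch==2.1.0.post0",
    "torchvision==0.16.0.post0",
    "torchaudio==2.1.0.post0",
    "intel-extension-for-pytorch==2.1.20+xpu",
    "oneccl_bind_pt==2.1.200+xpu",
]


def parse_version(text: str) -> Tuple[int, ...]:
    """'2024.1' -> (2024, 1, 0); missing components are padded with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def ipex_pip_command(oneapi_version: str) -> List[str]:
    """Build the pip command matching the installed oneAPI version."""
    if parse_version(oneapi_version) >= (2024, 1, 0):
        pins = PINS_2024_1_OR_NEWER
    else:
        pins = PINS_BEFORE_2024_1
    return [sys.executable, "-m", "pip", "install", *pins, "--extra-index-url", IPEX_INDEX]
```

Padding to three components matters: without it, `(2024, 1)` would compare as less than `(2024, 1, 0)` and select the older pins.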
I don't know what GPU you are using, but do you know what your Arc card draws at idle? I have been trying to see if it's even possible to get Intel Arc idle power with ASPM down to under 1 watt, like in Windows. I'm using Unraid and had to use a custom kernel, so I don't know if that's why I can't get the low idle to work. My A770 is constantly drawing 40 W, and I only went with Arc because of the extremely low idle. Have you had any luck? |
@ksullivan86 I use an A770 too, and its idle power is also about 40 W. I tried to use ASPM to lower it, but it didn't work on either Windows or Linux. I suspect that idle power management is not implemented at present for some Arc graphics cards. There may also be differences between cards from different AIC manufacturers. |
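For anyone comparing idle power on Linux, the kernel reports the active PCIe ASPM policy in `/sys/module/pcie_aspm/parameters/policy`, with the active entry in square brackets (e.g. `[default] performance powersave powersupersave`). A minimal sketch that reads it; it only shows the system-wide policy, not whether the Arc card's link actually enters a low-power state, and the file may be missing on kernels built without ASPM support. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text: str) -> Optional[str]:
    """Return the bracketed entry, e.g. 'powersave' from 'default [powersave] performance'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_aspm_policy(path: Path = ASPM_POLICY) -> Optional[str]:
    """Read the active ASPM policy, or None if the file is missing or unreadable."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_aspm_policy())
```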
FYI, this works on WSL2 on Windows 11 with an A770 16GB. Make sure to disable the iGPU (I did it in the BIOS instead of Device Manager to be sure). I did it with SYCL.
(Truncated quote from @NineMeowICT above, for brevity.)
With the iGPU enabled it didn't work; it works great with the A770 only. Thanks for putting this together, @NineMeowICT and others! |
Thanks for the write-up, @NineMeowICT, nice one! Since then, something has probably changed, and the SYCL method did not entirely work: I received the error described in pytorch/pytorch#123097. Installing these package versions instead solved the issue:
|
@sambartik Yes, you're right. I also encountered this, so I downgraded oneMKL. I will edit my post to keep it up to date. Thanks! |
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment. |
If this method no longer works after updating to Intel toolchain 202x.x (your installed version), you may need to manually install |
I was wondering whether Intel Arc GPUs work with this. I couldn't find anything about it here.