
Intel ARC Support #1575

Closed
linus378 opened this issue Apr 26, 2023 · 80 comments

Labels: enhancement (New feature or request), stale

Comments

@linus378

I was wondering if Intel ARC GPUs work with this. I couldn't find anything about it here.

linus378 added the enhancement (New feature or request) label on Apr 26, 2023
@linus378
Author

Also, I wonder if this could support two GPUs so you don't have to offload anything into RAM, such as an Arc A770 and an RX 6600.

@dan9070

dan9070 commented Apr 27, 2023

To my knowledge, it doesn't currently have support for oneAPI or OpenVINO; I own an Intel Arc GPU myself.

@mmccool

mmccool commented May 2, 2023

It doesn't, unfortunately. I really wish it did, though, as I have a dual A770 system myself (these cards have a lot of VRAM for the price, and also good low-precision AI accelerators, etc.). For now I'm running on CPU, which is, of course, horribly slow.

However, one issue is that Intel's support for pytorch on its GPUs needs a special version based on pytorch 1.10 (see https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-extension-for-pytorch-for-gpus.html), but this system uses pytorch 2.0.0. As soon as Intel GPU support for pytorch 2.0.0 comes out, though, I'm hoping support can be extended in this system (if I can find time, maybe I'll even be able to contribute some patches). For CPU, pytorch 2.0.0 is already supported: https://intel.github.io/intel-extension-for-pytorch/latest/tutorials/releases.html

In the meantime, it would be great if the readme could at least be updated to say WHAT GPUs are supported.

BTW, the one-click installer also fails if you don't have an NVIDIA GPU, even if you select "None". I had to go the git clone route.

@mmccool

mmccool commented May 2, 2023

Multi-GPU support for multiple Intel GPUs would, of course, also be nice. Multi-GPU is supported for other cards, so it should not (in theory) be a problem. I personally don't really care about mixing GPUs from different vendors, though :)

A bonus would be the ability to use Intel integrated graphics; it has limited VRAM, but it might be good enough for some simple things.

@rattlecanblack

Would love to see this as well. With its power and amount of VRAM, the Arc is a great little card for those of us who do more compute than gaming, especially considering the price.

@miraged3

Intel has released torch 2.0 support for Arc GPUs: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu

@itlackey

Does the release of pytorch 2 support move things forward for Arc support?

@oobabooga
Owner

I have created a pinned thread for Intel Arc discussion and welcome you to move the discussion there: #3761

To my knowledge, llama-cpp-python should work with GPU acceleration on Intel Arc as long as you compile it with CLBLAST. See https://github.com/oobabooga/text-generation-webui#amd-metal-intel-arc-and-cpus-without-avx2
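
For reference, the CLBLAST build that this refers to would look roughly like the following (a sketch; the CMake flag follows the llama.cpp build options from that period and may differ in newer releases):

pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python --no-cache-dir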

@itlackey

itlackey commented Sep 1, 2023

You rock! Thank you for all the hard work on this project!

@abhilash1910
Contributor

@oobabooga Intel Arc GPU support is in the pipeline; the support integration will be started in 2-3 weeks' time (by myself). There are some other items in the pipeline at Intel which we are covering, and we plan to add this for our GPUs soon.

@oobabooga
Owner

@abhilash1910 thanks for the info. For XPU inference on transformers, is it currently enough to do

model.to(torch.device('xpu'))

or similar, like here?

Does any special pytorch import command have to be made?

@itlackey

I found this while researching how this all works.

https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/examples.html

It looks like there shouldn't be much to change, but I'm new to LLM/AI development. So I may be missing something.
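
For context, the pattern shown on that page boils down to something like this (a minimal sketch; it assumes a transformers model and tokenized input_ids are already loaded, and the ipex.optimize step is optional):

import torch
import intel_extension_for_pytorch as ipex  # importing IPEX registers the 'xpu' device with torch

model = model.eval().to('xpu')      # move the model to the Intel GPU
model = ipex.optimize(model)        # optional kernel/graph optimizations from IPEX
input_ids = input_ids.to('xpu')
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=64)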

@oobabooga
Owner

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

@itlackey

I'll have time in a few days and will give it a shot. We may also need to make some changes to the installer and/or docker image to load the Intel libs and driver and recompile llama.cpp to get xpu to work.
I was able to do this with a docker image for FastChat and llama.cpp. We should be able to do the same for textgen.

@abhilash1910
Contributor

Good to know the interest; thanks @oobabooga @itlackey (it helps to determine priority). I will add the changes starting tomorrow (25th Sept), and then they can be tested.

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

@oobabooga
Owner

Awesome @abhilash1910 :)

@Yorizuka

Yorizuka commented Sep 30, 2023

Hello, I just purchased an Intel Arc A770 16GB. When it arrives (in a week) I will be willing to help test stuff on Linux.
In general, if Arc GPUs become usable, it could be a really nice option, especially if multi-GPU is possible.

@Yorizuka

Yorizuka commented Oct 8, 2023

Small update: the GPU has arrived; I will install it in my PC when I have time. I am excited to start playing around with LLMs on my own PC.

@Th-Underscore

Thanks @itlackey. I guess it should be a few changed lines (for the transformers loader):

  1. model = model.to("xpu") in modules.models.huggingface_loader
  2. return input_ids.to(torch.device('xpu')) in modules.text_generation.encode.

It would be nice if someone could test this.

Doesn't change anything (yet). Using an Intel Iris Xe Graphics (not very good, I know) on WSL2. I'll test some more stuff out.

@Yorizuka

Not sure if this is user error (I'm new to this) or an actual issue, but I'm getting errors about CUDA while trying to load a model. I find this really odd, especially because I chose the IPEX option during the ./start_linux.sh first-time install.

2023-10-15 16:30:34 INFO:Loading HuggingFaceH4_zephyr-7b-alpha...
Loading checkpoint shards: 100%|██████████████████| 2/2 [02:04<00:00, 62.41s/it]
2023-10-15 16:32:39 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/ui_model_menu.py", line 201, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/modules/models.py", line 141, in huggingface_loader
    model = model.cuda()
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2168, in cuda
    return super().cuda(*args, **kwargs)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui-1.7/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

@oobabooga
Owner

@Yorizuka can you try making those changes to modules/models.py and modules/text-generation.py?

diff --git a/modules/models.py b/modules/models.py
index 5bd9db74..c376c808 100644
--- a/modules/models.py
+++ b/modules/models.py
@@ -137,6 +137,8 @@ def huggingface_loader(model_name):
         if torch.backends.mps.is_available():
             device = torch.device('mps')
             model = model.to(device)
+        elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+            model = model.to('xpu')
         else:
             model = model.cuda()
 
diff --git a/modules/text_generation.py b/modules/text_generation.py
index 0f24dc58..295c7cdd 100644
--- a/modules/text_generation.py
+++ b/modules/text_generation.py
@@ -132,6 +132,8 @@ def encode(prompt, add_special_tokens=True, add_bos_token=True, truncation_lengt
     elif torch.backends.mps.is_available():
         device = torch.device('mps')
         return input_ids.to(device)
+    elif hasattr(torch, 'xpu') and torch.xpu.is_available():
+        return input_ids.to('xpu')
     else:
         return input_ids.cuda()

@Yorizuka

I applied the patch, same issue.

2023-10-16 02:25:12 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/ui_model_menu.py", line 201, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/modules/models.py", line 143, in huggingface_loader
    model = model.cuda()
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2168, in cuda
    return super().cuda(*args, **kwargs)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 918, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/yori/mnt/8tb/yori_home_big/text-generation-webui/installer_files/env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

To confirm I did the patch correctly, here is the git status:

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   modules/models.py
	modified:   modules/text_generation.py

no changes added to commit (use "git add" and/or "git commit -a")

And my git rev-parse HEAD output: d331501ebc83e80c5d8f49c3e7c547730afff5c2
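
(For anyone hitting the same thing, a quick sanity check that the installed torch build actually exposes the XPU backend, as a sketch assuming IPEX is what provides it:

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__, ipex.__version__, torch.xpu.is_available())"

If the import fails or the last value is False, the webui will fall through to the model.cuda() branch, exactly as in the traceback above.)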

@Th-Underscore

Th-Underscore commented Oct 16, 2023

print(f"generations: input_ids set! model class: {shared.model.__class__.__name__} | has xpu {hasattr(torch, 'xpu')}") in modules/text_generation.py prints: [screenshot] (using a GGUF model, though I'm trying to get CBLAS set up right now, which is probably why llama.cpp is messing up)

So I uninstalled the torch and torchvision installed by the one-click installer and reinstalled IPEX, resulting in an unidentified .so error. Putting export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/anaconda3/envs/tg/lib in ~/.bashrc fixes that.
But I still get the same message: [screenshot] (I changed the message slightly, my apologies)

And to add onto what @Yorizuka mentioned, trying to run a GPTQ model in Transformers also gives this error: RuntimeError: GPU is required to quantize or run quantize model. alongside WARNING: torch.cuda.is_available() returned False. This means that no GPU has been detected. Falling back to CPU mode.

@Yorizuka

Yorizuka commented Oct 16, 2023

I think the issue described in this comment #3761 (comment) is likely related to the issue we are having here.

@oobabooga
Owner

@TheRealUnderscore about the transformers error, can you check if it works after this commit?

8ea554b

@Th-Underscore

Th-Underscore commented Oct 17, 2023

@oobabooga [screenshot]
It seems the error has something to do with what Yorizuka said. hasattr(torch, 'xpu') returned false in my previous message, so it's not detecting PyTorch XPU whatsoever.

These were my PyTorch settings (via print(torch.__config__.show())) before reinstalling 2.0.1a0:

PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[Image with more readable build settings]

And these are my 2.0.1a0 settings. Now lots of things have changed:

PyTorch built with:
  - GCC 11.2
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[Image with more readable build settings]
All the differences (I've no clue what could help and what couldn't, so I'm just listing them all):

  • GCC ver upgraded 9.3 -> 11.2
  • MKL ver upgraded
  • MKL-DNN ver downgraded
  • CPU extensions downgraded AVX512 -> AVX2
  • CUDA and CUDNN vers removed
  • CXX_COMPILER devtoolset-9 -> gcc-toolset-11
  • In CXX-FLAGS:
    • D_GLIBCXX_USE_CXX11_ABI state 0 -> 1
    • fabi-version removed
    • DLIBKINETO_NOCUPTI added
    • Werror added, set range-loop-construct
    • Wunused-local-typedefs added
    • Wno-error added, set deprecated-declarations
    • Wno-invalid-partial-specialization removed
    • Wno-unused-private-field removed
    • Wno-aligned-allocation-unavailable removed
    • Wno-error added, set redundant-decls
  • TORCH_VERSION 2.1.0 -> 2.0.1
  • USE_CUDA ON -> OFF
  • USE_CUDNN ON -> OFF
  • USE_NCCL 1 -> OFF

Are any of these settings relevant to the GPU?

I'll keep looking into it on my own; I wouldn't be surprised if it was an installation error on my part.

@i6od

i6od commented Oct 20, 2023

I managed to get 0 tokens of output with IPEX, lol.
[screenshots]
Anyways, I'm sleepy; I've been at this all day.

@abhilash1910
Contributor

Some updates regarding failures to build or compile on our systems (FYI):

  • For gbnf/ggml-based compiler patterns, the support is in progress, so there might be failures with older oneAPI/dpct (if you are using a previous release).
  • For issues related to building IPEX for XPU, I would recommend switching to the latest public IPEX. Also tag me in case you are having difficulties building or using IPEX on your Arc systems.
  • This support is in progress and I will update periodically, as there is some subsequent work which needs to be merged to use this fully.
    cc @oobabooga and others who are using our devices. Thank you for your continued support and interest in Arc.

@cyrillebeauchamp

Sorry in advance for the long post.

Unfortunately the above is only part of the solution: other requirements install a more recent version of PyTorch that is not compatible with Intel GPUs.

So I did a manual install from scratch:

Install Intel drivers:

# download the key to system keyring
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg

# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

#update the repositories and install the base kit:
sudo apt update
sudo apt install -y \
  intel-opencl-icd intel-level-zero-gpu level-zero \
  intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
  libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
  libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
  mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo

Install Intel® oneAPI Base Toolkit:

# download the key to system keyring
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

#update the repositories and install the base kit:
sudo apt update
sudo apt install intel-basekit

Install some missing libraries:

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | tee /etc/apt/sources.list.d/oneAPI.list
apt update
apt install intel-oneapi-runtime-openmp=2023.2.2-47 intel-oneapi-runtime-dpcpp-cpp=2023.2.2-47 intel-oneapi-runtime-mkl=2023.2.0-49495

Install Miniconda 3:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Create a new conda environment:

conda create -n textgen python=3.9
conda activate textgen

Install the WebUI:

git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements_nowheels.txt

Install PyTorch:

python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel-extension-for-pytorch==2.0.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu-idp/us/

Activate PyTorch:

source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh

Test it is working:

python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"

Install llama-cpp-python:

sudo apt-get install --reinstall pkg-config cmake-data
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python

Start the server:

python server.py

Download your model and enjoy :)

My tests were done on a dedicated install of Ubuntu running under Win 11 WSL2 on a Samsung Galaxy Book2 (Intel i7 processor with integrated graphics and 16GB of RAM): llama2 13B (32 layers to GPU) loads fast and runs above 2 tokens per second, which is acceptable for personal use.
NB: don't forget to increase the maximum amount of RAM WSL2 can use.

Hope it helps,
Cyrille

@oobabooga
Owner

oobabooga commented Dec 7, 2023

@cyrillebeauchamp thanks for the guide, that's extremely valuable. I think that with your commands it may be possible to automatically compile llama-cpp-python for Intel GPUs using GitHub Actions. Then we could add these to dedicated requirements_intel.txt and requirements_intel_noavx2.txt files.

If you are interested, you may be able to create a new .yml in this repository (where all the wheels here are compiled) and PR it to @jllllll:

https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/tree/main/.github/workflows

@NineMeowICT

NineMeowICT commented Dec 9, 2023

@cyrillebeauchamp Thanks for your detailed instructions! But I still have trouble loading the LLM.
[screenshot]

In the beginning, I thought that the problem might be caused by the Intel iGPU, which can also be used as a Level Zero XPU device, so I exported the environment variable "ONEAPI_DEVICE_SELECTOR=level_zero:0". But the error still happened.
[screenshot]

By the way, the GPU IMC usage and the RAM usage increased quickly after I clicked the "Load" button.
[screenshot]

Any ideas? Thank you!

@itlackey

itlackey commented Dec 11, 2023

Using llama.cpp for Intel Arc support will result in most operations running on the CPU currently.

PyTorch extensions work well on Intel Arc. FastChat uses it and it's significantly faster than running llama.cpp.

There is work being done to support better drivers in llama.cpp but as of right now the OpenCL implementation runs primarily on the CPU. This is why we see 2 t/s using llama.cpp and around 20 t/s using FastChat on Arc GPUs.

@Jacoby1218

New version of IPEX: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.1.10%2Bxpu. Native Windows support has been added.

@Jacoby1218

  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\models.py", line 88, in load_model
    output = load_func_map[loader](model_name)
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\modules\models.py", line 238, in huggingface_loader
    model = LoaderClass.from_pretrained(path_to_model, **params)
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 743, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 317, in set_module_tensor_to_device
    new_value = value.to(device)
  File "D:\oobabooga_windows\oobabooga_windows\text-generation-webui\installer_files\env\lib\site-packages\torch\cuda\__init__.py", line 289, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

@ghost

ghost commented Dec 30, 2023

I'm not sure if this is the right place to post this. I receive the below error after installing OobaBooga using the default Arc install option on Windows. The install seemed to go well but running it results in the below DLL load error. Other threads that mentioned this loading error suggested it might be a PATH issue. I tried adding a few paths to the OS environment but couldn't resolve it. Any suggestions?

It's an Arc A770 on Windows 10.

Traceback (most recent call last) ─────────────────────────────────────────┐
│ C:\text-generation-webui\server.py:6 in │
│ │
│ 5 │
│ > 6 import accelerate # This early import makes Intel GPUs happy │
│ 7 │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\__init__.py:3 in │
│ │
│ 2 │
│ > 3 from .accelerator import Accelerator │
│ 4 from .big_modeling import ( │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\accelerate\accelerator.py:32 in │
│ │
│ 31 │
│ > 32 import torch │
│ 33 import torch.utils.hooks as hooks │
│ │
│ C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\__init__.py:139 in │
│ │
│ 138 err.strerror += f' Error loading "{dll}" or one of its dependencies.' │
│ > 139 raise err │
│ 140 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\text-generation-webui\installer_files\env\Lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its
dependencies.
Press any key to continue . . .

@oobabooga
Owner

I think that the new loader below (gpt-fast) should work on Intel Arc, but I have no way of testing it. It has 4-bit and 8-bit support.

#5180

Any tests would be welcome.

@ksullivan86

I really don't know much, as I am super new to AI. I am looking into getting more into AI mainly for Home Assistant, and hopefully someday an assistant similar to Iron Man's Jarvis (Jarvis isn't possible today, right?). Anyway, I have a home server that I just built (14700K) with Plex as the main use case. I stumbled upon LocalAI last week (I didn't even realize AI could be home-hosted) and got excited about the above possibility; hosting AI would also help justify the cost of my server upgrade and running it 24/7.
I didn't put a GPU in my build because I wanted to keep idle power as low as possible, and I just found out that Intel Arc can idle at less than 1 watt on new Intel systems, so I am really interested in whether Intel Arc works with this. If it does, what would need to be changed in the docker compose file? It would also be helpful to have a CPU-only docker compose file, as I had to do some real searching to find out how to install via Docker with only a CPU.
Is there any way to use the integrated GPU in the 14700K?
Is an Intel Arc worth buying for AI right now? I was able to use my CPU with Home Assistant, but it was super slow (probably over a minute to respond), while inside the webui the response was almost instant; I guess it's doing a lot more when it comes from Home Assistant. Do you think an Intel Arc would solve the problem? I really want to stay with Arc for the low idle power. I know Battlemage is probably around 11 months away, if it even gets released given the recent rumors, so I don't think waiting for the v2 is a great idea considering it's possible it never gets released (hopefully not, and they fully compete with Nvidia).

@Leo512bit

It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible?

ggerganov/llama.cpp#2690

@ElliottDyson

It looks like llama.cpp now supports SYCL for Intel GPUs. Is Arc support now possible?

ggerganov/llama.cpp#2690

This is brilliant news! Could I get a ping when it's been implemented into this repo please?

@oobabooga
Owner

I am aware of SYCL and wanted to ask about this. There is also the Vulkan option now that may work on Intel Arc.

Can someone try these and see which one works with GPU offloading and is the fastest?

pip uninstall -y llama_cpp_python llama_cpp_python_cuda llama_cpp_python_cuda_tensorcores

CMAKE_ARGS="-DLLAMA_SYCL=on" pip install llama-cpp-python  # Option 1: SYCL
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python  # Option 2: Vulkan
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python  # Option 3: Kompute 

@Leo512bit

I am aware of SYCL and wanted to ask about this. There is also the Vulkan option now that may work on Intel Arc.

Can someone try these and see which one works with GPU offloading and is the fastest?

pip uninstall -y llama_cpp_python llama_cpp_python_cuda llama_cpp_python_cuda_tensorcores

CMAKE_ARGS="-DLLAMA_SYCL=on" pip install llama-cpp-python  # Option 1: SYCL
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python  # Option 2: Vulkan
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python  # Option 3: Kompute 

Stupid question: since there is no CMake in TGWUI's codebase, I assume you are talking about building llama.cpp. So once I build it, do I "plug in" the compiled stuff into TGWUI, or do I run a model on llama.cpp somehow? (My only understanding of llama.cpp is that it is some kind of backend.) Whichever of those things I've got to do, how do I do it?

@ElliottDyson

ElliottDyson commented Feb 8, 2024

I am aware of SYCL and wanted to ask about this. There is also the Vulkan option now that may work on Intel Arc.

Can someone try these and see which one works with GPU offloading and is the fastest?

pip uninstall -y llama_cpp_python llama_cpp_python_cuda llama_cpp_python_cuda_tensorcores

CMAKE_ARGS="-DLLAMA_SYCL=on" pip install llama-cpp-python  # Option 1: SYCL
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python  # Option 2: Vulkan
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python  # Option 3: Kompute 

Unfortunately I have no idea how to build this myself.

However, if someone wants to send me one that I can download I am more than happy to test.

All I know is that the current installation procedure for Intel ARC doesn't utilise my Intel Arc A770 16GB, even when it downloads all the correct packages and says it's offloaded the layers to GPU (which I can tell it hasn't because I have no VRAM being used up).

Edit for Clarification: Doesn't work unless using Transformers and no quantisation.

@ElliottDyson

There's also the following library, which should allow us to use quantised versions of models from Hugging Face's transformers library on Intel GPUs; currently this may be the only approach to getting quantised models running on these GPUs other than through llama.cpp:

https://github.com/huggingface/optimum-intel
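
As a rough illustration, loading a model through optimum-intel's OpenVINO backend might look something like the sketch below (hedged: the class name and the "gpu" device string come from the optimum-intel documentation, the model ID is just a placeholder, and none of this has been verified on Arc here):

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "facebook/opt-1.3b"  # placeholder model, not a recommendation
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)  # convert the checkpoint to OpenVINO IR on the fly
model.to("gpu")  # ask the OpenVINO runtime to target the Intel GPU plugin

inputs = tokenizer("Hello from Arc!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))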

@ElliottDyson

ElliottDyson commented Feb 15, 2024

I am aware of SYCL and wanted to ask about this. There is also the Vulkan option now that may work on Intel Arc.
Can someone try these and see which one works with GPU offloading and is the fastest?

pip uninstall -y llama_cpp_python llama_cpp_python_cuda llama_cpp_python_cuda_tensorcores

CMAKE_ARGS="-DLLAMA_SYCL=on" pip install llama-cpp-python  # Option 1: SYCL
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python  # Option 2: Vulkan
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python  # Option 3: Kompute 

Unfortunately I have no idea how to build this myself.

However, if someone wants to send me one that I can download I am more than happy to test.

All I know is that the current installation procedure for Intel ARC doesn't utilise my Intel Arc A770 16GB, even when it downloads all the correct packages and says it's offloaded the layers to GPU (which I can tell it hasn't because I have no VRAM being used up).

Edit for Clarification: Doesn't work unless using Transformers and no quantisation.

I hope this isn't disruptive to your work, but are there any updates on some wheels for us to test with? I've tried building them personally but with no success. @oobabooga

@NineMeowICT

@oobabooga Thanks!
I have tried all the backends you mentioned above, and this is my conclusion:
Using Vulkan is almost as efficient as using SYCL at the present stage. llama-cpp-python using Kompute cannot be built, for the following reason:
[screenshot of the Kompute build error]

@NineMeowICT

NineMeowICT commented Mar 10, 2024

Here is my solution using llama-cpp-python:

Test environment: Linux Mint 21.3 Cinnamon with Linux kernel 6.5.0-25

Python version: 3.10.* or 3.11.*

Before all steps, install the Intel driver (thanks @cyrillebeauchamp):

# download the key to system keyring
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg

# add signed entry to apt sources and configure the APT client to use Intel repository:
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

#update the repositories and install the base kit:
sudo apt update
sudo apt install -y \
  intel-opencl-icd intel-level-zero-gpu level-zero \
  intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
  libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
  libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
  mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo

Method 1: Vulkan

Step 1: Install Vulkan SDK

wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
sudo apt update -y
sudo apt-get install -y vulkan-sdk
# To verify the installation, use the command below:
vulkaninfo

Step 2: Install necessary python packages

pip install -r requirements_nowheels.txt

Step 3: Build and install the wheel

CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python

Step 4: Launch and enjoy it!

python server.py

Method 2: SYCL

Step 1: Install Intel® oneAPI Base Toolkit
Please select a way that suits you and follow the instructions: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html

Step 2: Install necessary python packages

pip install -r requirements_nowheels.txt

Step 3: Install pytorch and IPEX

pip install torch==2.1.0a0 torchvision==0.16.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

REMINDER: If you are using oneAPI 2024.1.0 or newer, please run the following command instead:

python -m pip install torch==2.1.0.post0 torchvision==0.16.0.post0 torchaudio==2.1.0.post0 intel-extension-for-pytorch==2.1.20+xpu oneccl_bind_pt==2.1.200+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Step 4: Build and install the wheel

source /opt/intel/oneapi/setvars.sh   
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_SYCL_F16=ON" pip install llama-cpp-python

Step 5: Launch and enjoy it!

python server.py
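
One optional sanity check before launching, assuming the oneAPI environment has been sourced: sycl-ls (shipped with the oneAPI toolkit) lists the SYCL devices that the llama.cpp backend can see.

source /opt/intel/oneapi/setvars.sh
sycl-ls   # the Arc card should appear as a level_zero GPU device; if only CPU/OpenCL entries show up, the SYCL build will not offload to the GPU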

@NineMeowICT

NineMeowICT commented Mar 10, 2024

GPU: Intel ARC A770 16GB
Model: LLaMA2-13B-Tiefighter.Q8_0.gguf

Test results:

SYCL: [screenshot and screen recording omitted]

Vulkan: [screenshots omitted]

@ksullivan86

@cyrillebeauchamp Thanks for your detailed instructions! But I still have trouble loading the LLM. [screenshot]

In the beginning, I thought that the problem might be caused by the Intel iGPU, which can also be used as a Level Zero XPU device, so I exported the environment variable "ONEAPI_DEVICE_SELECTOR=level_zero:0". But the error still happened. [screenshot]

By the way, the GPU IMC usage and the RAM usage increased quickly after I clicked the "Load" button. [screenshot]

Any ideas? Thank you!

I don't know what GPU you are using, but do you know what your Arc card is drawing at idle? I have been trying to see if it's even possible to get Intel Arc ASPM down to under 1 watt of power like in Windows. I'm using Unraid and had to use a custom kernel, so I don't know if that is the reason why I can't get the low idle to work. My A770 is constantly drawing 40W, and I only went with Arc because of the extremely low idle. Have you had any luck?

@NineMeowICT

@ksullivan86 I use an A770 too, and its idle power is also about 40W. I tried to use ASPM to lower it, but it didn't work on either Windows or Linux. I suspect that idle power management is simply not implemented for some Arc graphics cards at present. There may also be differences between cards from different AIC manufacturers.

@opticblu

opticblu commented Mar 16, 2024

FYI, this works in WSL2 on Windows 11 with an A770 16GB; make sure to disable the iGPU (I just did it in the BIOS instead of Device Manager to be sure).

I did it with SYCL.

Truncated quote from @NineMeowICT above, for brevity

Here is my solution using llama-cpp-python:
....

Step 4: Launch and enjoy it!

python server.py

With the iGPU enabled it didn't work; it works great with the A770 only.

Thanks for putting this together @NineMeowICT and others

@sambartik

Thanks for the write-up @NineMeowICT, nice one!

Since then something has probably changed and doing the SYCL method did not entirely work. I received an error that is described in this issue: pytorch/pytorch#123097

Installing these versions of packages instead solved the issue:

python -m pip install torch==2.1.0.post0 torchvision==0.16.0.post0 torchaudio==2.1.0.post0 intel-extension-for-pytorch==2.1.20+xpu oneccl_bind_pt==2.1.200+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

@NineMeowICT

@sambartik Yes, you're right. I also encountered this, so I downgraded oneMKL. I have edited my post to keep it up to date.

Thanks!


This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@LovelyA72

LovelyA72 commented Jun 24, 2024

sudo apt-get install --reinstall pkg-config cmake-data
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python

If this method no longer works after updating to Intel toolchain 202x.x (your installed version), you may need to manually install intel-oneapi-compiler-dpcpp-cpp-202x.x
