7900 XTX: Error invalid device function at line 679 in file /bitsandbytes/csrc/ops.hip #29
Comments
Hi @PatchouliPatch, I need more details to review this. Could you please run the script with AMD_LOG_LEVEL=3 and share its output?
AMD_LOG_LEVEL=3 HIP_VISIBLE_DEVICES=0 python3 check_for_possibility.py
Please also share the outputs of 'rocminfo' and 'hipconfig --version'.
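For anyone who wants to capture that debug run to a file before pasting it into the issue, here is a minimal sketch. Only the two environment variables and the script name come from the comment above; the helper names `build_debug_env` and `run_with_hip_logging` and the log file name are hypothetical.

```python
import os
import subprocess
import sys


def build_debug_env(log_level=3, device_index=0, base_env=None):
    """Return a copy of the environment with the HIP debug variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["AMD_LOG_LEVEL"] = str(log_level)          # verbose HIP runtime logging
    env["HIP_VISIBLE_DEVICES"] = str(device_index)  # expose only this GPU index
    return env


def run_with_hip_logging(script, log_path="hip_debug.log"):
    """Run a Python script with HIP logging on and save stdout+stderr."""
    result = subprocess.run(
        [sys.executable, script],
        env=build_debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the runtime log goes to stderr, so merge it
        text=True,
    )
    with open(log_path, "w") as fh:
        fh.write(result.stdout)
    return result.returncode


# Example (not executed here):
# run_with_hip_logging("check_for_possibility.py")
```

The saved log can then be attached to the issue together with the 'rocminfo' and 'hipconfig --version' output.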
Here's the terminal output with AMD_LOG_LEVEL:
rocminfo:
hipconfig --version: 6.0.32831-204d35d16
I know that we were advised to disable the iGPU on the CPU, but for some reason Gigabyte's BIOS fails to do so even if I tell it to disable it.
Could you try with ROCm 6.1? You can use the rocm/pytorch:latest Docker image. If you have to use 6.0, please try rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2. Please make sure to select the gfx1100 GPU with the container.
Alright, will try it out. I built my previous one with gfx1101. What's the Navi 31 GFX name supposed to be anyways? Is it 1100?
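One way to check which GFX names the system actually reports is to pull them out of the 'rocminfo' output. The sketch below assumes that 'rocminfo' prints agent names containing strings such as gfx1100; the helper names are hypothetical. The maintainers' replies in this thread treat the 7900 XTX as gfx1100.

```python
import re
import subprocess

# Matches architecture names such as gfx1100, gfx1036 or gfx90a.
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]+\b")


def gfx_targets(rocminfo_text):
    """Return the distinct gfx names found in rocminfo output, in order."""
    seen = []
    for name in GFX_PATTERN.findall(rocminfo_text):
        if name not in seen:
            seen.append(name)
    return seen


def detect_gfx_targets():
    """Run rocminfo (must be on PATH) and list the gfx names it reports."""
    out = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    ).stdout
    return gfx_targets(out)


# Example (not executed here):
# print(detect_gfx_targets())
```

On a machine with an active iGPU, more than one name may be listed, which is why the earlier comment asks to select the gfx1100 device explicitly.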
Gave it a try today. I installed the latest ROCm version, 6.1.1, after uninstalling 6.0.2. I pulled the latest version of the repo and did the following:
The program compiled but gave me warnings. After rerunning the same Python script, it still gives the same errors. Here's the output when I run it with AMD_LOG_LEVEL=3 now:
And here's rocminfo:
hipconfig version: 6.1.40092-038397aaa
Please set HSA_OVERRIDE_GFX_VERSION=11.0.0 and retry. It's an environment variable; you can export it or set it inline when running the script. It makes the runtime target the gfx1100 architecture.
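If exporting the variable in the shell is inconvenient, it can also be set from Python, as long as that happens before the HIP runtime is initialised (in practice, before `import torch`). This is a minimal sketch; the helper names are hypothetical, and the version-to-name mapping in `override_to_gfx` (major, then minor and stepping as hex digits) is an assumption based on how 11.0.0 corresponds to gfx1100 in the comment above.

```python
import os


def override_to_gfx(version):
    """Map an override string like '11.0.0' to a gfx name (assumed scheme)."""
    major, minor, stepping = (int(part) for part in version.split("."))
    return f"gfx{major}{minor:x}{stepping:x}"


def apply_gfx_override(version="11.0.0", env=None):
    """Set HSA_OVERRIDE_GFX_VERSION and return the gfx name it should target."""
    env = os.environ if env is None else env
    env["HSA_OVERRIDE_GFX_VERSION"] = version
    return override_to_gfx(version)


# Example (not executed here): call this first, then import torch.
# apply_gfx_override()
# import torch
```

Setting the variable after the GPU runtime has already loaded is unlikely to have any effect, so the call belongs at the very top of the script.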
I had the same error. Adding HSA_OVERRIDE_GFX_VERSION=11.0.0 seems to fix it, but unfortunately now I get:
I suggest moving over to the alpha test of the actual bitsandbytes library. You can use the multi_backend_refactor branch. It works on my 7900 XTX.
I must have missed this issue, as I opened a separate issue to report a similar problem on my Radeon Pro W7900 (also gfx1100) when loading a model in 8-bit. It should be noted that while loading a model in 8-bit does not work with bitsandbytes on Radeon GPUs, I did get it to work loading models in 4-bit. Not sure if this will help your use case, but using
Note that as of newer versions of PyTorch there is an upstream issue with PyTorch force-loading HIPBLASLT for all AMD GPUs, which is not supported on Radeon GPUs. You will also need to set
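For reference, 4-bit loading along the lines described in the comment above might look like the following sketch. It assumes the model is loaded through Hugging Face transformers with `BitsAndBytesConfig`, which the comment does not state; `load_model_4bit` is a hypothetical helper and the model id is up to the caller.

```python
def load_model_4bit(model_id):
    """Load a causal LM with 4-bit bitsandbytes quantization (sketch)."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Request 4-bit weights instead of the 8-bit path that fails in this thread.
    quant_config = BitsAndBytesConfig(load_in_4bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let the library place layers on the available GPU
    )


# Example (not executed here):
# model = load_model_4bit("your/model-id")
```

Whether this works on a given Radeon card still depends on the bitsandbytes build and ROCm version in use.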
System Info
Kernel: 6.5.0-28-generic
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy
GPU: Sapphire Pulse RX 7900 XTX
ROCm Version: 6.0.2
CPU: Ryzen 7 7700X
Motherboard: Gigabyte Aorus Elite AX B650 (BIOS: F24c)
Torch version: torch==2.3.0+rocm6.0
Python version: 3.10.14
Reproduction
I'm on the rocm_enabled branch. Attempting to compile the ROCm 6.2 testing branch results in errors. Running the following code results in this error:
attached here is ops.hip:
ops.hip.zip
Expected behavior
After running that piece of code, I get the following error:
Error invalid device function at line 679 in file /home/$USER/bitsandbytes/csrc/ops.hip.
Nothing else prints to my terminal.