
7900 XTX: Error invalid device function at line 679 in file /bitsandbytes/csrc/ops.hip #29

Open
PatchouliPatch opened this issue May 14, 2024 · 9 comments

PatchouliPatch commented May 14, 2024

System Info

Kernel: 6.5.0-28-generic

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

GPU: Sapphire Pulse RX 7900 XTX
ROCm Version: 6.0.2
CPU: Ryzen 7 7700X
Motherboard: Gigabyte Aorus Elite AX B650 (BIOS: F24c)
Torch version: torch==2.3.0+rocm6.0
Python version: 3.10.14

Reproduction

I'm on the rocm_enabled branch (attempting to compile the ROCm 6.2 testing branch results in errors, so I can't use that one). Running the following code produces the error below:

# Huggingface Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-128k-instruct"
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    #load_in_4bit=True,
    #bnb_4bit_quant_type="nf4",
    #bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code = True,
    attn_implementation="eager",
    quantization_config = bnb_config # un-comment to quantize your model; Only supports Nvidia GPUs
)

(screenshot of the error attached)

Attached here is ops.hip:
ops.hip.zip

Expected behavior

After running that piece of code, I get the following error:

Error invalid device function at line 679 in file /home/$USER/bitsandbytes/csrc/ops.hip.

Nothing else prints to my terminal.

pnunna93 self-assigned this May 16, 2024
@pnunna93 (Collaborator)

Hi @PatchouliPatch, I need more details to review this. Could you please run the script with AMD_LOG_LEVEL=3 and share its output?

AMD_LOG_LEVEL=3 HIP_VISIBLE_DEVICES=0 python3 check_for_possibility.py

Please also share the outputs of 'rocminfo' and 'hipconfig --version'.

PatchouliPatch (Author) commented May 19, 2024

Here's the terminal output with AMD_LOG_LEVEL:
log_level3.txt

rocminfo:
rocminfo.txt

hipconfig --version: 6.0.32831-204d35d16

I know we were advised to disable the iGPU on the CPU, but for some reason Gigabyte's BIOS fails to do so even when I tell it to disable it.

@pnunna93 (Collaborator)

Could you try with ROCm 6.1? You can use the rocm/pytorch:latest docker image.

If you have to use 6.0, please try with rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2.

Please make sure to select the gfx1100 GPU with the container.
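
For reference, a typical way to launch one of those containers with GPU access looks like this (these are the standard ROCm container device flags, not something specific to this issue):

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host rocm/pytorch:latest

Inside the container, rocminfo | grep -i gfx should report gfx1100 for the 7900 XTX.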

@PatchouliPatch (Author)

Alright, will try it out. I built my previous one with gfx1101. What's the Navi 31 GFX name supposed to be anyway? Is it 1100?

@PatchouliPatch (Author)

Gave it a try today.

I installed the latest ROCm version of 6.1.1 after uninstalling 6.0.2.

I pulled the latest version of the repo and did the following:

git checkout rocm_enabled
git pull
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . -DBNB_ROCM_ARCH="gfx1100" -DCMAKE_HIP_COMPILER=/opt/rocm-6.1.1/llvm/bin/clang++
make
pip install .

The program compiled but gave me warnings.

After rerunning the same Python script, it still gives the same errors.
(screenshot of the error attached)

here's the output when I run AMD_LOG_LEVEL=3 now:
log_level3_new.txt

and here's rocminfo:
rocminfo_new.txt

hipconfig --version: 6.1.40092-038397aaa

@pnunna93 (Collaborator)

Please set HSA_OVERRIDE_GFX_VERSION=11.0.0 and retry. It's an environment variable; you can export it or set it while running the script. It will target the gfx1100 architecture.
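
For example, either of these forms works (reusing the script name from earlier in the thread):

export HSA_OVERRIDE_GFX_VERSION=11.0.0
# or just for a single run:
HSA_OVERRIDE_GFX_VERSION=11.0.0 python3 check_for_possibility.py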

@vasicvuk

I had the same error; adding HSA_OVERRIDE_GFX_VERSION=11.0.0 seems to fix it, but unfortunately now I get:

rocblaslt warning: No paths matched /opt/rocm/lib/hipblaslt/library/*gfx1100*co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
A: torch.Size([1984, 3200]), B: torch.Size([3200, 3200]), C: (1984, 3200); (lda, ldb, ldc): (c_int(1984), c_int(3200), c_int(1984)); (m, n, k): (c_int(1984), c_int(3200), c_int(3200))
error detectedTraceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 70, in <module>
    fire.Fire(do_cli)
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/Projects/venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/opt/Projects/axolotl/src/axolotl/cli/train.py", line 66, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/opt/Projects/axolotl/src/axolotl/train.py", line 170, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/Projects/axolotl/src/axolotl/core/trainer_builder.py", line 539, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
    return self.base_model(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1164, in forward
    outputs = self.model(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 814, in llama_model_forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 808, in custom_forward
    return module(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 907, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 422, in flashattn_forward
    query_states = self.q_proj(hidden_states)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/peft/tuners/lora/bnb.py", line 217, in forward
    result = self.base_layer(x, *args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 801, in forward
    out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 559, in matmul
    return MatMul8bitLt.apply(A, B, out, bias, state)
  File "/opt/Projects/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 398, in forward
    out32, Sout32 = F.igemmlt(C32A, state.CxB, SA, state.SB)
  File "/opt/Projects/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2388, in igemmlt
    raise Exception("cublasLt ran into an error!")
Exception: cublasLt ran into an error!

@PatchouliPatch (Author)

I suggest moving over to the alpha test of the actual bitsandbytes library. You can use the multi_backend_refactor branch. It works on my 7900 XTX.
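
A rough sketch of what that looks like, reusing the build sequence from earlier in this thread but on the upstream branch (the exact cmake flags may differ slightly on multi_backend_refactor, so treat this as a starting point):

git checkout multi_backend_refactor
git pull
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -DBNB_ROCM_ARCH="gfx1100" -S .
make
pip install .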

farshadghodsian commented Jul 5, 2024

I must have missed this issue, as I opened a separate issue to report a similar problem on my Radeon Pro W7900 (also gfx1100) when loading a model in 8-bit. It should be noted that while loading a model in 8-bit is not working with bitsandbytes on Radeon GPUs, I did get it to work loading models in 4-bit. Not sure if this will help your use case, but using load_in_4bit=True instead of load_in_8bit=True worked for me.
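
Concretely, that just means swapping the quantization config in the original repro for the commented-out 4-bit options (the nf4 quant type and float16 compute dtype below are taken from that snippet, not the only valid choices):

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)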

Note that, as of newer versions of PyTorch, there is an upstream issue with PyTorch force-loading hipBLASLt for all AMD GPUs, which is not supported on Radeon GPUs. You will also need to set TORCH_BLAS_PREFER_HIPBLASLT=0 for it to work.
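
For example, combining this with the earlier workaround before launching the script:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export TORCH_BLAS_PREFER_HIPBLASLT=0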
