
[BUG] AssertionError: libcuda.so cannot found! #949

Closed · 2 tasks done

ArlanCooper opened this issue Jan 12, 2024 · 9 comments

Comments


ArlanCooper commented Jan 12, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

Running the script:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("/data/share/rwq/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("/data/share/rwq/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
response, history = model.chat(tokenizer, 'hello', history=None)

Running the above script (flash-attention is installed) produces the following error:

File <string>:63, in rotary_kernel(OUT, X, COS, SIN, CU_SEQLENS, SEQLEN_OFFSETS, seqlen, nheads, rotary_dim, seqlen_ro, CACHE_KEY_SEQLEN, stride_out_batch, stride_out_seqlen, stride_out_nheads, stride_out_headdim, stride_x_batch, stride_x_seqlen, stride_x_nheads, stride_x_headdim, BLOCK_K, IS_SEQLEN_OFFSETS_TENSOR, IS_VARLEN, INTERLEAVED, CONJUGATE, BLOCK_M, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/compiler/compiler.py:425, in compile(fn, **kwargs)
    423 # cache manager
    424 if is_cuda or is_hip:
--> 425     so_path = make_stub(name, signature, constants)
    426 else:
    427     so_path = _device_backend.make_launcher_stub(name, signature, constants)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/compiler/make_launcher.py:39, in make_stub(name, signature, constants)
     37 with open(src_path, "w") as f:
     38     f.write(src)
---> 39 so = _build(name, src_path, tmpdir)
     40 with open(so, "rb") as f:
     41     return so_cache_manager.put(f.read(), so_name, binary=True)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/common/build.py:61, in _build(name, src, srcdir)
     59     hip_include_dir = os.path.join(rocm_path_dir(), "include")
     60 else:
---> 61     cuda_lib_dirs = libcuda_dirs()
     62     cu_include_dir = cuda_include_dir()
     63 suffix = sysconfig.get_config_var('EXT_SUFFIX')

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/common/build.py:30, in libcuda_dirs()
     28     msg += 'Possible files are located at %s.' % str(locs)
     29     msg += 'Please create a symlink of libcuda.so to any of the file.'
---> 30 assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs), msg
     31 return dirs

AssertionError: libcuda.so cannot found!


Expected Behavior

No error should be raised.

Steps To Reproduce

Run the same script as shown above under Current Behavior.

Environment

- OS: Ubuntu 20.04
- Python: 3.10.12
- Transformers: 4.33.2
- PyTorch: 2.1.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response

Contributor

jklj077 commented Jan 12, 2024

According to the error message, triton cannot find the CUDA library. Please check whether your environment is set up correctly (where nvcc && nvcc -V).
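
A minimal sketch of such an environment check, assuming PyTorch is installed (this only reports what the toolchain can see; it does not change anything):

```python
# Quick sanity check: is the CUDA compiler on PATH, and can PyTorch see a GPU?
import shutil
import torch

print("nvcc on PATH:", shutil.which("nvcc"))          # path to the CUDA compiler, if any
print("torch CUDA version:", torch.version.cuda)      # CUDA version torch was built against
print("CUDA available:", torch.cuda.is_available())   # whether a usable GPU/driver is visible
```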

@ArlanCooper (Author)

According to the error message, triton cannot find the CUDA library. Please check whether your environment is set up correctly (where nvcc && nvcc -V).

nvcc -V works fine:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Contributor

jklj077 commented Jan 12, 2024

Well, if the system itself is fine, this may be a compatibility issue between triton and your particular environment. We found a similar issue; please take a look at it for reference:

@ArlanCooper (Author)

Well, if the system itself is fine, this may be a compatibility issue between triton and your particular environment. We found a similar issue; please take a look at it for reference:

I upgraded torch to 2.1.2, and now I get a different error:


Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/powerop/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1200, in chat
    outputs = self.generate(
  File "/home/powerop/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1319, in generate
    return super().generate(
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/transformers/generation/utils.py", line 2897, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0


I looked through past issues and found that adjusting the temperature can trigger this, but I have not changed any parameters; the code is still the original:

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data/share/rwq/Qwen-7B-Chat"
payload = "你好"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

ds_model = deepspeed.init_inference(
    model=model,                      # Transformers model
    mp_size=1,                        # number of GPUs
    dtype=torch.float16,              # weight dtype (fp16)
    replace_method="auto",            # let DeepSpeed replace layers automatically
    replace_with_kernel_inject=True,  # replace layers with kernel injection
)
print(f"Model loaded on device {ds_model.module.device}\n")
# assert isinstance(ds_model.module.transformer.h[0], DeepSpeedTransformerInference), "Model not successfully initialized"


What could be causing this?


jklj077 commented Jan 12, 2024

  • Please avoid using pytorch=2.1.2 (the solution in the referenced triton issue is to modify the triton code).
  • If you did not adjust the temperature and RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 only occurs with multiple GPUs, please refer to [BUG] RuntimeError: probability tensor contains either inf, nan or element < 0 #848; it is very likely a driver/environment coupling problem (see the diagnostic sketch after this list).
  • We cannot tell whether ds_model has any effect; if that part is the problem, please investigate it yourself.
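
A minimal diagnostic sketch, assuming the model and tokenizer from the script above are already loaded: it runs a single forward pass and checks whether the logits already contain inf/NaN, which is what later trips torch.multinomial during sampling.

```python
# Run one forward pass and inspect the logits for inf/NaN before any sampling.
import torch

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits           # shape: [batch, seq_len, vocab_size]
print("contains NaN:", torch.isnan(logits).any().item())
print("contains Inf:", torch.isinf(logits).any().item())
```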

@ArlanCooper (Author)

  • Please avoid using pytorch=2.1.2 (the solution in the referenced triton issue is to modify the triton code).
  • If you did not adjust the temperature and RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 only occurs with multiple GPUs, please refer to [BUG] RuntimeError: probability tensor contains either inf, nan or element < 0 #848; it is very likely a driver/environment coupling problem.
  • We cannot tell whether ds_model has any effect; if that part is the problem, please investigate it yourself.
  1. OK, I will downgrade back to the original version.
  2. I looked through it; the solutions there are either downgrading the version or mapping the library with ldconfig (a sketch of that approach follows below). I did not see a solution that modifies the triton code. Which one were you referring to?
  3. ds_model works fine on my side with chatglm3-6b; it runs successfully.
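
A minimal sketch of that ldconfig-based mapping, following the hint in the original assertion message ("Please create a symlink of libcuda.so"); it assumes a Linux system where the NVIDIA driver installed libcuda.so.1, and the paths are illustrative:

```python
# Locate libcuda.so.1 via the dynamic linker cache and print the symlink command
# that would expose it as libcuda.so for triton's build step.
import os
import subprocess

ldconfig = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
candidates = [line.split()[-1] for line in ldconfig.splitlines() if "libcuda.so.1" in line]
if candidates:
    target = candidates[0]
    link = os.path.join(os.path.dirname(target), "libcuda.so")
    print(f"sudo ln -s {target} {link}")   # run manually if libcuda.so is missing
else:
    print("libcuda.so.1 not found; check the NVIDIA driver installation")
```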


jklj077 commented Jan 15, 2024


This commit: triton-lang/triton@0e7b97b

@ArlanCooper (Author)

This commit: openai/triton@0e7b97b

OK, thanks, I'll give it a try.

@ArlanCooper (Author)

This commit: openai/triton@0e7b97b

Verified, it works now. Thanks!
