
[BUG] AssertionError: libcuda.so cannot found! #949

Closed · 2 tasks done

ArlanCooper opened this issue Jan 12, 2024 · 9 comments

Comments


ArlanCooper commented Jan 12, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

Running the script:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("/data/share/rwq/Qwen-7B-Chat", trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B-Chat", device_map="cpu", trust_remote_code=True).eval()
# use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("/data/share/rwq/Qwen-7B-Chat", device_map="auto", trust_remote_code=True).eval()
response, history = model.chat(tokenizer, 'hello', history=None)

Running the above script (flash-attention is installed) produces the following error:

File <string>:63, in rotary_kernel(OUT, X, COS, SIN, CU_SEQLENS, SEQLEN_OFFSETS, seqlen, nheads, rotary_dim, seqlen_ro, CACHE_KEY_SEQLEN, stride_out_batch, stride_out_seqlen, stride_out_nheads, stride_out_headdim, stride_x_batch, stride_x_seqlen, stride_x_nheads, stride_x_headdim, BLOCK_K, IS_SEQLEN_OFFSETS_TENSOR, IS_VARLEN, INTERLEAVED, CONJUGATE, BLOCK_M, grid, num_warps, num_stages, extern_libs, stream, warmup, device, device_type)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/compiler/compiler.py:425, in compile(fn, **kwargs)
    423 # cache manager
    424 if is_cuda or is_hip:
--> 425     so_path = make_stub(name, signature, constants)
    426 else:
    427     so_path = _device_backend.make_launcher_stub(name, signature, constants)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/compiler/make_launcher.py:39, in make_stub(name, signature, constants)
     37 with open(src_path, "w") as f:
     38     f.write(src)
---> 39 so = _build(name, src_path, tmpdir)
     40 with open(so, "rb") as f:
     41     return so_cache_manager.put(f.read(), so_name, binary=True)

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/common/build.py:61, in _build(name, src, srcdir)
     59     hip_include_dir = os.path.join(rocm_path_dir(), "include")
     60 else:
---> 61     cuda_lib_dirs = libcuda_dirs()
     62     cu_include_dir = cuda_include_dir()
     63 suffix = sysconfig.get_config_var('EXT_SUFFIX')

File ~/work/conda/envs/flash_attn/lib/python3.10/site-packages/triton/common/build.py:30, in libcuda_dirs()
     28     msg += 'Possible files are located at %s.' % str(locs)
     29     msg += 'Please create a symlink of libcuda.so to any of the file.'
---> 30 assert any(os.path.exists(os.path.join(path, 'libcuda.so')) for path in dirs), msg
     31 return dirs

AssertionError: libcuda.so cannot found!


Expected Behavior

No error should be raised.

Steps To Reproduce

Run the same script as shown above under Current Behavior.

Environment

- OS: Ubuntu 20.04
- Python: 3.10.12
- Transformers: 4.33.2
- PyTorch: 2.1.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response

Contributor

jklj077 commented Jan 12, 2024

According to the error message, triton cannot find the CUDA library. Please check whether your environment is set up correctly (where nvcc && nvcc -V).
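
A minimal sketch of such an environment check, assuming PyTorch is installed (this only reports what the toolchain can see; it does not change anything):

```python
# Quick sanity check: is the CUDA compiler on PATH, and can PyTorch see a GPU?
import shutil
import torch

print("nvcc on PATH:", shutil.which("nvcc"))          # path to the CUDA compiler, if any
print("torch CUDA version:", torch.version.cuda)      # CUDA version torch was built against
print("CUDA available:", torch.cuda.is_available())   # whether a usable GPU/driver is visible
```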

@ArlanCooper (Author)

According to the error message, triton cannot find the CUDA library. Please check whether your environment is set up correctly (where nvcc && nvcc -V).

nvcc -V works fine:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Contributor

jklj077 commented Jan 12, 2024

Well, if the system itself is fine, this may be a compatibility issue between triton and your particular environment. We found a similar issue; please take a look at it for reference:

@ArlanCooper (Author)

Well, if the system itself is fine, this may be a compatibility issue between triton and your particular environment. We found a similar issue; please take a look at it for reference:

I upgraded torch to 2.1.2, and now I get a different error:


Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/powerop/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1200, in chat
    outputs = self.generate(
  File "/home/powerop/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 1319, in generate
    return super().generate(
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/home/powerop/work/conda/envs/deepspeed/lib/python3.10/site-packages/transformers/generation/utils.py", line 2897, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0


I looked through past issues and found that adjusting the temperature can trigger this, but I have not changed any parameters; the code is still the original:

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data/share/rwq/Qwen-7B-Chat"
payload = "你好"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

ds_model = deepspeed.init_inference(
    model=model,                      # Transformers model
    mp_size=1,                        # number of GPUs
    dtype=torch.float16,              # weight dtype (fp16)
    replace_method="auto",            # let DeepSpeed replace layers automatically
    replace_with_kernel_inject=True,  # replace layers with kernel injection
)
print(f"Model loaded on device {ds_model.module.device}\n")
# assert isinstance(ds_model.module.transformer.h[0], DeepSpeedTransformerInference), "Model not successfully initialized"


What could be causing this?


jklj077 commented Jan 12, 2024

  • Please avoid using pytorch=2.1.2 (the solution in the referenced triton issue is to modify the triton code).
  • If you did not adjust the temperature and RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 only occurs with multiple GPUs, please refer to [BUG] RuntimeError: probability tensor contains either inf, nan or element < 0 #848; it is very likely a driver/environment coupling problem (see the diagnostic sketch after this list).
  • We cannot tell whether ds_model has any effect; if that part is the problem, please investigate it yourself.
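
A minimal diagnostic sketch, assuming the model and tokenizer from the script above are already loaded: it runs a single forward pass and checks whether the logits already contain inf/NaN, which is what later trips torch.multinomial during sampling.

```python
# Run one forward pass and inspect the logits for inf/NaN before any sampling.
import torch

inputs = tokenizer("你好", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits           # shape: [batch, seq_len, vocab_size]
print("contains NaN:", torch.isnan(logits).any().item())
print("contains Inf:", torch.isinf(logits).any().item())
```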

@ArlanCooper (Author)

  • Please avoid using pytorch=2.1.2 (the solution in the referenced triton issue is to modify the triton code).
  • If you did not adjust the temperature and RuntimeError: probability tensor contains either `inf`, `nan` or element < 0 only occurs with multiple GPUs, please refer to [BUG] RuntimeError: probability tensor contains either inf, nan or element < 0 #848; it is very likely a driver/environment coupling problem.
  • We cannot tell whether ds_model has any effect; if that part is the problem, please investigate it yourself.
  1. OK, I will downgrade back to the original version.
  2. I looked through it; the solutions there are either downgrading the version or mapping the library with ldconfig (a sketch of that approach follows below). I did not see a solution that modifies the triton code. Which one were you referring to?
  3. ds_model works fine on my side with chatglm3-6b; it runs successfully.
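
A minimal sketch of that ldconfig-based mapping, following the hint in the original assertion message ("Please create a symlink of libcuda.so"); it assumes a Linux system where the NVIDIA driver installed libcuda.so.1, and the paths are illustrative:

```python
# Locate libcuda.so.1 via the dynamic linker cache and print the symlink command
# that would expose it as libcuda.so for triton's build step.
import os
import subprocess

ldconfig = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
candidates = [line.split()[-1] for line in ldconfig.splitlines() if "libcuda.so.1" in line]
if candidates:
    target = candidates[0]
    link = os.path.join(os.path.dirname(target), "libcuda.so")
    print(f"sudo ln -s {target} {link}")   # run manually if libcuda.so is missing
else:
    print("libcuda.so.1 not found; check the NVIDIA driver installation")
```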


jklj077 commented Jan 15, 2024


This commit: triton-lang/triton@0e7b97b

@ArlanCooper (Author)

This commit: openai/triton@0e7b97b

OK, thanks, I'll give it a try.

@ArlanCooper (Author)

This commit: openai/triton@0e7b97b

Verified, it works now. Thanks!
