
Error caused by running flash_attn #2

Closed
lonngxiang opened this issue Sep 11, 2024 · 12 comments

@lonngxiang

out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [65,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
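As the message above suggests, the assert is reported asynchronously, so the Python stack trace can point at the wrong call. A minimal way to localize it (standard PyTorch/CUDA behavior, not specific to this repo) is to force synchronous kernel launches before CUDA is initialized, e.g.:

```python
# Put this at the very top of the training entry point, before torch touches CUDA,
# so the device-side assert surfaces at its real call site.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous kernel launches: slower, but accurate stack traces

import torch
print(torch.__version__, torch.version.cuda)

# ... the rest of the training script that calls flash_attn goes here ...
```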

@lonngxiang
Author

flash-attn 2.6.3

@lonngxiang
Author

[screenshot]

@zhangfaen
Owner

I tried training with flash-attention 2 and got a similar error, which is why I didn't claim this repo supports flash-attention 2.
If you figure out how to support it, a PR is welcome!

@zhangfaen
Owner

I debugged what goes wrong when flash_attention_2 is enabled in finetune.py.

Conclusion: fixed; see my latest commit ff383f7

How:

  1. The root cause was a bug in src/transformers/models/qwen2_vl/modeling_qwen2_vl.py.
  2. The Qwen team fixed that bug; see huggingface/transformers@21fac7a.

Solution:

  1. Get the latest version of this repo with "git pull https://github.com/zhangfaen/finetune-Qwen2-VL/"
  2. pip uninstall transformers
  3. pip install -r requirements.txt (a quick sanity check is sketched below)
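A minimal sanity check after step 3, assuming the Qwen/Qwen2-VL-2B-Instruct checkpoint mentioned later in this thread (this snippet is illustrative, not part of the repo):

```python
# Rough check that the reinstalled transformers can load Qwen2-VL with flash_attention_2.
import torch
import transformers
from transformers import Qwen2VLForConditionalGeneration

print("transformers", transformers.__version__)  # should match the version pinned in requirements.txt

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",              # assumed base model (see discussion below)
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
print(model.config._attn_implementation)       # expect "flash_attention_2"
```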

@zhangfaen
Owner

I closed this issue.
If you still have problems, feel free to reopen it.

@lonngxiang
Author

Got it, I can train now, though GPU memory usage is still quite high. Nice project; it trains with native PyTorch.

@Guangming92

@lonngxiang May I ask which GPU you are using? I'm on a 4090 with 24 GB of VRAM and run out of memory after one iteration. How large is your dataset? Could we compare notes?

@lonngxiang
Author

After reinstalling the new transformers, I can train. Also on a 4090, though loading is very slow.

@Guangming92

> After reinstalling the new transformers, I can train. Also on a 4090, though loading is very slow.

I followed the instructions above and reinstalled the libraries from requirements.txt. I used to hit the same problem you did; after reinstalling, the index-out-of-bounds error is gone, but now it reports out of memory and the program exits. The training data follows the demo, and the model is loaded with torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2'.
[screenshot]
Could I ask what you changed when running it? Is the base model Qwen2-VL-2B-Instruct?

@lonngxiang
Author

Yes, I just set the batch size to 1.
[screenshot]
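For reference, a minimal sketch of what "batch size 1" looks like in a plain PyTorch DataLoader; the dataset below is a dummy stand-in, not the repo's actual data pipeline, and gradient checkpointing is another standard memory saver if 24 GB is still tight:

```python
# Minimal sketch: batch_size=1 in a plain PyTorch DataLoader. The dataset is a
# dummy placeholder, not the actual image/text pipeline used in finetune.py.
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # Stand-in sample; the real dataset yields tokenized chat + image inputs.
        return {"input_ids": torch.tensor([idx]), "labels": torch.tensor([idx])}

train_loader = DataLoader(DummyDataset(), batch_size=1, shuffle=True)  # batch size 1 to fit 24 GB

# Optional further saving (standard transformers API), trading compute for memory:
# model.gradient_checkpointing_enable()
```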

@476258834lzx

[screenshot]

Hello, may I ask how you solved this?

@476258834lzx

> [quoting @Guangming92's comment above about the out-of-memory error]

Hello, did you manage to get it running?
