
Error caused by running flash_attn #2

Closed
lonngxiang opened this issue Sep 11, 2024 · 12 comments

@lonngxiang

out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [65,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
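As the message above suggests, the assert is reported asynchronously, so the Python stack trace can point at the wrong call. A minimal way to localize it (standard PyTorch/CUDA behavior, not specific to this repo) is to force synchronous kernel launches before CUDA is initialized, e.g.:

```python
# Put this at the very top of the training entry point, before torch touches CUDA,
# so the device-side assert surfaces at its real call site.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous kernel launches: slower, but accurate stack traces

import torch
print(torch.__version__, torch.version.cuda)

# ... the rest of the training script that calls flash_attn goes here ...
```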

@lonngxiang
Author

flash-attn 2.6.3

@lonngxiang
Author

[screenshot]

@zhangfaen
Owner

I tried training with flash-attention 2 and got a similar error, which is why I didn't claim this repo supports flash-attention 2.
If you figure out how to support it, a PR is welcome!

@zhangfaen
Owner

I debugged what goes wrong when flash_attention_2 is enabled in finetune.py.

Conclusion: fixed; see my latest commit ff383f7

How:

  1. The root cause was a bug in src/transformers/models/qwen2_vl/modeling_qwen2_vl.py.
  2. The Qwen team fixed that bug; see huggingface/transformers@21fac7a.

Solution:

  1. Get the latest version of this repo with "git pull https://github.com/zhangfaen/finetune-Qwen2-VL/"
  2. pip uninstall transformers
  3. pip install -r requirements.txt (a quick sanity check is sketched below)
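A minimal sanity check after step 3, assuming the Qwen/Qwen2-VL-2B-Instruct checkpoint mentioned later in this thread (this snippet is illustrative, not part of the repo):

```python
# Rough check that the reinstalled transformers can load Qwen2-VL with flash_attention_2.
import torch
import transformers
from transformers import Qwen2VLForConditionalGeneration

print("transformers", transformers.__version__)  # should match the version pinned in requirements.txt

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",              # assumed base model (see discussion below)
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
print(model.config._attn_implementation)       # expect "flash_attention_2"
```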

@zhangfaen
Owner

I closed this issue.
If you still have problems, feel free to reopen it.

@lonngxiang
Author

Got it, I can train now, though GPU memory usage is still quite high. Nice project; it trains with native PyTorch.

@Guangming92

@lonngxiang May I ask which GPU you are using? I'm on a 4090 with 24 GB of VRAM and run out of memory after one iteration. How large is your dataset? Could we compare notes?

@lonngxiang
Author

After reinstalling the new transformers, I can train. Also on a 4090, though loading is very slow.

@Guangming92

> After reinstalling the new transformers, I can train. Also on a 4090, though loading is very slow.

I followed the instructions above and reinstalled the libraries from requirements.txt. I used to hit the same problem you did; after reinstalling, the index-out-of-bounds error is gone, but now it reports out of memory and the program exits. The training data follows the demo, and the model is loaded with torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2'.
[screenshot]
Could I ask what you changed when running it? Is the base model Qwen2-VL-2B-Instruct?

@lonngxiang
Author

Yes, I just set the batch size to 1.
[screenshot]
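For reference, a minimal sketch of what "batch size 1" looks like in a plain PyTorch DataLoader; the dataset below is a dummy stand-in, not the repo's actual data pipeline, and gradient checkpointing is another standard memory saver if 24 GB is still tight:

```python
# Minimal sketch: batch_size=1 in a plain PyTorch DataLoader. The dataset is a
# dummy placeholder, not the actual image/text pipeline used in finetune.py.
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # Stand-in sample; the real dataset yields tokenized chat + image inputs.
        return {"input_ids": torch.tensor([idx]), "labels": torch.tensor([idx])}

train_loader = DataLoader(DummyDataset(), batch_size=1, shuffle=True)  # batch size 1 to fit 24 GB

# Optional further saving (standard transformers API), trading compute for memory:
# model.gradient_checkpointing_enable()
```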

@476258834lzx

[screenshot]

Hello, may I ask how you solved this?

@476258834lzx

> [quoting @Guangming92's comment above about the out-of-memory error]

Hello, did you manage to get it running?
