
Fix flash-attn + qlora not working with llama models #336

Merged
4 commits merged into axolotl-ai-cloud:main on Aug 3, 2023

Conversation

@tmm1 (Collaborator) commented on Aug 3, 2023

fixes this error:

  File "/mnt/ml/axolotl/src/axolotl/flash_attn.py", line 98, in forward                                                    
    output_unpad = flash_attn_varlen_qkvpacked_func(                                                                       
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                       
  File "/mnt/ml/flash-attention/flash_attn/flash_attn_interface.py", line 406, in flash_attn_varlen_qkvpacked_func         
    return FlashAttnVarlenQKVPackedFunc.apply(                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                             
  File "/home/tmm1/micromamba/envs/test/lib/python3.11/site-packages/torch/autograd/function.py", line 539, in apply       
    return super().apply(*args, **kwargs)  # type: ignore[misc]                                                            
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                  
  File "/mnt/ml/flash-attention/flash_attn/flash_attn_interface.py", line 123, in forward                                  
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(                                
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                
  File "/mnt/ml/flash-attention/flash_attn/flash_attn_interface.py", line 52, in _flash_attn_varlen_forward                
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(                                
                                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                
RuntimeError: FlashAttention only support fp16 and bf16 data type   

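The kernel's dtype check fails because, with a 4-bit QLoRA base model, the hidden states reaching the patched LLaMA attention can arrive as float32, so the packed qkv tensor handed to flash_attn_varlen_qkvpacked_func is neither fp16 nor bf16. Below is a minimal sketch of the kind of cast-around-the-kernel fix this implies; the helper name _flash_varlen_attention is hypothetical and the snippet assumes the patched forward already builds the packed qkv tensor plus cu_seqlens and max_seqlen, so it illustrates the idea rather than the exact diff merged here:

    import torch
    from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func

    def _flash_varlen_attention(qkv, cu_seqlens, max_seqlen, dropout_p=0.0, causal=True):
        # Hypothetical helper: with a 4-bit QLoRA base model the packed
        # (total_tokens, 3, num_heads, head_dim) qkv tensor can arrive as
        # float32, but the FlashAttention CUDA kernel only accepts fp16/bf16.
        # Cast down, run the kernel, then cast the output back.
        input_dtype = qkv.dtype
        compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        if input_dtype not in (torch.float16, torch.bfloat16):
            qkv = qkv.to(compute_dtype)
        out = flash_attn_varlen_qkvpacked_func(
            qkv, cu_seqlens, max_seqlen, dropout_p=dropout_p, causal=causal
        )
        return out.to(input_dtype)

Casting the output back to the original dtype keeps the rest of the forward pass unchanged, so only the kernel call itself runs in fp16/bf16.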
@tmm1 tmm1 merged commit 0d2e34f into axolotl-ai-cloud:main Aug 3, 2023
3 checks passed
mkeoliya pushed a commit to mkeoliya/axolotl referencing this pull request on Dec 15, 2023: Fix flash-attn + qlora not working with llama models