
Update llama_flash_attn_monkey_patch.py for flash attention 2 #2059

Merged
merged 1 commit into lm-sys:main on Jul 24, 2023

Conversation

yilin-bao
Contributor

@yilin-bao yilin-bao commented Jul 24, 2023

Upgrading from FlashAttention (1.x) to FlashAttention-2

The following function has been renamed:

flash_attn_unpadded_qkvpacked_func -> flash_attn_varlen_qkvpacked_func

You can check how the rename was made upstream: https://github.com/Dao-AILab/flash-attention
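For context, here is a minimal compatibility sketch showing how a monkey patch can import the kernel under either release, assuming only the rename described above; the try/except fallback is illustrative and not the exact code in this PR.

```python
# Illustrative sketch (not the exact diff in this PR): import the varlen
# qkv-packed kernel under whichever name the installed flash-attn release
# exposes. FlashAttention-2 renamed
#   flash_attn_unpadded_qkvpacked_func -> flash_attn_varlen_qkvpacked_func
# while keeping essentially the same call signature.
try:
    # FlashAttention-2 (2.x) name
    from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func
except ImportError:
    # FlashAttention 1.x fallback: old name, aliased to the new one
    from flash_attn.flash_attn_interface import (
        flash_attn_unpadded_qkvpacked_func as flash_attn_varlen_qkvpacked_func,
    )
```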

Why are these changes needed?

The function flash_attn_unpadded_qkvpacked_func is no longer included in flash-attention, which breaks fine-tuning. This change fixes the problem.

Related issue number (if applicable)

Checks

@yilin-bao
Contributor Author

Also, it could be pointed out that FastChat depends on flash-attention version v1.0.9 or lower; pinning to that version would also fix the problem.
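(As a hedged illustration of that alternative: pinning the dependency, e.g. `pip install "flash-attn<2"`, would keep the old flash_attn_unpadded_qkvpacked_func import working without this patch. The version bound here comes from the comment above, not from FastChat's actual requirements.)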

@yilin-bao yilin-bao closed this Jul 24, 2023
@yilin-bao yilin-bao reopened this Jul 24, 2023
@merrymercy merrymercy changed the title Update llama_flash_attn_monkey_patch.py Update llama_flash_attn_monkey_patch.py for flash attention 2 Jul 24, 2023
@merrymercy merrymercy merged commit a08c1d6 into lm-sys:main Jul 24, 2023
@merrymercy merrymercy mentioned this pull request Jul 25, 2023