
Is it possible to specify use_flash_attention_2 at launch? #25

Closed
awzhgw opened this issue Feb 5, 2024 · 2 comments


awzhgw commented Feb 5, 2024

Is it possible to specify use_flash_attention_2 at launch?

LinB203 (Member) commented Feb 5, 2024

Yes, it is supported; modify the loading code as shown below. This applies to every backbone except Qwen, because the Qwen-based models enable flash attention automatically.

model = LlavaPhiForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    attn_implementation="flash_attention_2",  # add this line
    **bnb_model_from_pretrained_args,
)
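
For reference, here is a minimal standalone sketch of loading a checkpoint with flash attention 2 enabled outside the training script. The import path and checkpoint path below are placeholders (adjust them to your checkout), and the half-precision requirement is an added assumption: flash-attn 2 only runs with fp16/bf16 weights on a supported GPU.

import torch
from moellava.model import LlavaPhiForCausalLM  # placeholder import; use the path in your checkout

# Load with flash attention 2; flash-attn needs half-precision weights,
# so torch_dtype is set to bfloat16 explicitly.
model = LlavaPhiForCausalLM.from_pretrained(
    "path/to/MoE-LLaVA-checkpoint",            # placeholder checkpoint path
    torch_dtype=torch.bfloat16,                # flash-attn 2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")
model.eval()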

LinB203 (Member) commented Feb 7, 2024

Hi, we have tested flash attention 2 and found that it degrades performance; others have reported the same issue. We therefore do not recommend enabling flash attention 2.
See huggingface/transformers#28488

@LinB203 LinB203 reopened this Feb 7, 2024
@awzhgw awzhgw closed this as completed Feb 8, 2024