Make qwen2 `attention_qkv_bias` optional #32893

wavy-jung · 2024-08-20T06:10:51Z

What does this PR do?

The Qwen2 and Qwen2-MoE models currently always add a bias to the query, key, and value linear projections.
Following the trend in other recent models (e.g. llama), I refactored this so that `attention_qkv_bias` is optional and can be set in the config file.
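The idea can be sketched roughly as follows. This is not the actual diff from this PR: `AttentionConfigSketch` and `projection_specs` are hypothetical names used only for illustration, and the flag name `attention_qkv_bias` is taken from the PR title. Each returned tuple stands for the arguments one would pass to a linear layer (`in_features`, `out_features`, `bias`), assuming the output projection stays bias-free as it is in Qwen2 today.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfigSketch:
    """Hypothetical stand-in for the relevant fields of a Qwen2-style config."""

    hidden_size: int = 64
    num_attention_heads: int = 4
    num_key_value_heads: int = 2
    # Flag name taken from the PR title; defaulting to True keeps the
    # current behaviour (bias on the q/k/v projections).
    attention_qkv_bias: bool = True


def projection_specs(config):
    """Return (in_features, out_features, bias) for each attention projection."""
    head_dim = config.hidden_size // config.num_attention_heads
    q_out = config.num_attention_heads * head_dim
    kv_out = config.num_key_value_heads * head_dim
    qkv_bias = config.attention_qkv_bias
    return {
        "q_proj": (config.hidden_size, q_out, qkv_bias),
        "k_proj": (config.hidden_size, kv_out, qkv_bias),
        "v_proj": (config.hidden_size, kv_out, qkv_bias),
        # Assumption: the output projection has no bias regardless of the flag.
        "o_proj": (q_out, config.hidden_size, False),
    }


default_specs = projection_specs(AttentionConfigSketch())
no_bias_specs = projection_specs(AttentionConfigSketch(attention_qkv_bias=False))
```

With the default config the q/k/v entries carry `bias=True`; with `attention_qkv_bias=False` every projection is bias-free, which matches the llama-style layout the PR description refers to.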

Fixes #32892

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

@ArthurZucker @stevhliu

- compatibility w/ llama model

wavy-jung · 2024-08-20T06:20:13Z

RuntimeError: Failed to import transformers.models.audio_spectrogram_transformer.feature_extraction_audio_spectrogram_transformer because of the following error (look up to see its traceback):

I have no idea why the check_repo test failed with the error message above. Could you please give me some guidance?
to: @ArthurZucker @stevhliu

amyeroberts · 2024-08-20T09:34:30Z

cc @ArthurZucker

ArthurZucker

Hey! We usually do this only when a published model actually has this config option set or unset.
`attention_bias` for llama is an example.

wavy-jung · 2024-08-21T04:35:11Z

@ArthurZucker Thank you for your response!
What I intended is to allow this option to be set/unset in qwen2 as well. In fact, the LLaMA series has never used a bias in its linear projections, and in the official repository they are all set to bias=False. However, in the Hugging Face implementation, this can be set/unset through the config. Therefore, I want to provide the same functionality here as well.

wavy-jung · 2024-08-23T06:56:04Z

@ArthurZucker Hey, do you have any further comments for this?

ArthurZucker · 2024-08-27T12:08:40Z

Hey! The reason we added that param for llama is the release of #26302: InternLM was officially using this flag, meaning there is a model related to this change!
Unless a model is published, we usually avoid enabling something that does not have a model associated with it! 🤗

Make qwen2 attention_qkv_bias optional

c075569

- compatibility w/ llama model

ArthurZucker reviewed Aug 20, 2024


stevhliu mentioned this pull request Aug 20, 2024

added doctring to SchedulerType class #32898

Merged

2 tasks
