
Skip M4T test_retain_grad_hidden_states_attentions #28060

Merged
2 commits merged into huggingface:main on Dec 15, 2023

Conversation

ylacombe
Contributor

What does this PR do?

While investigating the flaky failures of test_retain_grad_hidden_states_attentions, I realized that the speech encoder attentions can be None with non-zero probability when training=True. Skipping the test is the fastest fix.
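For context, here is a minimal sketch of how such a flaky test can be skipped by overriding it in the speech-input test class with `unittest.skip`. This is illustrative only and not the exact diff in this PR; the class name is assumed.

```python
# Illustrative sketch only: the class name is assumed, not copied from the PR diff.
import unittest


class SeamlessM4TModelWithSpeechInputTest(unittest.TestCase):
    @unittest.skip(
        reason="In training mode the speech encoder can randomly skip layers, "
        "so some attentions are None and their gradients cannot be retained."
    )
    def test_retain_grad_hidden_states_attentions(self):
        pass
```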

Fixes #28036

cc @gante @amyeroberts @ydshieh

@ArthurZucker
Collaborator

Alright, thanks 🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts
Collaborator

Thanks for fixing!

If training is allowed on the model but can fail, e.g. with attentions being None, could you open an issue to track this? Training should either be prevented with an exception or made to work reliably (probably the former, then the latter).

@ylacombe
Contributor Author

ylacombe commented Dec 15, 2023

Hey @amyeroberts, in theory, training is supported for the tasks that translate inputs (text or audio) into text, since the model is a classic LLM trained with a classic objective.
To improve training, the model randomly skips layers in the speech encoder block (which leaves None as the attention weights for the skipped layers), but this doesn't break training when it happens.
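For readers unfamiliar with the mechanism, here is an illustrative sketch (assumptions only, not SeamlessM4T's actual implementation) of LayerDrop-style layer skipping in a speech encoder, where a skipped layer contributes None to the returned attention weights:

```python
import torch


def encoder_forward(hidden_states, layers, layerdrop=0.1, training=True):
    """Toy LayerDrop loop: each layer is skipped with probability `layerdrop` during training."""
    all_attentions = ()
    for layer in layers:
        skip_the_layer = training and torch.rand(1).item() < layerdrop
        if skip_the_layer:
            # Skipped layer: hidden states pass through unchanged and no attention weights exist.
            attn_weights = None
        else:
            hidden_states, attn_weights = layer(hidden_states, output_attentions=True)
        all_attentions = all_attentions + (attn_weights,)
    # A test that calls attentions[i].retain_grad() fails whenever attentions[i] is None.
    return hidden_states, all_attentions
```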

@ylacombe ylacombe merged commit deb72cb into huggingface:main Dec 15, 2023
18 checks passed
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* skip test from SpeechInput

* refine description of skip
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* skip test from SpeechInput

* refine description of skip

Successfully merging this pull request may close these issues.

SeamlessM4T: test_retain_grad_hidden_states_attentions is flaky
4 participants