
Skip M4T test_retain_grad_hidden_states_attentions #28060

Merged
2 commits merged into huggingface:main on Dec 15, 2023

Conversation

ylacombe
Contributor

What does this PR do?

While investigating the flaky failures of test_retain_grad_hidden_states_attentions, I realized that the speech encoder attentions can be None with non-zero probability when training=True. Skipping the test is the fastest fix.
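For context, here is a minimal sketch of how such a flaky test can be skipped by overriding it in the speech-input test class with `unittest.skip`. This is illustrative only and not the exact diff in this PR; the class name is assumed.

```python
# Illustrative sketch only: the class name is assumed, not copied from the PR diff.
import unittest


class SeamlessM4TModelWithSpeechInputTest(unittest.TestCase):
    @unittest.skip(
        reason="In training mode the speech encoder can randomly skip layers, "
        "so some attentions are None and their gradients cannot be retained."
    )
    def test_retain_grad_hidden_states_attentions(self):
        pass
```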

Fixes #28036

cc @gante @amyeroberts @ydshieh

@ArthurZucker
Collaborator

Alright, thanks 🤗

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts
Collaborator

Thanks for fixing!

If training is allowed on the model but can fail, e.g. with attentions being None, could you open an issue to track this? Training should either be prevented with an exception or made to work reliably (probably the former, then the latter).

@ylacombe
Contributor Author

ylacombe commented Dec 15, 2023

Hey @amyeroberts, in theory, training is supported for the tasks that translate inputs (text or audio) into text, since the model is a classic LLM trained with a classic objective.
To improve training, the model randomly skips layers in the speech encoder block (which leaves None as the attention weights for the skipped layers), but this doesn't break training when it happens.
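For readers unfamiliar with the mechanism, here is an illustrative sketch (assumptions only, not SeamlessM4T's actual implementation) of LayerDrop-style layer skipping in a speech encoder, where a skipped layer contributes None to the returned attention weights:

```python
import torch


def encoder_forward(hidden_states, layers, layerdrop=0.1, training=True):
    """Toy LayerDrop loop: each layer is skipped with probability `layerdrop` during training."""
    all_attentions = ()
    for layer in layers:
        skip_the_layer = training and torch.rand(1).item() < layerdrop
        if skip_the_layer:
            # Skipped layer: hidden states pass through unchanged and no attention weights exist.
            attn_weights = None
        else:
            hidden_states, attn_weights = layer(hidden_states, output_attentions=True)
        all_attentions = all_attentions + (attn_weights,)
    # A test that calls attentions[i].retain_grad() fails whenever attentions[i] is None.
    return hidden_states, all_attentions
```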

@ylacombe ylacombe merged commit deb72cb into huggingface:main Dec 15, 2023
18 checks passed
iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* skip test from SpeechInput

* refine description of skip
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* skip test from SpeechInput

* refine description of skip

Successfully merging this pull request may close these issues.

SeamlessM4T: test_retain_grad_hidden_states_attentions is flaky
4 participants