[Wav2Vec2 - MMS] Correct directly loading adapters weights #24335
Conversation
The documentation is not available anymore as the PR was closed or merged.
src/transformers/modeling_utils.py
Outdated
Load adaptive weights after state dict has been loaded. If required this method should be overridden by derived
class.
"""
pass
I'd raise an exception here if it's called, otherwise it fails silently
Suggested change:
-    pass
+    raise NotImplementedError
Yeah, not 100% sure about the design here, but it seems much more in line with:

transformers/src/transformers/modeling_utils.py (line 1257 in c5454eb):
def tie_weights(self):

and transformers/src/transformers/modeling_utils.py (line 1242 in c5454eb):
def _init_weights(self, module):

Think something like if hasattr(self, "load_adaptive_weights") is also not great.

Also it is a bit questionable whether load_adaptive_weights is general enough to warrant being in modeling_utils.py, but there is no other way really to get the from_pretrained(..., target_lang="...") functionality.

@LysandreJik @sgugger wdyt?
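For illustration, a minimal sketch of the pattern being compared to here (all names are illustrative, not a proposed final API):

```python
# Minimal sketch of the design question: make `load_adaptive_weights` an
# overridable no-op on the base class (like `tie_weights` / `_init_weights`),
# rather than guarding the call with `hasattr`. All names are illustrative.
class BaseModelSketch:
    def load_adaptive_weights(self):
        # No-op by default; adapter-capable models override it. Per the review
        # suggestion above, this could `raise NotImplementedError` instead so
        # a missing override fails loudly rather than silently.
        pass


class AdapterModelSketch(BaseModelSketch):
    def load_adaptive_weights(self):
        # Model-specific override, called by `from_pretrained` after the
        # state dict has been loaded.
        print("loading adapter weights for the configured target language")
```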
For now I would override the _tie_encoder_decoder_weights method in Wav2Vec2 only and not add those changes in modeling_utils. If things change and we get lots of models with adapters, we can revisit the decision and do something like this, but I'd wait for it to be necessary.
That's pretty hacky, but yeah ok for me. Will add a big comment that we do this to not introduce a new API
Sorry, I accidentally submitted the review without a saved comment, I realised. P.S. ignoring the wandb diffs, as they're just from being out-of-date with main.
examples/research_projects/jax-projects/big_bird/bigbird_flax.py
Outdated
src/transformers/modeling_utils.py
Outdated
Load adaptive weights after state dict has been loaded. If required this method should be overridden by derived
class.
"""
pass
For now I would override the _tie_encoder_decoder_weights method in Wav2Vec2 only and not add those changes in modeling_utils. If things change and we get lots of models with adapters, we can revisit the decision and do something like this, but I'd wait for it to be necessary.
While slightly hacky, Wav2Vec2 never has to tie input and output embeddings, so that it is ok to repurpose this
function here.

This method is **not** supposed to be called by the user and is prone to be changed in the future.
Mmmm, tie_weights is a public API of PreTrainedModel and we do recommend users call it in Accelerate (if they load the model manually instead of using from_pretrained).
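For context, the manual-loading flow being referred to looks roughly like this (a sketch of the Accelerate big-model pattern; the repo id and checkpoint path are placeholders):

```python
# Sketch of the Accelerate big-model-inference flow where `tie_weights` is
# called manually. The repo id and checkpoint path below are placeholders.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("some-org/some-model")  # placeholder id
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Public API: tie input/output embeddings before loading the checkpoint,
# since `from_pretrained` is not doing it for us in this flow.
model.tie_weights()

model = load_checkpoint_and_dispatch(model, "path/to/checkpoint", device_map="auto")
```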
Overwriting _tie_encoder_decoder_weights doesn't really work though, as Wav2Vec2 is not an encoder-decoder model.
I don't want to force config.is_encoder_decoder=True for MMS models - that is really not correct.
That is not what I'm saying. I'm just commenting on this line that will appear in the docstring.
As per the comment, I think it's fine though because Wav2Vec2 will never tie input & output weights. So even if the user calls it, it's ok.
Perhaps moving the comment to the docstring would help clarify this?
Note that with the docs set up as they are, this function won't be present in the documentation, so we are actually debating for nothing. 😅 (only forward is set in the methods to document)
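To make the repurposed hook concrete, here is a rough sketch of the idea settled on in this thread (the attribute name and the adapter-loading call are illustrative, not the exact merged code):

```python
# Sketch only: repurpose `tie_weights` in Wav2Vec2 instead of adding a new
# hook to modeling_utils. `tie_weights` is called by `from_pretrained` after
# the state dict has been loaded, which is exactly when adapter weights need
# to be (re)loaded, and Wav2Vec2 never ties input and output embeddings.
from transformers import Wav2Vec2PreTrainedModel


class Wav2Vec2ForCTCSketch(Wav2Vec2PreTrainedModel):
    def tie_weights(self):
        # The `target_lang` attribute and `load_adapter` call below are
        # illustrative of the idea; see the PR diff for the real code.
        target_lang = getattr(self, "target_lang", None)
        if target_lang is not None:
            self.load_adapter(target_lang)
```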
Thanks for the fixes @patrickvonplaten. The test is convincing for me and I don't mind about the design of overriding .tie_weights since it won't be used anyway for W2V2 - will let you finalise this with Sylvain! Overall, I much prefer this design to loading weights in the __init__.
LGTM - thanks for fixing and iterating!
logits_2 = get_logits(model_2, input_features)

self.assertTrue(torch.allclose(logits, logits_2, atol=1e-3))
tbh, I'm surprised the tolerance is so high given we're loading the same weights into the same model 👀
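For reference, the equivalence being tested is roughly of this shape (a standalone sketch modeled on the quoted diff; the helper, checkpoint id and language code are assumptions rather than the exact test):

```python
# Sketch of the equivalence check from the quoted test: load the same adapter
# two ways and compare logits. The checkpoint id, language code and helper
# below are modeled on the discussion, not copied from the actual test.
import torch
from transformers import Wav2Vec2ForCTC


def get_logits(model, input_features):
    model.eval()
    with torch.no_grad():
        return model(input_features).logits


# Path 1: load the adapter directly via `from_pretrained` (fixed by this PR).
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/mms-1b-all", target_lang="fra", ignore_mismatched_sizes=True
)
# Path 2: load the default checkpoint, then switch adapters.
model_2 = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")
model_2.load_adapter("fra")

input_features = torch.randn(1, 16000)  # dummy 1-second waveform at 16 kHz
logits = get_logits(model, input_features)
logits_2 = get_logits(model_2, input_features)

assert torch.allclose(logits, logits_2, atol=1e-3)
```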
What does this PR do?

This PR corrects incorrect behavior when loading MMS with non-default adapter weights via from_pretrained(...). The issue is explained well here.

In a nutshell, we cannot load specific weights in the init because these loaded weights are later overwritten again in from_pretrained. To solve this I propose to add a new generic load_adaptive_weights() call to from_pretrained that can be overridden by models that inherit from PreTrainedModel. This both solves issue #24223 and is also cleaner IMO, since weights shouldn't really be loaded when calling the __init__ method of a model anyway. It was weird before that:

... would try to load weights into the model.
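The loading-order problem described above can be sketched as follows (a plain-Python illustration; the function and names are not the actual transformers internals):

```python
# Illustration of why loading adapter weights in `__init__` cannot work:
# `from_pretrained` first instantiates the model (running `__init__`) and
# only afterwards copies the checkpoint's state dict into it, so anything
# loaded inside `__init__` gets overwritten. All names are illustrative.
def from_pretrained_flow(model_cls, config, state_dict, target_lang=None):
    model = model_cls(config)  # __init__ runs here: must NOT load weights yet
    model.load_state_dict(state_dict, strict=False)  # would clobber them anyway
    if target_lang is not None:
        # Load the adapter weights *after* the state dict, e.g. via an
        # overridable hook (the repurposed `tie_weights` in the final design).
        model.load_adapter(target_lang)
    return model
```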
cc @sgugger @sanchit-gandhi @amyeroberts wdyt about the design? Happy to add some more tests if ok for you