
Support BatchNorm in Hubert pos_conv_emb as in fairseq #34389

Merged

Conversation

Contributor

@gallilmaimon gallilmaimon commented Oct 24, 2024

What does this PR do?

This PR adds support for BatchNorm instead of weight norm in HubertModel, as in facebookresearch/fairseq@4db2649

The conversion file was also adapted to support conversion from fairseq to HF, and was used to convert the widely used Hubert-base-25hz introduced in https://arxiv.org/abs/2305.13009
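Once converted, enabling the new behaviour in transformers is a single config flag. A minimal sketch (conv_pos_batch_norm is the option introduced by this PR):

from transformers import HubertConfig, HubertModel

# conv_pos_batch_norm=True uses BatchNorm1d instead of weight norm in the
# positional convolutional embedding, matching the fairseq variant
config = HubertConfig(conv_pos_batch_norm=True)
model = HubertModel(config)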

Fixes #34229

We have already uploaded the converted weights to the Hub at https://huggingface.co/slprl/mhubert-base-25hz, which makes it possible to verify that the conversion worked correctly (compared against the original implementation in textlesslib), as follows:

# Asserting that results are identical to the textless original
import torch
import torchaudio
from transformers import HubertModel
from textless.data.speech_encoder import SpeechEncoder

model = SpeechEncoder.by_name(dense_model_name='mhubert-base-25hz', quantizer_model_name='kmeans', vocab_size=500, deduplicate=False, need_f0=False)
hf_model = HubertModel.from_pretrained('slprl/mhubert-base-25hz')

wav = torchaudio.load(<WAV_PATH>)[0]

# the textless model's dense features correspond to hidden layer 11 of the HF model
assert torch.allclose(model(wav)['dense'], hf_model(wav, output_hidden_states=True).hidden_states[11])

@ylacombe - I would love your review; specifically, there are several open questions I was wondering about:

  1. This means that HubertPositionalConvEmbedding is no longer a copy of transformers.models.wav2vec2.modeling_wav2vec2. I addressed this by removing the comment; would you prefer I also change wav2vec2?
  2. The conversion script convert_hubert_original_pytorch_checkpoint_to_pytorch.py didn't work (before this change) for the regular hubert-base-ls960h model because of layer-norm naming changes, as discussed in "(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch" #26796. I didn't fix this because it felt out of scope.
  3. I would love some guidance or help with deepspeed, because I wasn't sure whether any changes were needed to support this.
  4. I also got an error when running make fixup related to a file I haven't changed (src/transformers/models/glm/modeling_glm.py), and I didn't manage to understand why. It also happened when running make fixup on a clean branch with no changes at all, so I would appreciate any help.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ylacombe
@eustlb

Contributor

@ylacombe ylacombe left a comment


Hi @gallilmaimon, thanks for quickly opening this PR!

The integration test you did looks good. Let's make sure to add it to the integration tests in test_modeling_hubert.py!

Once that's done, you can also push an empty commit to trigger the slow-tests CI run: git commit --allow-empty -m "[run-slow] hubert"

To address your questions:

  1. I think it's okay to remove the Copied from statement here.
  2. This is a correct observation. Sorry that I missed your comment about it! Would you like to open another quick PR to correct this?
  3. I think you did the deepspeed integration correctly: it's only applied when using weight_norm.
  4. You might want to rebase your branch on the main transformers branch. If that doesn't work, you can share some logs here so that I can help you!

Let me know if you have further questions !

Comment on lines 274 to 296
if config.conv_pos_batch_norm:
    batch_norm = nn.BatchNorm1d(config.hidden_size)
    self.conv = nn.Sequential(batch_norm, self.conv)
else:
    weight_norm = nn.utils.weight_norm
    if hasattr(nn.utils.parametrizations, "weight_norm"):
        weight_norm = nn.utils.parametrizations.weight_norm

    if is_deepspeed_zero3_enabled():
        import deepspeed

        with deepspeed.zero.GatheredParameters(self.conv.weight, modifier_rank=0):
            self.conv = weight_norm(self.conv, name="weight", dim=2)
        if hasattr(self.conv, "parametrizations"):
            weight_g = self.conv.parametrizations.weight.original0
            weight_v = self.conv.parametrizations.weight.original1
        else:
            weight_g = self.conv.weight_g
            weight_v = self.conv.weight_v
        deepspeed.zero.register_external_parameter(self, weight_v)
        deepspeed.zero.register_external_parameter(self, weight_g)
    else:
        self.conv = weight_norm(self.conv, name="weight", dim=2)
Contributor


I think we'd rather add a self.batch_norm = None if not config.conv_pos_batch_norm else nn.BatchNorm1d(config.hidden_size) that we'd use in the forward pass, rather than using nn.Sequential here
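A minimal sketch of that explicit version (assuming the config and attribute names discussed in this PR; the padding and activation layers are omitted for brevity):

import torch.nn as nn

class HubertPositionalConvEmbedding(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.conv = nn.Conv1d(
            config.hidden_size,
            config.hidden_size,
            kernel_size=config.num_conv_pos_embeddings,
            padding=config.num_conv_pos_embeddings // 2,
            groups=config.num_conv_pos_embedding_groups,
        )
        # explicit attribute instead of wrapping self.conv in nn.Sequential
        self.batch_norm = None if not config.conv_pos_batch_norm else nn.BatchNorm1d(config.hidden_size)

    def forward(self, hidden_states):
        # (batch, time, hidden) -> (batch, hidden, time) for Conv1d/BatchNorm1d
        hidden_states = hidden_states.transpose(1, 2)
        if self.batch_norm is not None:
            hidden_states = self.batch_norm(hidden_states)
        hidden_states = self.conv(hidden_states)
        return hidden_states.transpose(1, 2)

This keeps the batch-norm branch visible in both __init__ and forward, rather than hiding it inside a container module.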

Contributor Author


I felt that the current method was more similar to the weight norm approach (and also to fairseq), but I can change it to your suggestion and update the conversion script as well.

Contributor


In transformers, we prefer to make everything explicit!

@@ -94,6 +94,8 @@ class HubertConfig(PretrainedConfig):
            embeddings layer.
        num_conv_pos_embedding_groups (`int`, *optional*, defaults to 16):
            Number of groups of 1D convolutional positional embeddings layer.
        conv_pos_batch_norm (`bool`, *optional*, defaults to `False`):
Contributor


Out of curiosity, why do we specify (for bf16 models)?

Contributor Author


To be honest, I just copied this from the fairseq definition: https://github.com/facebookresearch/fairseq/blob/ecbf110e1eb43861214b05fa001eff584954f65a/fairseq/models/hubert/hubert.py#L197

I can remove this if you prefer.

Contributor


Let's remove it then

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gallilmaimon
Contributor Author

The integration test you did looks good. Let's make sure to add it to the integration tests in test_modeling_hubert.py!

Okay, do you prefer that I do it with the textlesslib dependency in the test itself, or save the output and then just compare the output of HubertModel to the saved results from textlesslib?

  2. This is a correct observation. Sorry that I missed your comment about it! Would you like to open another quick PR to correct this?

Okay, I will open a new one about this separately.

  4. You might want to rebase your branch on the main transformers branch. If that doesn't work, you can share some logs here so that I can help you!

I think I did rebase, but I will try again and let you know.

@ylacombe
Contributor

Okay, do you prefer that I do it with the textlesslib dependency in the test itself, or save the output and then just compare the output of HubertModel to the saved results from textlesslib?

You can take a look at the test modeling file for inspiration; we usually compare a few stats computed from the expected outputs, as well as a small slice extracted from the expected outputs.
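For instance, the new test could follow roughly this shape (a sketch only: the expected values below are placeholders that would be recorded from the textlesslib reference, and the audio loading is simplified):

import torch

def test_inference_hubert_25hz(self):
    model = HubertModel.from_pretrained("slprl/mhubert-base-25hz").to(torch_device)
    input_values = torch.randn(1, 16000, device=torch_device)  # placeholder; real tests load a fixed sample

    with torch.no_grad():
        # layer 11 matches the dense features exposed by textlesslib
        outputs = model(input_values, output_hidden_states=True).hidden_states[11]

    # compare a small slice and a summary stat against recorded reference values
    expected_outputs_first = torch.zeros(1, 4, 4, device=torch_device)  # placeholder values
    expected_output_sum = 0.0  # placeholder value
    self.assertTrue(torch.allclose(outputs[:, :4, :4], expected_outputs_first, atol=5e-3))
    self.assertTrue(abs(outputs.sum() - expected_output_sum) < 0.1)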

Cyrilvallez and others added 23 commits October 26, 2024 11:47
Add conversion integration test, and make batchnorm explicit variable
@gallilmaimon
Contributor Author

@ylacombe Hey, I think I addressed all of your comments. Let me know if anything else is needed :)

@gallilmaimon
Contributor Author

@ylacombe Hey again, just a gentle reminder about this, as I would be happy to get it integrated as soon as possible. Thanks again for your time and feedback!

@gallilmaimon gallilmaimon requested a review from ylacombe November 3, 2024 13:50
@avishaiElmakies
Contributor

@ylacombe
Sorry to bother you about this, but I would really love for this change to be added.

@gallilmaimon
Contributor Author

@ylacombe - Hey again, just wondering if you have had a chance to go over this so we can integrate this addition. There are several projects I know of that would build on this fix. Thanks again!

Contributor

@ylacombe ylacombe left a comment


Hey @gallilmaimon , really sorry for the late review! Thanks for integrating my comments, it looks good to me now!

Also, thanks for adding the integration tests!

@ylacombe
Contributor

Let's push the empty commit again: git commit --allow-empty -m "[run-slow] hubert". I don't think it has run yet.

@ylacombe
Contributor

cc @ArthurZucker , could you review when you have time?

@gallilmaimon, thanks again for the work on this PR! Excited to try the new checkpoint on downstream tasks. Have you been able to run some benchmarks against other Hubert checkpoints?

@gallilmaimon
Contributor Author

@gallilmaimon, thanks again for the work on this PR! Excited to try the new checkpoint on downstream tasks. Have you been able to run some benchmarks against other Hubert checkpoints?

@ylacombe - My main usage is discretising the representations and using them to train SpeechLMs, and the results there seem as good as expected (similar to the TWIST paper, and notably better than the 50 Hz variant). I will hopefully open-source all of this very soon, once it is ready!

But it might also be interesting to try it for other downstream usages :)

@gallilmaimon
Contributor Author

Hey @ArthurZucker, any chance you have had an opportunity to review this PR? Would really love to integrate this :) Thanks!

@ylacombe
Contributor

ylacombe commented Dec 5, 2024

Requesting @Rocketknight1's review, because @ArthurZucker has limited availability for a few days.

Collaborator

@ArthurZucker ArthurZucker left a comment


It goes a little bit against our philosophy, as usually this would need a new model (because we introduce a new code path)!
We can stray a little bit here, or we could go about this using modular, but that might be overkill!

@@ -943,3 +943,40 @@ def test_inference_distilhubert(self):
        self.assertTrue(torch.allclose(outputs[:, :4, :4], expected_outputs_first, atol=5e-3))
        self.assertTrue(torch.allclose(outputs[:, -4:, -4:], expected_outputs_last, atol=5e-3))
        self.assertTrue(abs(outputs.sum() - expected_output_sum) < 0.1)

    def test_inference_hubert_25hz(self):
        model = HubertModel.from_pretrained("slprl/mhubert-base-25hz").to(torch_device)
Collaborator


It would be nice to open a PR to the original repo and use the PR branch revision in the meantime!
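For reference, a Hub PR branch can be loaded by passing its ref as the revision (the PR number below is a placeholder):

from transformers import HubertModel

# refs/pr/<N> points at pull request N on the Hub repository
model = HubertModel.from_pretrained("slprl/mhubert-base-25hz", revision="refs/pr/1")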

@ArthurZucker
Collaborator

My only request is to use an official checkpoint path for the test; otherwise, good job, and sorry for being late on this review!

@gallilmaimon
Contributor Author

My only request is to use an official checkpoint path for the test; otherwise, good job, and sorry for being late on this review!

Hey @ArthurZucker, thanks for the review! I am not sure I understand what you mean by "official checkpoint", as this model was only released as part of fairseq and not HF; that is why we performed the conversion (as part of our academic lab, SLPRL) for community use. We validated that the results are identical, as shown by the test. We of course give full reference and credit in the model card. I am happy to put the weights anywhere needed and merge the PR!

@ylacombe
Contributor

ylacombe commented Dec 10, 2024

Hey @ArthurZucker, the model hasn't been officially released and can only be found by digging deep into the fairseq repositories. Since the model card is quite clean, gives full reference, and is hosted in the organisation of an academic research lab, I believe it should be OK to keep it like this, WDYT?

@gallilmaimon, could you add the license (MIT, I think?) to the model card metadata in the meantime? I've opened a PR to do this, if it's indeed MIT-licensed.

@gallilmaimon
Contributor Author

Hey @ylacombe, I approved your PR, as this is in fact MIT-licensed; I also have a link to the original license in the fairseq GitHub!

@ylacombe
Contributor

I've asked some of the people on the research paper author lists if it would be possible to transfer the checkpoint to the Meta organization. In the meantime, let's merge, thanks for your excellent work!

@ylacombe ylacombe merged commit 6acb4e4 into huggingface:main Dec 10, 2024
20 checks passed
Development

Successfully merging this pull request may close these issues.

Add support for HuBERT batch norm instead of weight norm in pos_conv_emb