(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch #26796

sterlind opened this issue Oct 13, 2023 · 14 comments

@sterlind

System Info

  • transformers version: 4.34.0
  • Platform: Linux-5.15.90.2-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.3
  • Accelerate version: 0.23.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0.dev20231005 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Simply running:

from transformers import AutoProcessor, HubertModel
model = HubertModel.from_pretrained("facebook/hubert-base-ls960")

Produces the following warning:

Some weights of the model checkpoint at facebook/hubert-base-ls960 were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at facebook/hubert-base-ls960 and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What I gather from the PyTorch documentation and the updated code is that the PyTorch folks decided to migrate the weight_g and weight_v params of WeightNorm to parametrizations.weight.original0 and parametrizations.weight.original1, respectively.
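
For context, here is a minimal sketch (not from this issue) contrasting the parameter names produced by the legacy torch.nn.utils.weight_norm API and the newer torch.nn.utils.parametrizations.weight_norm API on a dummy Conv1d:

from torch import nn
from torch.nn.utils import weight_norm as legacy_weight_norm
from torch.nn.utils.parametrizations import weight_norm as parametrized_weight_norm

# Legacy hook-based API: registers weight_g / weight_v.
old_conv = legacy_weight_norm(nn.Conv1d(4, 4, 3))
print(sorted(name for name, _ in old_conv.named_parameters()))
# ['bias', 'weight_g', 'weight_v']

# Parametrization API (PyTorch 2.1+): registers parametrizations.weight.original0 / original1.
new_conv = parametrized_weight_norm(nn.Conv1d(4, 4, 3))
print(sorted(name for name, _ in new_conv.named_parameters()))
# ['bias', 'parametrizations.weight.original0', 'parametrizations.weight.original1']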

Initially I thought the model was simply broken by this breaking change in PyTorch, but I was confused because I had seen discussions suggesting it should already have been fixed by this PR in transformers, as discussed in #24692.

So I attached my debugger to _weight_norm_compat_hook, and sure enough it activated and seems to have migrated the state:
(during debug)

> state_dict[g_key]
tensor([[[0.3022, 0.1198, 0.1031, 0.1000, 0.0945, 0.0891, 0.0939, 0.0933, ...

(after model load, in Jupyter):

> model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
Parameter containing:
tensor([[[0.3022, 0.1198, 0.1031, 0.1000, 0.0945, 0.0891, 0.0939, 0.0933, ...

So I'm pretty sure the warning is a false alarm, but I'm also confused, since the migration happens before the warning is emitted, so I wanted to check.

Expected behavior

No warning should have appeared.

@sanchit-gandhi
Contributor

Hey @sterlind - sorry for the delay in getting back to you! You are indeed correct that the warning shouldn't be triggered. The state dict is copied correctly with the PyTorch weight norm refactoring, but the warning is still thrown in from_pretrained, since that code path hasn't yet been updated. I'll open a PR to fix this!
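
In the meantime, a quick way to inspect exactly which keys from_pretrained flagged is the output_loading_info flag; a minimal sketch, using the same checkpoint as above:

from transformers import HubertModel

# Return the loading report alongside the model instead of only printing a warning.
model, loading_info = HubertModel.from_pretrained(
    "facebook/hubert-base-ls960", output_loading_info=True
)
print(loading_info["unexpected_keys"])  # the old weight_g / weight_v checkpoint keys
print(loading_info["missing_keys"])     # the new parametrizations.weight.original* names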

@Sorrow321
Contributor

Sorrow321 commented Nov 5, 2023

I see a similar warning when loading Wav2Vec 2.0 (facebook/wav2vec2-base):

Some weights of ClassifierModel were not initialized from the model checkpoint at facebook/wav2vec2-base and are newly initialized: ['classifier.out_proj.weight', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'classifier.out_proj.bias', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']

In the end it works correctly and I should just ignore the warning, right?

@MorenoLaQuatra
Contributor

Just to follow up on this, since it may be related: when trying to convert a wav2vec2-conformer checkpoint from fairseq to transformers, I got an error with transformers versions > 4.29.2 (4.29.2 works fine). I report the error below:

Traceback (most recent call last):
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 308, in <module>
    convert_wav2vec2_conformer_checkpoint(
  File "MY_ENV_PATH//lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 293, in convert_wav2vec2_conformer_checkpoint
    recursively_load_weights(model, hf_wav2vec, not is_finetuned)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 167, in recursively_load_weights
    set_recursively(hf_model, mapped_key, value, name, weight_type)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 87, in set_recursively
    hf_shape = getattr(hf_pointer, weight_type).shape
  File "MY_ENV_PATH//lib/python3.9/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ParametrizedConv1d' object has no attribute 'weight_g'


@ArthurZucker
Collaborator

cc @sanchit-gandhi, we can close this now that the PR was merged, no?


@YihaoJW

YihaoJW commented Jul 24, 2024

Has this been fixed? It still emits the same warning when loading:

from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
hubertModel = HubertModel.from_pretrained("facebook/hubert-large-ll60k").to("cuda")
Some weights of the model checkpoint at facebook/hubert-large-ll60k were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at facebook/hubert-large-ll60k and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

@amyeroberts
Collaborator

Reopening, as I can confirm this is still an issue on main. cc @sanchit-gandhi @kamilakesbi

@amyeroberts
Collaborator

cc @ylacombe

@mzboito

mzboito commented Sep 13, 2024

Hi. I can confirm this is a spurious warning.
I experienced it with torch >= 2. Here is an example:

torch.__version__ -> '1.13.1+cu117'
transformers.__version__ -> '4.32.0'

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")

no warning

torch.__version__ -> '2.4.1+cu121'
transformers.__version__ -> '4.44.2'

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
Some weights of the model checkpoint at utter-project/mHuBERT-147 were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at utter-project/mHuBERT-147 and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

As previously highlighted, the warning mentions these two apparently non-initialized layers:

problematic_1 = model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
problematic_2 = model.encoder.pos_conv_embed.conv.parametrizations.weight.original1

However, by loading the state_dict directly with torch and checking the equivalent variables, we can verify that they were loaded correctly:

state_dict = torch.load(PATH_TO_BIN)

original_1 = state_dict['encoder.pos_conv_embed.conv.weight_g']
original_2 = state_dict['encoder.pos_conv_embed.conv.weight_v']

# torch.equal checks shape and values; all(... .tolist()) on a multi-dimensional
# tensor is always truthy, so it is not a reliable comparison.
torch.equal(problematic_1, original_1)
True
torch.equal(problematic_2, original_2)
True

I almost had a heart attack when I discovered this issue one day after the ICASSP deadline! x)

@ylacombe
Contributor

The issue should disappear with #33275. Feel free to reopen the discussion or open another issue if you're still facing a similar problem!

@gallilmaimon
Contributor

If I understand correctly, this naming change in PyTorch weight norm (from weight_v, weight_g to original1, original0), which causes this spurious warning, is also what causes the fairseq-to-HF conversion script to crash, as mentioned by @MorenoLaQuatra. I also faced the same issue when trying to convert models from fairseq.

@ylacombe - should I create a PR to fix this? It should be quite easy, by adding a special case to the conversion mapping here.
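
For illustration only, the special case might look roughly like the sketch below; the helper name and structure here are hypothetical, not the conversion script's actual variables:

# Hypothetical compat mapping from the legacy weight-norm attribute names to the
# new parametrization attribute paths.
OLD_TO_NEW = {
    "weight_g": "parametrizations.weight.original0",
    "weight_v": "parametrizations.weight.original1",
}

def resolve_weight(hf_pointer, weight_type):
    # Use the legacy attribute if it still exists, otherwise walk the new
    # parametrization path (e.g. conv.parametrizations.weight.original0).
    if hasattr(hf_pointer, weight_type):
        return getattr(hf_pointer, weight_type)
    target = hf_pointer
    for attr in OLD_TO_NEW[weight_type].split("."):
        target = getattr(target, attr)
    return target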

@Windval

Windval commented Nov 14, 2024

On my end the bug still exists: the warning disappeared in the new version, but the state is not actually loaded, and weight_v and weight_g print as all zeros.

@ylacombe
Contributor

Hey @Windval, could you share a reproducer?
Many thanks!
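
For reference, a minimal check along these lines would make a good starting point for a reproducer (a sketch using the base HuBERT checkpoint discussed above, not necessarily @Windval's exact setup):

import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
g = model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
v = model.encoder.pos_conv_embed.conv.parametrizations.weight.original1
# If the state were silently dropped, these would report zero non-zero entries.
print(torch.count_nonzero(g).item(), torch.count_nonzero(v).item())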
