(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch #26796

sterlind opened this issue Oct 13, 2023 · 14 comments

@sterlind

System Info

  • transformers version: 4.34.0
  • Platform: Linux-5.15.90.2-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.3
  • Accelerate version: 0.23.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.0.dev20231005 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Simply running:

from transformers import AutoProcessor, HubertModel
model = HubertModel.from_pretrained("facebook/hubert-base-ls960")

Produces the following warning:

Some weights of the model checkpoint at facebook/hubert-base-ls960 were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at facebook/hubert-base-ls960 and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

What I gather from the PyTorch documentation and the updated code is that the PyTorch folks decided to migrate the weight_g and weight_v params of WeightNorm to parametrizations.weight.original0 and parametrizations.weight.original1, respectively.
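
For context, here is a minimal sketch (not from this issue) contrasting the parameter names produced by the legacy torch.nn.utils.weight_norm API and the newer torch.nn.utils.parametrizations.weight_norm API on a dummy Conv1d:

from torch import nn
from torch.nn.utils import weight_norm as legacy_weight_norm
from torch.nn.utils.parametrizations import weight_norm as parametrized_weight_norm

# Legacy hook-based API: registers weight_g / weight_v.
old_conv = legacy_weight_norm(nn.Conv1d(4, 4, 3))
print(sorted(name for name, _ in old_conv.named_parameters()))
# ['bias', 'weight_g', 'weight_v']

# Parametrization API (PyTorch 2.1+): registers parametrizations.weight.original0 / original1.
new_conv = parametrized_weight_norm(nn.Conv1d(4, 4, 3))
print(sorted(name for name, _ in new_conv.named_parameters()))
# ['bias', 'parametrizations.weight.original0', 'parametrizations.weight.original1']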

Initially I thought the model was simply broken by this breaking change in PyTorch, but I was confused because I had seen discussions suggesting it should already have been fixed by this PR in transformers, as discussed in #24692.

So I attached my debugger to _weight_norm_compat_hook, and sure enough it activated and seems to have migrated the state:
(during debug)

> state_dict[g_key]
tensor([[[0.3022, 0.1198, 0.1031, 0.1000, 0.0945, 0.0891, 0.0939, 0.0933, ...

(after model load, in Jupyter):

> model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
Parameter containing:
tensor([[[0.3022, 0.1198, 0.1031, 0.1000, 0.0945, 0.0891, 0.0939, 0.0933, ...

So I'm pretty sure the warning is a false alarm, but I'm also confused, since the migration happens before the warning is emitted, so I wanted to check.

Expected behavior

No warning should have appeared.

@sanchit-gandhi
Contributor

Hey @sterlind - sorry for the delay in getting back to you! You are indeed correct that the warning shouldn't be triggered. The state dict is copied correctly with the PyTorch weight norm refactoring, but the warning is still thrown in from_pretrained, since that code path hasn't yet been updated. I'll open a PR to fix this!
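
In the meantime, a quick way to inspect exactly which keys from_pretrained flagged is the output_loading_info flag; a minimal sketch, using the same checkpoint as above:

from transformers import HubertModel

# Return the loading report alongside the model instead of only printing a warning.
model, loading_info = HubertModel.from_pretrained(
    "facebook/hubert-base-ls960", output_loading_info=True
)
print(loading_info["unexpected_keys"])  # the old weight_g / weight_v checkpoint keys
print(loading_info["missing_keys"])     # the new parametrizations.weight.original* names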

@Sorrow321
Contributor

Sorrow321 commented Nov 5, 2023

I see a similar warning when loading Wav2Vec 2.0 (facebook/wav2vec2-base):

Some weights of ClassifierModel were not initialized from the model checkpoint at facebook/wav2vec2-base and are newly initialized: ['classifier.out_proj.weight', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'classifier.out_proj.bias', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']

In the end it works correctly and I should just ignore the warning, right?

@MorenoLaQuatra
Contributor

Just to follow up on this, since it may be related: when trying to convert a wav2vec2-conformer checkpoint from fairseq to transformers, I got an error with transformers versions > 4.29.2 (4.29.2 works fine). I report the error below:

Traceback (most recent call last):
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 308, in <module>
    convert_wav2vec2_conformer_checkpoint(
  File "MY_ENV_PATH//lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 293, in convert_wav2vec2_conformer_checkpoint
    recursively_load_weights(model, hf_wav2vec, not is_finetuned)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 167, in recursively_load_weights
    set_recursively(hf_model, mapped_key, value, name, weight_type)
  File "MY_UTILITIES_PATH/convert_wav2vec2_conformer_original_pytorch_checkpoint_to_pytorch.py", line 87, in set_recursively
    hf_shape = getattr(hf_pointer, weight_type).shape
  File "MY_ENV_PATH//lib/python3.9/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'ParametrizedConv1d' object has no attribute 'weight_g'


@ArthurZucker
Collaborator

cc @sanchit-gandhi, we can close this now that the PR was merged, no?


@YihaoJW

YihaoJW commented Jul 24, 2024

Has this been fixed? It still emits the same warning when loading:

from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
hubertModel = HubertModel.from_pretrained("facebook/hubert-large-ll60k").to("cuda")
Some weights of the model checkpoint at facebook/hubert-large-ll60k were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at facebook/hubert-large-ll60k and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

@amyeroberts
Collaborator

Reopening, as I can confirm this is still an issue on main. cc @sanchit-gandhi @kamilakesbi

@amyeroberts
Collaborator

cc @ylacombe

@mzboito

mzboito commented Sep 13, 2024

Hi. I can confirm this is a spurious warning.
I experienced it with torch >= 2. Here is an example:

torch.__version__ -> '1.13.1+cu117'
transformers.__version__ -> '4.32.0'

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")

no warning

torch.__version__ -> '2.4.1+cu121'
transformers.__version__ -> '4.44.2'

model = HubertModel.from_pretrained("utter-project/mHuBERT-147")
Some weights of the model checkpoint at utter-project/mHuBERT-147 were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at utter-project/mHuBERT-147 and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

As previously highlighted, the warning mentions these two apparently non-initialized layers:

problematic_1 = model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
problematic_2 = model.encoder.pos_conv_embed.conv.parametrizations.weight.original1

However, by loading the state_dict directly with torch and checking the equivalent variables, we can verify that they were loaded correctly:

state_dict = torch.load(PATH_TO_BIN)

original_1 = state_dict['encoder.pos_conv_embed.conv.weight_g']
original_2 = state_dict['encoder.pos_conv_embed.conv.weight_v']

# torch.equal checks shape and values; all(... .tolist()) on a multi-dimensional
# tensor is always truthy, so it is not a reliable comparison.
torch.equal(problematic_1, original_1)
True
torch.equal(problematic_2, original_2)
True

I almost had a heart attack when I discovered this issue one day after the ICASSP deadline! x)

@ylacombe
Contributor

The issue should disappear with #33275. Feel free to reopen the discussion or open another issue if you're still facing a similar problem!

@gallilmaimon
Contributor

If I understand correctly, this naming change in PyTorch weight norm (from weight_v, weight_g to original1, original0), which causes this spurious warning, is also what causes the fairseq-to-HF conversion script to crash, as mentioned by @MorenoLaQuatra. I also faced the same issue when trying to convert models from fairseq.

@ylacombe - should I create a PR to fix this? It should be quite easy, by adding a special case to the conversion mapping here.
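
For illustration only, the special case might look roughly like the sketch below; the helper name and structure here are hypothetical, not the conversion script's actual variables:

# Hypothetical compat mapping from the legacy weight-norm attribute names to the
# new parametrization attribute paths.
OLD_TO_NEW = {
    "weight_g": "parametrizations.weight.original0",
    "weight_v": "parametrizations.weight.original1",
}

def resolve_weight(hf_pointer, weight_type):
    # Use the legacy attribute if it still exists, otherwise walk the new
    # parametrization path (e.g. conv.parametrizations.weight.original0).
    if hasattr(hf_pointer, weight_type):
        return getattr(hf_pointer, weight_type)
    target = hf_pointer
    for attr in OLD_TO_NEW[weight_type].split("."):
        target = getattr(target, attr)
    return target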

@Windval

Windval commented Nov 14, 2024

On my end the bug still exists: the warning disappeared in the new version, but the state is not actually loaded, and weight_v and weight_g print as all zeros.

@ylacombe
Contributor

Hey @Windval, could you share a reproducer?
Many thanks!
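
For reference, a minimal check along these lines would make a good starting point for a reproducer (a sketch using the base HuBERT checkpoint discussed above, not necessarily @Windval's exact setup):

import torch
from transformers import HubertModel

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
g = model.encoder.pos_conv_embed.conv.parametrizations.weight.original0
v = model.encoder.pos_conv_embed.conv.parametrizations.weight.original1
# If the state were silently dropped, these would report zero non-zero entries.
print(torch.count_nonzero(g).item(), torch.count_nonzero(v).item())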
