
Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match #1

Open
salma-elshafey opened this issue May 21, 2022 · 0 comments

What is your question?

After pre-training the masked LM and the LM following the code in the GitHub repo, I am trying to fuse them and fine-tune them together. However, I am getting the following error/exception messages:

RuntimeError: Error(s) in loading state_dict for BridgeTransformerModel:
	size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
	size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
	size mismatch for decoder.lm_output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).

Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match
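
Since the message asks to ensure the architectures match, the configuration each checkpoint was saved with can be dumped and compared against the fine-tuning flags. This is a minimal sketch on my side (not project code), assuming standard fairseq checkpoints, which store either an "args" namespace (older fairseq) or a "cfg" object (newer fairseq) next to the weights:

import torch

for path in ("masked_lm_checkpoints/checkpoint_last.pt", "lm_checkpoints/checkpoint_last.pt"):
    ckpt = torch.load(path, map_location="cpu")
    # fairseq saves the training configuration alongside the model weights
    print(path)
    print(ckpt.get("cfg") or ckpt.get("args"))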

Code

The fine-tuning code:
python3 Graformer/train.py data-bin-ar-en/ \
    --task translation_multi_simple_epoch --langs 'ar,en' --lang-pairs 'ar-en' \
    --decoder-langtok --lang-tok-replacing-bos-eos \
    --arch bridge_transformer --encoder-layers 12 --decoder-layers 12 \
    --no-encoder-attn-layers 0,1,2,3,4,5 \
    --encoder-learned-pos --decoder-learned-pos --no-scale-embedding \
    --encoder-normalize-before --decoder-normalize-before --activation-fn gelu \
    --finetune-from-model masked_lm_checkpoints/checkpoint_last.pt,lm_checkpoints/checkpoint_last.pt \
    --freeze-params "(.embed.)|(.layers\.(0|1|2|3|4|5)\..)|(.layers\.6\.self_attn_layer_norm.)" \
    --transfer-params "encoder.layer_norm.weight:encoder.layers.6.self_attn_layer_norm.weight,decoder.layer_norm.weight:decoder.layers.6.self_attn_layer_norm.weight,encoder.layer_norm.bias:encoder.layers.6.self_attn_layer_norm.bias,decoder.layer_norm.bias:decoder.layers.6.self_attn_layer_norm.bias,decoder.embed_tokens.weight:decoder.lm_output_projection.weight,decoder.layer_norm.weight:decoder.lm_layer_norm.weight,decoder.layer_norm.bias:decoder.lm_layer_norm.bias" \
    --lm-fusion --max-epoch 100 --max-tokens 16000 \
    --optimizer adam --adam-betas '(0.9,0.98)' --lr 0.001 --warmup-updates 2500 --update-freq 5 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --dropout 0.1 \
    --save-interval 5 --keep-interval-updates 5 --keep-best-checkpoints 1 \
    --save-dir grafted-transformer-checkpoints \
    --fp16 --disable-validation --ddp-backend=no_c10d
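
To see which of the two pre-trained checkpoints passed to --finetune-from-model carries the 63999-row embedding, the relevant parameter shapes can be printed directly. A minimal sketch (again an assumption on my side, not part of the repo), relying on fairseq checkpoints keeping their weights under the "model" key:

import torch

for path in ("masked_lm_checkpoints/checkpoint_last.pt", "lm_checkpoints/checkpoint_last.pt"):
    state = torch.load(path, map_location="cpu")["model"]  # fairseq state dict
    for name, tensor in state.items():
        if "embed_tokens" in name or "output_projection" in name:
            print(path, name, tuple(tensor.shape))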

Note: The dictionary I pre-trained the models with does not contain exactly 64k entries.
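
Related to that, the vocabulary size fairseq actually derives from the binarized data can be checked as follows. A minimal sketch assuming the usual data-bin dictionary filenames (adjust if yours differ); note that the multilingual task may additionally append language tokens, which would change the count:

from fairseq.data import Dictionary

for dict_path in ("data-bin-ar-en/dict.ar.txt", "data-bin-ar-en/dict.en.txt"):
    # len() includes the special symbols (<s>, <pad>, </s>, <unk>) and any
    # madeupwordNNNN padding entries added during preprocessing
    d = Dictionary.load(dict_path)
    print(dict_path, len(d))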

What's your environment?

PyTorch Version: 1.11.0
OS (e.g., Linux): Linux
Python version: 3.8.10
GPU models and configuration: NVIDIA-SMI 470.103.01, Driver Version 470.103.01, CUDA Version 11.4

Note: I am working with only 1 GPU.
