What is your question?
After pre-training the masked LM and the LM following the code in the GitHub repo, I am trying to fuse them and fine-tune them together. However, I get the following error/exception messages:
RuntimeError: Error(s) in loading state_dict for BridgeTransformerModel:
    size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
    size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
    size mismatch for decoder.lm_output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match
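One possible cause (an assumption on my part, not confirmed by the traceback) is fairseq's dictionary padding: by default, `Dictionary.finalize` rounds the vocabulary up to a multiple of 8 (`padding_factor=8`) by appending `madeupwordNNNN` symbols, so the same raw word list can yield different dictionary sizes depending on how the data was binarized. The 63999 vs. 64000 rows in the shapes above would be consistent with one run padding the vocabulary and the other not. A minimal sketch of that rounding:

```python
# Sketch of fairseq-style vocabulary padding (assumes the default
# padding_factor=8); the 63999/64000 figures come from the traceback above.
def pad_vocab(n_symbols: int, padding_factor: int = 8) -> int:
    """Round the vocabulary size up to the next multiple of padding_factor."""
    return ((n_symbols + padding_factor - 1) // padding_factor) * padding_factor

print(pad_vocab(63999))  # -> 64000 (63999 is not a multiple of 8)
print(pad_vocab(64000))  # -> 64000 (already a multiple of 8)
```

If this is the cause, the fix is to make sure the pre-training and fine-tuning runs binarize the data with the same dictionary file, so both models see the same (padded) vocabulary size.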
Code
The fine-tuning code:
python3 Graformer/train.py data-bin-ar-en/ \
    --task translation_multi_simple_epoch \
    --langs 'ar,en' --lang-pairs 'ar-en' \
    --decoder-langtok --lang-tok-replacing-bos-eos \
    --arch bridge_transformer \
    --encoder-layers 12 --decoder-layers 12 \
    --no-encoder-attn-layers 0,1,2,3,4,5 \
    --encoder-learned-pos --decoder-learned-pos \
    --no-scale-embedding \
    --encoder-normalize-before --decoder-normalize-before \
    --activation-fn gelu \
    --finetune-from-model masked_lm_checkpoints/checkpoint_last.pt,lm_checkpoints/checkpoint_last.pt \
    --freeze-params "(.embed.)|(.layers\.(0|1|2|3|4|5)\..)|(.layers\.6\.self_attn_layer_norm.)" \
    --transfer-params "encoder.layer_norm.weight:encoder.layers.6.self_attn_layer_norm.weight,decoder.layer_norm.weight:decoder.layers.6.self_attn_layer_norm.weight,encoder.layer_norm.bias:encoder.layers.6.self_attn_layer_norm.bias,decoder.layer_norm.bias:decoder.layers.6.self_attn_layer_norm.bias,decoder.embed_tokens.weight:decoder.lm_output_projection.weight,decoder.layer_norm.weight:decoder.lm_layer_norm.weight,decoder.layer_norm.bias:decoder.lm_layer_norm.bias" \
    --lm-fusion \
    --max-epoch 100 --max-tokens 16000 \
    --optimizer adam --adam-betas '(0.9,0.98)' \
    --lr 0.001 --warmup-updates 2500 --update-freq 5 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --dropout 0.1 \
    --save-interval 5 --keep-interval-updates 5 --keep-best-checkpoints 1 \
    --save-dir grafted-transformer-checkpoints \
    --fp16 --disable-validation --ddp-backend=no_c10d
Note: the dictionary I pre-trained the models with does not have exactly 64k entries.
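To narrow this down, it can help to diff the parameter shapes in the checkpoint against the freshly built model before `load_state_dict` fails. A minimal sketch (the `shape_mismatches` helper is my own; with real files you would feed it e.g. `torch.load("lm_checkpoints/checkpoint_last.pt", map_location="cpu")["model"]` and `model.state_dict()`):

```python
# Sketch: report every parameter whose shape differs between a checkpoint
# state_dict and the current model's state_dict, to locate mismatches like
# the decoder.embed_tokens.weight one in the traceback above.
def shape_mismatches(ckpt_state, model_state):
    """Return {name: (ckpt_shape, model_shape)} for params whose shapes differ."""
    diffs = {}
    for name, param in ckpt_state.items():
        if name in model_state and tuple(param.shape) != tuple(model_state[name].shape):
            diffs[name] = (tuple(param.shape), tuple(model_state[name].shape))
    return diffs
```

Every entry it reports for an embedding or output-projection weight whose first dimension differs points back at a vocabulary-size disagreement between the binarized data and the checkpoint.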
What's your environment?
PyTorch Version: 1.11.0
OS (e.g., Linux): Linux
Python version: 3.8.10
GPU models and configuration: NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
Note: I am working on only 1 GPU