What is your question?
After pre-training the masked LM and the LM following the code in the GitHub repo, I am trying to fuse them and fine-tune them together. However, I get the following error/exception messages:
RuntimeError: Error(s) in loading state_dict for BridgeTransformerModel:
    size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
    size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
    size mismatch for decoder.lm_output_projection.weight: copying a param with shape torch.Size([63999, 1024]) from checkpoint, the shape in current model is torch.Size([64000, 1024]).
Exception: Cannot load parameters from checkpoint lm_checkpoints/checkpoint_last.pt; please ensure that the architectures match
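One possible cause (an assumption on my part, not confirmed by the traceback) is fairseq's dictionary padding: by default, `Dictionary.finalize` rounds the vocabulary up to a multiple of 8 (`padding_factor=8`) by appending `madeupwordNNNN` symbols, so the same raw word list can yield different dictionary sizes depending on how the data was binarized. The 63999 vs. 64000 rows in the shapes above would be consistent with one run padding the vocabulary and the other not. A minimal sketch of that rounding:

```python
# Sketch of fairseq-style vocabulary padding (assumes the default
# padding_factor=8); the 63999/64000 figures come from the traceback above.
def pad_vocab(n_symbols: int, padding_factor: int = 8) -> int:
    """Round the vocabulary size up to the next multiple of padding_factor."""
    return ((n_symbols + padding_factor - 1) // padding_factor) * padding_factor

print(pad_vocab(63999))  # -> 64000 (63999 is not a multiple of 8)
print(pad_vocab(64000))  # -> 64000 (already a multiple of 8)
```

If this is the cause, the fix is to make sure the pre-training and fine-tuning runs binarize the data with the same dictionary file, so both models see the same (padded) vocabulary size.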
Code
The fine-tuning code:
python3 Graformer/train.py data-bin-ar-en/ \
    --task translation_multi_simple_epoch \
    --langs 'ar,en' --lang-pairs 'ar-en' \
    --decoder-langtok --lang-tok-replacing-bos-eos \
    --arch bridge_transformer \
    --encoder-layers 12 --decoder-layers 12 \
    --no-encoder-attn-layers 0,1,2,3,4,5 \
    --encoder-learned-pos --decoder-learned-pos \
    --no-scale-embedding \
    --encoder-normalize-before --decoder-normalize-before \
    --activation-fn gelu \
    --finetune-from-model masked_lm_checkpoints/checkpoint_last.pt,lm_checkpoints/checkpoint_last.pt \
    --freeze-params "(.embed.)|(.layers\.(0|1|2|3|4|5)\..)|(.layers\.6\.self_attn_layer_norm.)" \
    --transfer-params "encoder.layer_norm.weight:encoder.layers.6.self_attn_layer_norm.weight,decoder.layer_norm.weight:decoder.layers.6.self_attn_layer_norm.weight,encoder.layer_norm.bias:encoder.layers.6.self_attn_layer_norm.bias,decoder.layer_norm.bias:decoder.layers.6.self_attn_layer_norm.bias,decoder.embed_tokens.weight:decoder.lm_output_projection.weight,decoder.layer_norm.weight:decoder.lm_layer_norm.weight,decoder.layer_norm.bias:decoder.lm_layer_norm.bias" \
    --lm-fusion \
    --max-epoch 100 --max-tokens 16000 \
    --optimizer adam --adam-betas '(0.9,0.98)' \
    --lr 0.001 --warmup-updates 2500 --update-freq 5 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --dropout 0.1 \
    --save-interval 5 --keep-interval-updates 5 --keep-best-checkpoints 1 \
    --save-dir grafted-transformer-checkpoints \
    --fp16 --disable-validation --ddp-backend=no_c10d
Note: the dictionary I pre-trained the models with does not have exactly 64k entries.
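To narrow this down, it can help to diff the parameter shapes in the checkpoint against the freshly built model before `load_state_dict` fails. A minimal sketch (the `shape_mismatches` helper is my own; with real files you would feed it e.g. `torch.load("lm_checkpoints/checkpoint_last.pt", map_location="cpu")["model"]` and `model.state_dict()`):

```python
# Sketch: report every parameter whose shape differs between a checkpoint
# state_dict and the current model's state_dict, to locate mismatches like
# the decoder.embed_tokens.weight one in the traceback above.
def shape_mismatches(ckpt_state, model_state):
    """Return {name: (ckpt_shape, model_shape)} for params whose shapes differ."""
    diffs = {}
    for name, param in ckpt_state.items():
        if name in model_state and tuple(param.shape) != tuple(model_state[name].shape):
            diffs[name] = (tuple(param.shape), tuple(model_state[name].shape))
    return diffs
```

Every entry it reports for an embedding or output-projection weight whose first dimension differs points back at a vocabulary-size disagreement between the binarized data and the checkpoint.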
What's your environment?
PyTorch Version: 1.11.0
OS (e.g., Linux): Linux
Python version: 3.8.10
GPU models and configuration: NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4
Note: I am working on only 1 GPU