Skip to content

PeftModel failing to load after finetuning. Size Mismatch Error #1710

@sunxiaojie99

Description

@sunxiaojie99

System Info

I trained the Mistral-7B-Instruct-v0-1 model using LoRa and deepspeed

  • Peft version : 0.10.0
  • transformers version : 4.40.0

Code to load model after fine-tuning:

base_model = cls.TRANSFORMER_CLS.from_pretrained(model_name_or_path, **hf_kwargs)
lora_config = LoraConfig.from_pretrained(lora_name_or_path, **hf_kwargs)
lora_model = PeftModel.from_pretrained(base_model, lora_name_or_path, config=lora_config)

Deepspeed config:

{
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "none",
            "pin_memory": true
        },
        "offload_param": {
            "device": "none",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": 1e6,
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 10,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto",
        "loss_scale": 0,
        "initial_scale_power": 10,
        "loss_scale_window": 1000,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto",
            "torch_adam": true
        }
    },

    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto"
        }
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 1000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

I am getting the following error:

File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 271, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 561, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/usr/local/lib/python3.8/dist-packages/peft/utils/save_and_load.py", line 126, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForFeatureExtraction:
        size mismatch for base_model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.0.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.1.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.1.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.1.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.2.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.2.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.2.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.3.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.3.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.3.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.4.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.4.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.4.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.5.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.5.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.5.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.6.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.6.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.6.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.7.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.7.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.7.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.8.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.8.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.8.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.9.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.9.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.9.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.10.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.10.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.10.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.11.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.11.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.11.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.12.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.12.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.12.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.13.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.13.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.13.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.14.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.14.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.14.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.15.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.15.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.15.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.16.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.16.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.16.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.17.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.17.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.17.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.18.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.18.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.18.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.19.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.19.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.19.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.20.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.20.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.20.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.21.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.21.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.21.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.22.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.22.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.22.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.23.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.23.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.23.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.24.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.24.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.24.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.25.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.25.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.25.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.26.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.26.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.26.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.27.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.27.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.27.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.28.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.28.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.28.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.29.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.29.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.29.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.30.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.30.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.30.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).
        size mismatch for base_model.model.layers.31.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.31.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 8]).
        size mismatch for base_model.model.layers.31.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([8, 14336]).

Who can help?

@younesbelkada @BenjaminBossa

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Code to load model during fine-tuning:

base_model = AutoModel.from_pretrained(model_args.model_name_or_path, **hf_kwargs)
lora_config = LoraConfig(
                    base_model_name_or_path=model_args.model_name_or_path,
                    task_type="FEATURE_EXTRACTION",
                    r=model_args.lora_r,
                    lora_alpha=model_args.lora_alpha,
                    lora_dropout=model_args.lora_dropout,
                    target_modules=model_args.lora_target_modules.split(','),
                    inference_mode=False
                )
lora_model = get_peft_model(base_model, lora_config)

Expected behavior

Weights should load as expected.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions