Training for 5 steps only, saving at step 2, for testing purposes.
I removed --overwrite_output_dir and started training again. It loaded almost everything and seemed fine, then right before resuming it raised an error about missing state_dict keys (full log below).
I've tried this several times with the same result. Same dataset, same script; I can even merge the trained LoRA into the 7B base fine after training, so the script works, but checkpoint saving must be going wrong somewhere.
Any ideas?
Steve
I think you may have already seen #464.
Here is the detailed explanation:
Since only the LoRA part (plus embed_tokens and lm_head) of the full model is saved to the checkpoint, we must allow missing keys when resuming training and loading that checkpoint. You can confirm what the checkpoint actually contains with the sketch below.
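A minimal way to verify this (a sketch added for illustration, not from the original thread): load the DeepSpeed model-states file from the log below and list its keys. The path is the one from the log, so adjust it to your own run; "module" is where DeepSpeed normally stores the module weights.

```python
import torch

# Sketch: list which weights the DeepSpeed model-states checkpoint holds.
# Path copied from the log below; adjust to your own checkpoint directory.
ckpt = torch.load(
    "checkpoint-2/global_step2/mp_rank_00_model_states.pt",
    map_location="cpu",
)
# DeepSpeed typically keeps the module weights under the "module" key.
for key in sorted(ckpt["module"].keys()):
    print(key)  # expect only LoRA, embed_tokens, and lm_head entries
```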
When using Transformers with DeepSpeed, however, Transformers doesn't expose any parameter for allowing missing keys (see here).
Therefore we have to modify the source code.
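Concretely, a minimal sketch of the kind of change meant here (not an official patch): in transformers/deepspeed.py, inside deepspeed_init(), the load_checkpoint call shown around line 392 of the traceback below can pass DeepSpeed's load_module_strict flag, which load_checkpoint already accepts:

```python
# transformers/deepspeed.py, inside deepspeed_init() -- sketch only; the
# exact location depends on your transformers version.
load_path, _ = deepspeed_engine.load_checkpoint(
    resume_from_checkpoint,
    load_module_strict=False,  # added: tolerate the missing base-model keys
    load_optimizer_states=True,
    load_lr_scheduler_states=True,
)
```

With strict loading disabled, torch's load_state_dict reports the missing keys in _IncompatibleKeys instead of raising, so the resume proceeds with the base weights already initialized from the base model.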
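If you would rather not edit the installed package, an equivalent workaround (again a sketch, assuming `strict` is passed by keyword as the traceback below shows) is to wrap DeepSpeed's loader from your own script before calling trainer.train():

```python
from deepspeed.runtime.engine import DeepSpeedEngine

_original_load = DeepSpeedEngine.load_module_state_dict

def _load_module_state_dict_non_strict(self, *args, **kwargs):
    # Force non-strict loading; `strict` arrives as a keyword argument
    # (see the load_module_state_dict frame in the traceback below).
    kwargs["strict"] = False
    return _original_load(self, *args, **kwargs)

DeepSpeedEngine.load_module_state_dict = _load_module_state_dict_non_strict
```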
Describe the issue in detail
Full log from the failed resume attempt:
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00039505958557128906 seconds
[INFO|deepspeed.py:390] 2023-06-10 18:22:44,370 >> Attempting to resume from /content/drive/MyDrive/IMPORTANTS/LLM/Pre-train/VN-LLaMA-Lora-V1.0-PreTrain_02/checkpoint-2
[2023-06-10 18:22:44,707] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /content/drive/MyDrive/IMPORTANTS/LLM/Pre-train/VN-LLaMA-Lora-V1.0-PreTrain_02/checkpoint-2/global_step2/mp_rank_00_model_states.pt...
[2023-06-10 18:24:21,873] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /content/drive/MyDrive/IMPORTANTS/LLM/Pre-train/VN-LLaMA-Lora-V1.0-PreTrain_02/checkpoint-2/global_step2/mp_rank_00_model_states.pt.
[2023-06-10 18:24:22,705] [INFO] [torch_checkpoint_engine.py:27:load] [Torch] Loading checkpoint from /content/drive/MyDrive/IMPORTANTS/LLM/Pre-train/VN-LLaMA-Lora-V1.0-PreTrain_02/checkpoint-2/global_step2/mp_rank_00_model_states.pt...
[2023-06-10 18:24:35,849] [INFO] [torch_checkpoint_engine.py:29:load] [Torch] Loaded checkpoint from /content/drive/MyDrive/IMPORTANTS/LLM/Pre-train/VN-LLaMA-Lora-V1.0-PreTrain_02/checkpoint-2/global_step2/mp_rank_00_model_states.pt.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/drive/MyDrive/IMPORTANTS/LLM/Chinese-LLaMA-Alpaca/scripts/training/ │
│ run_clm_pt_with_peft.py:635 in <module>                                     │
│ │
│ 632 │
│ 633 │
│ 634 if __name__ == "__main__":                                              │
│ ❱ 635 │ main() │
│ 636 │
│ │
│ /content/drive/MyDrive/IMPORTANTS/LLM/Chinese-LLaMA-Alpaca/scripts/training/ │
│ run_clm_pt_with_peft.py:587 in main │
│ │
│ 584 │ │ │ checkpoint = training_args.resume_from_checkpoint │
│ 585 │ │ elif last_checkpoint is not None: │
│ 586 │ │ │ checkpoint = last_checkpoint │
│ ❱ 587 │ │ train_result = trainer.train(resume_from_checkpoint=checkpoint │
│ 588 │ │ trainer.save_model() │
│ 589 │ │ │
│ 590 │ │ metrics = train_result.metrics │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1662 in │
│ train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.a │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1731 in │
│ _inner_training_loop │
│ │
│ 1728 │ │ │ or self.fsdp is not None │
│ 1729 │ │ ) │
│ 1730 │ │ if args.deepspeed: │
│ ❱ 1731 │ │ │ deepspeed_engine, optimizer, lr_scheduler = deepspeed_ini │
│ 1732 │ │ │ │ self, num_training_steps=max_steps, resume_from_check │
│ 1733 │ │ │ ) │
│ 1734 │ │ │ self.model = deepspeed_engine.module │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:392 in │
│ deepspeed_init │
│ │
│ 389 │ │ if len(deepspeed_checkpoint_dirs) > 0: │
│ 390 │ │ │ logger.info(f"Attempting to resume from {resume_from_check │
│ 391 │ │ │ # this magically updates self.optimizer and self.lr_schedu │
│ ❱ 392 │ │ │ load_path, _ = deepspeed_engine.load_checkpoint( │
│ 393 │ │ │ │ resume_from_checkpoint, load_optimizer_states=True, lo │
│ 394 │ │ │ ) │
│ 395 │ │ │ if load_path is None: │
│ │
│ /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py:2605 in │
│ load_checkpoint │
│ │
│ 2602 │ │ │ # Prepare for checkpoint load by ensuring all parameters │
│ 2603 │ │ │ self.optimizer.checkpoint_event_prologue() │
│ 2604 │ │ │
│ ❱ 2605 │ │ load_path, client_states = self.load_checkpoint(load_dir, │
│ 2606 │ │ │ │ │ │ │ │ │ │ │ │ │ │ tag, │
│ 2607 │ │ │ │ │ │ │ │ │ │ │ │ │ │ load_module │
│ 2608 │ │ │ │ │ │ │ │ │ │ │ │ │ │ load_optimiz │
│ │
│ /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py:2664 in │
│ load_checkpoint │
│ │
│ 2661 │ │ │ │ │ │ │ │ │ │ │ │ num_experts=self.num │
│ 2662 │ │ │ │ │ │ │ │ │ │ │ │ checkpoint_engine=sel │
│ 2663 │ │ if not self.load_universal_checkpoint(): │
│ ❱ 2664 │ │ │ self.load_module_state_dict(checkpoint=checkpoint, │
│ 2665 │ │ │ │ │ │ │ │ │ │ strict=load_module_strict, │
│ 2666 │ │ │ │ │ │ │ │ │ │ custom_load_fn=custom_load_fn │
│ 2667 │
│ │
│ /usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py:2468 in │
│ load_module_state_dict │
│ │
│ 2465 │ │ if custom_load_fn: │
│ 2466 │ │ │ custom_load_fn(src=module_state_dict, dst=self.module) │
│ 2467 │ │ else: │
│ ❱ 2468 │ │ │ self.module.load_state_dict( │
│ 2469 │ │ │ │ module_state_dict, # TODO │
│ 2470 │ │ │ │ strict=strict) │
│ 2471 │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1671 in │
│ load_state_dict │
│ │
│ 1668 │ │ │ │ │ │ ', '.join('"{}"'.format(k) for k in missing_k │
│ 1669 │ │ │
│ 1670 │ │ if len(error_msgs) > 0: │
│ ❱ 1671 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {} │
│ 1672 │ │ │ │ │ │ │ self.__class__.__name__, "\n\t".join(e │
│ 1673 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys) │
│ 1674 │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
Missing key(s) in state_dict:
"base_model.model.model.layers.0.self_attn.q_proj.weight",
"base_model.model.model.layers.0.self_attn.k_proj.weight",
"base_model.model.model.layers.0.self_attn.v_proj.weight",
"base_model.model.model.layers.0.self_attn.o_proj.weight",
"base_model.model.model.layers.0.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.0.mlp.gate_proj.weight",
"base_model.model.model.layers.0.mlp.down_proj.weight",
"base_model.model.model.layers.0.mlp.up_proj.weight",
"base_model.model.model.layers.0.input_layernorm.weight",
"base_model.model.model.layers.0.post_attention_layernorm.weight",
"base_model.model.model.layers.1.self_attn.q_proj.weight",
"base_model.model.model.layers.1.self_attn.k_proj.weight",
"base_model.model.model.layers.1.self_attn.v_proj.weight",
"base_model.model.model.layers.1.self_attn.o_proj.weight",
"base_model.model.model.layers.1.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.1.mlp.gate_proj.weight",
"base_model.model.model.layers.1.mlp.down_proj.weight",
"base_model.model.model.layers.1.mlp.up_proj.weight",
"base_model.model.model.layers.1.input_layernorm.weight",
"base_model.model.model.layers.1.post_attention_layernorm.weight",
"base_model.model.model.layers.2.self_attn.q_proj.weight",
"base_model.model.model.layers.2.self_attn.k_proj.weight",
"base_model.model.model.layers.2.self_attn.v_proj.weight",
"base_model.model.model.layers.2.self_attn.o_proj.weight",
"base_model.model.model.layers.2.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.2.mlp.gate_proj.weight",
"base_model.model.model.layers.2.mlp.down_proj.weight",
"base_model.model.model.layers.2.mlp.up_proj.weight",
"base_model.model.model.layers.2.input_layernorm.weight",
"base_model.model.model.layers.2.post_attention_layernorm.weight",
"base_model.model.model.layers.3.self_attn.q_proj.weight",
"base_model.model.model.layers.3.self_attn.k_proj.weight",
"base_model.model.model.layers.3.self_attn.v_proj.weight",
"base_model.model.model.layers.3.self_attn.o_proj.weight",
"base_model.model.model.layers.3.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.3.mlp.gate_proj.weight",
"base_model.model.model.layers.3.mlp.down_proj.weight",
"base_model.model.model.layers.3.mlp.up_proj.weight",
"base_model.model.model.layers.3.input_layernorm.weight",
"base_model.model.model.layers.3.post_attention_layernorm.weight",
"base_model.model.model.layers.4.self_attn.q_proj.weight",
"base_model.model.model.layers.4.self_attn.k_proj.weight",
"base_model.model.model.layers.4.self_attn.v_proj.weight",
"base_model.model.model.layers.4.self_attn.o_proj.weight",
"base_model.model.model.layers.4.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.4.mlp.gate_proj.weight",
"base_model.model.model.layers.4.mlp.down_proj.weight",
"base_model.model.model.layers.4.mlp.up_proj.weight",
"base_model.model.model.layers.4.input_layernorm.weight",
"base_model.model.model.layers.4.post_attention_layernorm.weight",
"base_model.model.model.layers.5.self_attn.q_proj.weight",
"base_model.model.model.layers.5.self_attn.k_proj.weight",
"base_model.model.model.layers.5.self_attn.v_proj.weight",
"base_model.model.model.layers.5.self_attn.o_proj.weight",
"base_model.model.model.layers.5.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.5.mlp.gate_proj.weight",
"base_model.model.model.layers.5.mlp.down_proj.weight",
"base_model.model.model.layers.5.mlp.up_proj.weight",
"base_model.model.model.layers.5.input_layernorm.weight",
"base_model.model.model.layers.5.post_attention_layernorm.weight",
"base_model.model.model.layers.6.self_attn.q_proj.weight",
"base_model.model.model.layers.6.self_attn.k_proj.weight",
"base_model.model.model.layers.6.self_attn.v_proj.weight",
"base_model.model.model.layers.6.self_attn.o_proj.weight",
"base_model.model.model.layers.6.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.6.mlp.gate_proj.weight",
"base_model.model.model.layers.6.mlp.down_proj.weight",
"base_model.model.model.layers.6.mlp.up_proj.weight",
"base_model.model.model.layers.6.input_layernorm.weight",
"base_model.model.model.layers.6.post_attention_layernorm.weight",
"base_model.model.model.layers.7.self_attn.q_proj.weight",
"base_model.model.model.layers.7.self_attn.k_proj.weight",
"base_model.model.model.layers.7.self_attn.v_proj.weight",
"base_model.model.model.layers.7.self_attn.o_proj.weight",
"base_model.model.model.layers.7.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.7.mlp.gate_proj.weight",
"base_model.model.model.layers.7.mlp.down_proj.weight",
"base_model.model.model.layers.7.mlp.up_proj.weight",
"base_model.model.model.layers.7.input_layernorm.weight",
"base_model.model.model.layers.7.post_attention_layernorm.weight",
"base_model.model.model.layers.8.self_attn.q_proj.weight",
"base_model.model.model.layers.8.self_attn.k_proj.weight",
"base_model.model.model.layers.8.self_attn.v_proj.weight",
"base_model.model.model.layers.8.self_attn.o_proj.weight",
"base_model.model.model.layers.8.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.8.mlp.gate_proj.weight",
"base_model.model.model.layers.8.mlp.down_proj.weight",
"base_model.model.model.layers.8.mlp.up_proj.weight",
"base_model.model.model.layers.8.input_layernorm.weight",
"base_model.model.model.layers.8.post_attention_layernorm.weight",
"base_model.model.model.layers.9.self_attn.q_proj.weight",
"base_model.model.model.layers.9.self_attn.k_proj.weight",
"base_model.model.model.layers.9.self_attn.v_proj.weight",
"base_model.model.model.layers.9.self_attn.o_proj.weight",
"base_model.model.model.layers.9.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.9.mlp.gate_proj.weight",
"base_model.model.model.layers.9.mlp.down_proj.weight",
"base_model.model.model.layers.9.mlp.up_proj.weight",
"base_model.model.model.layers.9.input_layernorm.weight",
"base_model.model.model.layers.9.post_attention_layernorm.weight",
"base_model.model.model.layers.10.self_attn.q_proj.weight",
"base_model.model.model.layers.10.self_attn.k_proj.weight",
"base_model.model.model.layers.10.self_attn.v_proj.weight",
"base_model.model.model.layers.10.self_attn.o_proj.weight",
"base_model.model.model.layers.10.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.10.mlp.gate_proj.weight",
"base_model.model.model.layers.10.mlp.down_proj.weight",
"base_model.model.model.layers.10.mlp.up_proj.weight",
"base_model.model.model.layers.10.input_layernorm.weight",
"base_model.model.model.layers.10.post_attention_layernorm.weight",
"base_model.model.model.layers.11.self_attn.q_proj.weight",
"base_model.model.model.layers.11.self_attn.k_proj.weight",
"base_model.model.model.layers.11.self_attn.v_proj.weight",
"base_model.model.model.layers.11.self_attn.o_proj.weight",
"base_model.model.model.layers.11.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.11.mlp.gate_proj.weight",
"base_model.model.model.layers.11.mlp.down_proj.weight",
"base_model.model.model.layers.11.mlp.up_proj.weight",
"base_model.model.model.layers.11.input_layernorm.weight",
"base_model.model.model.layers.11.post_attention_layernorm.weight",
"base_model.model.model.layers.12.self_attn.q_proj.weight",
"base_model.model.model.layers.12.self_attn.k_proj.weight",
"base_model.model.model.layers.12.self_attn.v_proj.weight",
"base_model.model.model.layers.12.self_attn.o_proj.weight",
"base_model.model.model.layers.12.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.12.mlp.gate_proj.weight",
"base_model.model.model.layers.12.mlp.down_proj.weight",
"base_model.model.model.layers.12.mlp.up_proj.weight",
"base_model.model.model.layers.12.input_layernorm.weight",
"base_model.model.model.layers.12.post_attention_layernorm.weight",
"base_model.model.model.layers.13.self_attn.q_proj.weight",
"base_model.model.model.layers.13.self_attn.k_proj.weight",
"base_model.model.model.layers.13.self_attn.v_proj.weight",
"base_model.model.model.layers.13.self_attn.o_proj.weight",
"base_model.model.model.layers.13.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.13.mlp.gate_proj.weight",
"base_model.model.model.layers.13.mlp.down_proj.weight",
"base_model.model.model.layers.13.mlp.up_proj.weight",
"base_model.model.model.layers.13.input_layernorm.weight",
"base_model.model.model.layers.13.post_attention_layernorm.weight",
"base_model.model.model.layers.14.self_attn.q_proj.weight",
"base_model.model.model.layers.14.self_attn.k_proj.weight",
"base_model.model.model.layers.14.self_attn.v_proj.weight",
"base_model.model.model.layers.14.self_attn.o_proj.weight",
"base_model.model.model.layers.14.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.14.mlp.gate_proj.weight",
"base_model.model.model.layers.14.mlp.down_proj.weight",
"base_model.model.model.layers.14.mlp.up_proj.weight",
"base_model.model.model.layers.14.input_layernorm.weight",
"base_model.model.model.layers.14.post_attention_layernorm.weight",
"base_model.model.model.layers.15.self_attn.q_proj.weight",
"base_model.model.model.layers.15.self_attn.k_proj.weight",
"base_model.model.model.layers.15.self_attn.v_proj.weight",
"base_model.model.model.layers.15.self_attn.o_proj.weight",
"base_model.model.model.layers.15.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.15.mlp.gate_proj.weight",
"base_model.model.model.layers.15.mlp.down_proj.weight",
"base_model.model.model.layers.15.mlp.up_proj.weight",
"base_model.model.model.layers.15.input_layernorm.weight",
"base_model.model.model.layers.15.post_attention_layernorm.weight",
"base_model.model.model.layers.16.self_attn.q_proj.weight",
"base_model.model.model.layers.16.self_attn.k_proj.weight",
"base_model.model.model.layers.16.self_attn.v_proj.weight",
"base_model.model.model.layers.16.self_attn.o_proj.weight",
"base_model.model.model.layers.16.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.16.mlp.gate_proj.weight",
"base_model.model.model.layers.16.mlp.down_proj.weight",
"base_model.model.model.layers.16.mlp.up_proj.weight",
"base_model.model.model.layers.16.input_layernorm.weight",
"base_model.model.model.layers.16.post_attention_layernorm.weight",
"base_model.model.model.layers.17.self_attn.q_proj.weight",
"base_model.model.model.layers.17.self_attn.k_proj.weight",
"base_model.model.model.layers.17.self_attn.v_proj.weight",
"base_model.model.model.layers.17.self_attn.o_proj.weight",
"base_model.model.model.layers.17.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.17.mlp.gate_proj.weight",
"base_model.model.model.layers.17.mlp.down_proj.weight",
"base_model.model.model.layers.17.mlp.up_proj.weight",
"base_model.model.model.layers.17.input_layernorm.weight",
"base_model.model.model.layers.17.post_attention_layernorm.weight",
"base_model.model.model.layers.18.self_attn.q_proj.weight",
"base_model.model.model.layers.18.self_attn.k_proj.weight",
"base_model.model.model.layers.18.self_attn.v_proj.weight",
"base_model.model.model.layers.18.self_attn.o_proj.weight",
"base_model.model.model.layers.18.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.18.mlp.gate_proj.weight",
"base_model.model.model.layers.18.mlp.down_proj.weight",
"base_model.model.model.layers.18.mlp.up_proj.weight",
"base_model.model.model.layers.18.input_layernorm.weight",
"base_model.model.model.layers.18.post_attention_layernorm.weight",
"base_model.model.model.layers.19.self_attn.q_proj.weight",
"base_model.model.model.layers.19.self_attn.k_proj.weight",
"base_model.model.model.layers.19.self_attn.v_proj.weight",
"base_model.model.model.layers.19.self_attn.o_proj.weight",
"base_model.model.model.layers.19.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.19.mlp.gate_proj.weight",
"base_model.model.model.layers.19.mlp.down_proj.weight",
"base_model.model.model.layers.19.mlp.up_proj.weight",
"base_model.model.model.layers.19.input_layernorm.weight",
"base_model.model.model.layers.19.post_attention_layernorm.weight",
"base_model.model.model.layers.20.self_attn.q_proj.weight",
"base_model.model.model.layers.20.self_attn.k_proj.weight",
"base_model.model.model.layers.20.self_attn.v_proj.weight",
"base_model.model.model.layers.20.self_attn.o_proj.weight",
"base_model.model.model.layers.20.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.20.mlp.gate_proj.weight",
"base_model.model.model.layers.20.mlp.down_proj.weight",
"base_model.model.model.layers.20.mlp.up_proj.weight",
"base_model.model.model.layers.20.input_layernorm.weight",
"base_model.model.model.layers.20.post_attention_layernorm.weight",
"base_model.model.model.layers.21.self_attn.q_proj.weight",
"base_model.model.model.layers.21.self_attn.k_proj.weight",
"base_model.model.model.layers.21.self_attn.v_proj.weight",
"base_model.model.model.layers.21.self_attn.o_proj.weight",
"base_model.model.model.layers.21.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.21.mlp.gate_proj.weight",
"base_model.model.model.layers.21.mlp.down_proj.weight",
"base_model.model.model.layers.21.mlp.up_proj.weight",
"base_model.model.model.layers.21.input_layernorm.weight",
"base_model.model.model.layers.21.post_attention_layernorm.weight",
"base_model.model.model.layers.22.self_attn.q_proj.weight",
"base_model.model.model.layers.22.self_attn.k_proj.weight",
"base_model.model.model.layers.22.self_attn.v_proj.weight",
"base_model.model.model.layers.22.self_attn.o_proj.weight",
"base_model.model.model.layers.22.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.22.mlp.gate_proj.weight",
"base_model.model.model.layers.22.mlp.down_proj.weight",
"base_model.model.model.layers.22.mlp.up_proj.weight",
"base_model.model.model.layers.22.input_layernorm.weight",
"base_model.model.model.layers.22.post_attention_layernorm.weight",
"base_model.model.model.layers.23.self_attn.q_proj.weight",
"base_model.model.model.layers.23.self_attn.k_proj.weight",
"base_model.model.model.layers.23.self_attn.v_proj.weight",
"base_model.model.model.layers.23.self_attn.o_proj.weight",
"base_model.model.model.layers.23.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.23.mlp.gate_proj.weight",
"base_model.model.model.layers.23.mlp.down_proj.weight",
"base_model.model.model.layers.23.mlp.up_proj.weight",
"base_model.model.model.layers.23.input_layernorm.weight",
"base_model.model.model.layers.23.post_attention_layernorm.weight",
"base_model.model.model.layers.24.self_attn.q_proj.weight",
"base_model.model.model.layers.24.self_attn.k_proj.weight",
"base_model.model.model.layers.24.self_attn.v_proj.weight",
"base_model.model.model.layers.24.self_attn.o_proj.weight",
"base_model.model.model.layers.24.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.24.mlp.gate_proj.weight",
"base_model.model.model.layers.24.mlp.down_proj.weight",
"base_model.model.model.layers.24.mlp.up_proj.weight",
"base_model.model.model.layers.24.input_layernorm.weight",
"base_model.model.model.layers.24.post_attention_layernorm.weight",
"base_model.model.model.layers.25.self_attn.q_proj.weight",
"base_model.model.model.layers.25.self_attn.k_proj.weight",
"base_model.model.model.layers.25.self_attn.v_proj.weight",
"base_model.model.model.layers.25.self_attn.o_proj.weight",
"base_model.model.model.layers.25.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.25.mlp.gate_proj.weight",
"base_model.model.model.layers.25.mlp.down_proj.weight",
"base_model.model.model.layers.25.mlp.up_proj.weight",
"base_model.model.model.layers.25.input_layernorm.weight",
"base_model.model.model.layers.25.post_attention_layernorm.weight",
"base_model.model.model.layers.26.self_attn.q_proj.weight",
"base_model.model.model.layers.26.self_attn.k_proj.weight",
"base_model.model.model.layers.26.self_attn.v_proj.weight",
"base_model.model.model.layers.26.self_attn.o_proj.weight",
"base_model.model.model.layers.26.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.26.mlp.gate_proj.weight",
"base_model.model.model.layers.26.mlp.down_proj.weight",
"base_model.model.model.layers.26.mlp.up_proj.weight",
"base_model.model.model.layers.26.input_layernorm.weight",
"base_model.model.model.layers.26.post_attention_layernorm.weight",
"base_model.model.model.layers.27.self_attn.q_proj.weight",
"base_model.model.model.layers.27.self_attn.k_proj.weight",
"base_model.model.model.layers.27.self_attn.v_proj.weight",
"base_model.model.model.layers.27.self_attn.o_proj.weight",
"base_model.model.model.layers.27.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.27.mlp.gate_proj.weight",
"base_model.model.model.layers.27.mlp.down_proj.weight",
"base_model.model.model.layers.27.mlp.up_proj.weight",
"base_model.model.model.layers.27.input_layernorm.weight",
"base_model.model.model.layers.27.post_attention_layernorm.weight",
"base_model.model.model.layers.28.self_attn.q_proj.weight",
"base_model.model.model.layers.28.self_attn.k_proj.weight",
"base_model.model.model.layers.28.self_attn.v_proj.weight",
"base_model.model.model.layers.28.self_attn.o_proj.weight",
"base_model.model.model.layers.28.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.28.mlp.gate_proj.weight",
"base_model.model.model.layers.28.mlp.down_proj.weight",
"base_model.model.model.layers.28.mlp.up_proj.weight",
"base_model.model.model.layers.28.input_layernorm.weight",
"base_model.model.model.layers.28.post_attention_layernorm.weight",
"base_model.model.model.layers.29.self_attn.q_proj.weight",
"base_model.model.model.layers.29.self_attn.k_proj.weight",
"base_model.model.model.layers.29.self_attn.v_proj.weight",
"base_model.model.model.layers.29.self_attn.o_proj.weight",
"base_model.model.model.layers.29.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.29.mlp.gate_proj.weight",
"base_model.model.model.layers.29.mlp.down_proj.weight",
"base_model.model.model.layers.29.mlp.up_proj.weight",
"base_model.model.model.layers.29.input_layernorm.weight",
"base_model.model.model.layers.29.post_attention_layernorm.weight",
"base_model.model.model.layers.30.self_attn.q_proj.weight",
"base_model.model.model.layers.30.self_attn.k_proj.weight",
"base_model.model.model.layers.30.self_attn.v_proj.weight",
"base_model.model.model.layers.30.self_attn.o_proj.weight",
"base_model.model.model.layers.30.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.30.mlp.gate_proj.weight",
"base_model.model.model.layers.30.mlp.down_proj.weight",
"base_model.model.model.layers.30.mlp.up_proj.weight",
"base_model.model.model.layers.30.input_layernorm.weight",
"base_model.model.model.layers.30.post_attention_layernorm.weight",
"base_model.model.model.layers.31.self_attn.q_proj.weight",
"base_model.model.model.layers.31.self_attn.k_proj.weight",
"base_model.model.model.layers.31.self_attn.v_proj.weight",
"base_model.model.model.layers.31.self_attn.o_proj.weight",
"base_model.model.model.layers.31.self_attn.rotary_emb.inv_freq",
"base_model.model.model.layers.31.mlp.gate_proj.weight",
"base_model.model.model.layers.31.mlp.down_proj.weight",
"base_model.model.model.layers.31.mlp.up_proj.weight",
"base_model.model.model.layers.31.input_layernorm.weight",
"base_model.model.model.layers.31.post_attention_layernorm.weight",
"base_model.model.model.norm.weight".