How to load the model and the checkpoint after training the model? #674

ccwdb · 2023-08-22T10:31:01Z

I trained my model using the code in sft_trainer.py, and I saved the checkpoint and the model in the same directory.
But I don't know how to load the model from the checkpoint. I would also like to know whether trainer.save_model(script_args.output_dir) saves a complete trained model or just a checkpoint.
I tried many ways to load the trained model, but I get errors like:

RuntimeError: Error(s) in loading state_dict for PrefixEncoder:
	Missing key(s) in state_dict: "embedding.weight".

So, how do I load the model?

lvwerra · 2023-08-23T08:57:47Z

Are you using PEFT for fine-tuning? And can you share the code you are using to load the model?

ccwdb · 2023-08-25T08:54:05Z

Here is my code:

from datasets import load_dataset
from trl import SFTTrainer
from transformers import AutoModel, DataCollatorForLanguageModeling, AutoTokenizer, TrainingArguments, AutoModelForCausalLM
from peft import LoraConfig

# Load the model and tokenizer
MODEL_PATH = "/home/qiji/chatglm2-6b"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# tokenizer.padding_side = 'right'
# Set the fine-tuning parameters
training_arguments = TrainingArguments(
    output_dir='/home/qiji/Container/jinkundong/SFT/results',
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    save_steps=5000,
    logging_steps=1000,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=5000,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type='constant',
)
model.config.use_cache = False

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)
no_deprecation_warning=True
dataset = load_dataset("/home/qiji/Container/jinkundong/SFT/SFT_dataset", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="input",
    max_seq_length=512,
    peft_config=peft_config,
    args=training_arguments,
    data_collator=data_collator,
    packing=False,
)

trainer.train()

model_save_path = "/home/qiji/Container/jinkundong/SFT_2"
trainer.save_model(model_save_path)

As far as I know, output_dir is the path where the checkpoints are saved, and the path passed to trainer.save_model is where the model is saved. But I don't know what each of them is used for.

younesbelkada · 2023-08-28T08:00:23Z

Hi @ccwdb,
Can you try running the following after training?

import torch
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(output_dir, torch_dtype=torch.float16)

Make sure to install peft>=0.4.0
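For context, here is a minimal sketch of what this loading step relies on, assuming the directory was written by trainer.save_model with a peft_config. PEFT stores the adapter weights next to an adapter_config.json file, and its base_model_name_or_path field records where the base model lives. The helper names read_base_model_path and load_adapter_explicitly are hypothetical, and trust_remote_code=True is only an assumption for models such as ChatGLM2 that ship custom modeling code.

```python
import json
import os


def read_base_model_path(adapter_dir):
    """Return the base model path recorded in adapter_config.json, or None."""
    config_file = os.path.join(adapter_dir, "adapter_config.json")
    if not os.path.isfile(config_file):
        return None
    with open(config_file, encoding="utf-8") as f:
        return json.load(f).get("base_model_name_or_path")


def load_adapter_explicitly(adapter_dir, base_model_path=None):
    """Load the base model first, then attach the saved LoRA adapter."""
    # Heavy imports stay local so read_base_model_path works without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model_path = base_model_path or read_base_model_path(adapter_dir)
    if base_model_path is None:
        raise FileNotFoundError(f"No adapter_config.json found in {adapter_dir}")
    base = AutoModelForCausalLM.from_pretrained(
        base_model_path,
        torch_dtype=torch.float16,
        trust_remote_code=True,  # assumption: needed for custom-code models
    )
    return PeftModel.from_pretrained(base, adapter_dir)
```

If read_base_model_path returns None for a directory, that directory probably does not contain a saved adapter, and passing it to a PEFT loader is unlikely to work.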

ccwdb · 2023-09-01T07:15:14Z

I tried your code, but it says that there is no file named config.json.
Here are my files.

I don't know the difference between pytorch_model.bin and adapter_model.bin.
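As a rough guide to the two file names: adapter_model.bin generally holds only the small set of LoRA weights trained by PEFT and sits next to adapter_config.json, so it has to be loaded on top of the base model. pytorch_model.bin generally holds the weights of a full model and sits next to config.json. A directory that contains only adapter files has no config.json, which is one plausible reason for the error above. The sketch below uses a hypothetical helper, describe_model_dir, to tell the two layouts apart. It assumes unsharded weight files and does not check the file contents.

```python
import os

# Assumed file names: PEFT adapters vs. full transformers models (unsharded).
ADAPTER_FILES = ("adapter_model.bin", "adapter_model.safetensors")
FULL_MODEL_FILES = ("pytorch_model.bin", "model.safetensors")


def describe_model_dir(path):
    """Classify a save directory as 'adapter', 'full_model', or 'unknown'."""
    names = set(os.listdir(path))
    if "adapter_config.json" in names and names.intersection(ADAPTER_FILES):
        return "adapter"
    if "config.json" in names and names.intersection(FULL_MODEL_FILES):
        return "full_model"
    return "unknown"
```

A directory classified as "adapter" would be loaded with a PEFT loader together with the base model, while a "full_model" directory can be passed to the usual from_pretrained call.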

github-actions · 2023-09-25T15:05:23Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

louieworth · 2023-11-27T21:34:29Z

After checking the source code, I think you should load the model as follows:

from transformers.trainer_utils import get_last_checkpoint

last_checkpoint = get_last_checkpoint(script_args.output_dir)
trainer.train(resume_from_checkpoint=last_checkpoint)
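Note that resume_from_checkpoint continues training from the saved state. It does not by itself load a model for inference. To illustrate how the newest checkpoint is picked, here is a standard-library sketch that assumes the Trainer's checkpoint-<step> folder naming. The function find_last_checkpoint is a hypothetical stand-in that approximates get_last_checkpoint, not the library's own code.

```python
import os
import re

# Trainer checkpoints are assumed to be folders named "checkpoint-<step>".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir):
    """Return the checkpoint folder with the highest step, or None."""
    steps = {}
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps[int(match.group(1))] = name
    if not steps:
        return None
    # Compare steps numerically so that 1000 ranks above 900.
    return os.path.join(output_dir, steps[max(steps)])
```

With save_steps=5000 and max_steps=5000, as in the training arguments earlier in this thread, the output directory would be expected to contain a single folder of this kind.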

Hello, could you explain how to save the model? Can I directly use

trainer.save

github-actions bot closed this as completed Nov 2, 2023
