
How to load the model and the checkpoint after trained the model? #674

Closed
ccwdb opened this issue Aug 22, 2023 · 6 comments

Comments


ccwdb commented Aug 22, 2023

I trained my model using the code in sft_trainer.py, and I saved the checkpoint and the model in the same directory.
But I don't know how to load the model from the checkpoint. Also, does trainer.save_model(script_args.output_dir) mean that I have saved a trained model, not just a checkpoint?
I have tried many ways to load the trained model, but I get errors like:

RuntimeError: Error(s) in loading state_dict for PrefixEncoder:
	Missing key(s) in state_dict: "embedding.weight". 

So, how do I load the model?

Member

lvwerra commented Aug 23, 2023

Are you using PEFT for fine-tuning? And can you share the code you are using to load the model?

Author

ccwdb commented Aug 25, 2023

Here is my code.

from datasets import load_dataset
from trl import SFTTrainer
from transformers import AutoModel, DataCollatorForLanguageModeling, AutoTokenizer, TrainingArguments, AutoModelForCausalLM
from peft import LoraConfig

# Load the model and tokenizer
MODEL_PATH = "/home/qiji/chatglm2-6b"
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
# model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# tokenizer.padding_side = 'right'
# Set the fine-tuning parameters
training_arguments = TrainingArguments(
    output_dir='/home/qiji/Container/jinkundong/SFT/results',
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    save_steps=5000,
    logging_steps=1000,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=5000,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type='constant',
)
model.config.use_cache = False

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)
no_deprecation_warning=True
dataset = load_dataset("/home/qiji/Container/jinkundong/SFT/SFT_dataset", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="input",
    max_seq_length=512,
    peft_config=peft_config,
    args=training_arguments,
    data_collator=data_collator,
    packing=False,
)

trainer.train()

model_save_path = "/home/qiji/Container/jinkundong/SFT_2"
trainer.save_model(model_save_path)

As far as I know, output_dir is the path where the checkpoints are saved, and model_save_path is where the model is saved. But I don't know how to use them.
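
One way to tell the two kinds of output apart is to look at the file names in each folder. The sketch below assumes the usual conventions: PEFT writes adapter_config.json next to the adapter weights, a full transformers model is saved with config.json, and Trainer writes intermediate checkpoints into checkpoint-<step> subfolders of output_dir. The helper name describe_save_dir is hypothetical, and the exact file names can differ between library versions.

```python
from pathlib import Path


def describe_save_dir(path):
    """Report which kinds of training artifacts a directory appears to contain."""
    names = {entry.name for entry in Path(path).iterdir()}
    return {
        # Assumption: PEFT saves adapter_config.json alongside the adapter weights.
        "has_adapter": "adapter_config.json" in names,
        # Assumption: a full transformers model is saved together with config.json.
        "has_full_model_config": "config.json" in names,
        # Trainer checkpoints live in checkpoint-<step> entries (sorted alphabetically here).
        "checkpoint_dirs": sorted(n for n in names if n.startswith("checkpoint-")),
    }
```

Running it on both output_dir and model_save_path shows whether each folder holds adapter weights only, a full model, or intermediate checkpoints.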

Contributor

younesbelkada commented Aug 28, 2023

Hi @ccwdb,
Can you try running the following after training?

import torch
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(output_dir, torch_dtype=torch.float16)

Make sure to install peft>=0.4.0
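
If a standalone model folder (one that includes config.json) is wanted, the loaded adapter can be folded into the base weights and saved again. This is a minimal sketch, assuming a LoRA adapter and a PEFT version that provides merge_and_unload; the function name merge_adapter_into_base and both directory arguments are hypothetical. Passing trust_remote_code=True assumes the keyword is forwarded to the base model loader, which models with custom code such as ChatGLM2 would need.

```python
def merge_adapter_into_base(adapter_dir, merged_dir):
    """Load a saved LoRA adapter, merge it into the base weights, and save the result."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    import torch
    from peft import AutoPeftModelForCausalLM

    # The base model location is read from adapter_config.json inside adapter_dir.
    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_dir,
        torch_dtype=torch.float16,
        trust_remote_code=True,  # assumption: forwarded to the base model loader
    )
    # Fold the LoRA weights into the base model and drop the PEFT wrapper.
    merged = model.merge_and_unload()
    # Writes the full weights together with config.json.
    merged.save_pretrained(merged_dir)
    return merged
```

The merged folder should then load with the regular from_pretrained call, without PEFT.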

Author

ccwdb commented Sep 1, 2023

I tried your code, but it says that I don't have a file named config.json.
Here are my files:
[image]
I don't know the difference between pytorch_model.bin and adapter_model.bin.

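By convention, adapter_model.bin holds only the small set of LoRA weights that PEFT trained, while pytorch_model.bin is the default file name for a full state dict. A folder saved by PEFT contains adapter_config.json rather than config.json, so the adapter has to be attached to the original base model. A minimal sketch, assuming adapter_dir contains adapter_config.json and base_model_path points at the same base model that was used for training; the function name load_base_with_adapter is hypothetical:

```python
from pathlib import Path


def load_base_with_adapter(base_model_path, adapter_dir):
    """Load the base model, then attach the saved LoRA adapter on top of it."""
    # Fail early with a clear message if the folder does not look like a PEFT adapter.
    if not (Path(adapter_dir) / "adapter_config.json").is_file():
        raise FileNotFoundError(f"No adapter_config.json found in {adapter_dir}")

    # Imported lazily so the check above works without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_model_path, trust_remote_code=True, torch_dtype=torch.float16
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
    # Wrap the base model with the adapter weights stored in adapter_dir.
    model = PeftModel.from_pretrained(base, adapter_dir)
    return model, tokenizer
```

If the check raises, the folder probably holds something other than a PEFT adapter, and a different folder (for example, the one passed to trainer.save_model) may be the right one to try.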
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this as completed Nov 2, 2023

@louieworth

After checking the source code, I think you should load the model from the last checkpoint:

from transformers.trainer_utils import EvalPrediction, get_last_checkpoint

last_checkpoint = get_last_checkpoint(script_args.output_dir)
trainer.train(resume_from_checkpoint=last_checkpoint)
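
Note that resume_from_checkpoint continues training from the saved state; it is not a way to load the model for inference. get_last_checkpoint itself only picks the checkpoint-<step> folder with the highest step number. The sketch below re-implements that lookup with the standard library purely for illustration; find_last_checkpoint is a hypothetical name, not the transformers function.

```python
import re
from pathlib import Path

# Trainer names its checkpoint folders checkpoint-<global_step>.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir):
    """Return the path of the checkpoint folder with the highest step, or None."""
    candidates = []
    for entry in Path(output_dir).iterdir():
        match = _CHECKPOINT_RE.match(entry.name)
        if entry.is_dir() and match:
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare by step number, not alphabetically, so step 5000 beats step 999.
    return str(max(candidates, key=lambda item: item[0])[1])
```

With save_steps=5000 and max_steps=5000 as in the training script above, output_dir would be expected to contain a single checkpoint-5000 folder.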

Hello, could you explain how to save the model? Can I directly use

trainer.save
