Hi friends,
I was trying to test the finetune/finetune.py script. It seems that state.best_model_checkpoint always returns None, which causes a failure at the end of the program. Does this mean the program never saved a "best model" during training? I am fairly new to this; could anyone explain what is happening and offer some hints on fixing it? Thanks a lot!
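My understanding (an assumption, based on how the transformers Trainer generally works) is that state.best_model_checkpoint is only populated when the Trainer runs an evaluation and saves a checkpoint during training; with --max_steps 2 and the default eval/save intervals, neither happens, so the attribute is still None at the end. A minimal sketch of a defensive fallback, using a stand-in class rather than the real transformers.TrainerState:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainerState:
    # Minimal stand-in for transformers.TrainerState; only the field
    # relevant to this issue is modeled here.
    best_model_checkpoint: Optional[str] = None

def resolve_best_checkpoint(state: TrainerState, output_dir: str) -> str:
    """Return the recorded best checkpoint, or fall back to output_dir
    when no eval/save cycle ever ran and the attribute is still None."""
    if state.best_model_checkpoint is not None:
        return state.best_model_checkpoint
    return output_dir
```

With the settings from the commands below, resolve_best_checkpoint(TrainerState(), "./checkpoints") would take the fallback branch, which is exactly the situation the final lines of finetune.py seem not to guard against.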
Command (single GPU):
python finetune/finetune.py --model_path="../../models/starcoder/" --dataset_name="../../datasets/ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10 --streaming --seq_length 2048 --max_steps 2 --batch_size 1 --input_column_name="question" --output_column_name="response" --gradient_accumulation_steps 1 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 1 --weight_decay 0.05 --output_dir="./checkpoints"
Error image (single GPU):
Command (multi GPU):
python -m torch.distributed.launch --nproc_per_node 4 finetune/finetune.py --model_path="../../models/starcoder/" --dataset_name="../../datasets/ArmelR/stack-exchange-instruction" --subset="data/finetune" --split="train" --size_valid_set 10000 --streaming --seq_length 2048 --max_steps 2 --batch_size 1 --input_column_name="question" --output_column_name="response" --gradient_accumulation_steps 16 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 100 --weight_decay 0.05 --output_dir="./checkpoints"
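For what it's worth, a likely workaround (assuming finetune.py builds a standard transformers.TrainingArguments internally; whether it exposes all of these as CLI flags is an assumption) is to make sure evaluation and checkpointing actually run during training, so the Trainer has a chance to record a best model. A hedged sketch of the relevant settings:

```python
from transformers import TrainingArguments

# Sketch only: argument names are from the transformers library, but the
# concrete values (eval/save every 100 steps, loss as the metric) are
# illustrative, not taken from finetune.py.
args = TrainingArguments(
    output_dir="./checkpoints",
    max_steps=1000,                # enough steps for at least one eval/save
    evaluation_strategy="steps",
    eval_steps=100,                # evaluate every 100 steps
    save_strategy="steps",
    save_steps=100,                # should align with eval_steps
    load_best_model_at_end=True,   # requires both eval and save to run
    metric_for_best_model="loss",
)
```

With max_steps=2, as in the commands above, no evaluation or save step is ever reached, which would explain why best_model_checkpoint is never set.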
Error image (multi GPU):