Invalid device string: 'float32' #4698
Comments
Maybe it's an issue in Hugging Face Transformers. I found the following code in `PreTrainedModel`:

```python
class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMixin, PeftAdapterMixin):
    def save_pretrained(...):
        ...
        # save the string version of dtype to the config, e.g. convert torch.float32 => "float32"
        # we currently don't use this setting automatically, but may start to use with v5
        dtype = get_parameter_dtype(model_to_save)
        model_to_save.config.torch_dtype = str(dtype).split(".")[1]
        ...
```

What could we do to fix it?
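One possible mitigation, sketched below under the assumption that the stringified `config.torch_dtype` is the culprit: a hypothetical `TrainerCallback` (not the project's actual fix) that converts the value back into a real `torch.dtype` right after each checkpoint save.

```python
import torch
from transformers import TrainerCallback


class RestoreDtypeCallback(TrainerCallback):
    """Hypothetical workaround: save_pretrained() stringifies
    config.torch_dtype (e.g. torch.bfloat16 -> "bfloat16"), so we
    convert it back to a torch.dtype right after each save."""

    def on_save(self, args, state, control, model=None, **kwargs):
        if model is not None and isinstance(model.config.torch_dtype, str):
            model.config.torch_dtype = getattr(torch, model.config.torch_dtype)
```

Registered via `trainer.add_callback(RestoreDtypeCallback())`, this would keep the in-memory config consistent between checkpoints.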
Maybe it's an issue with unsloth.

```yaml
### model
model_name_or_path: Qwen/Qwen2-1.5B-Instruct
flash_attn: fa2
# use_unsloth: true

### method
stage: sft
do_train: true
finetuning_type: full
bf16: true

### dataset
dataset: mine
template: qwen
cutoff_len: 4000
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: Qwen2-1.5B-Instruct
logging_steps: 10
save_steps: 10
save_strategy: steps
plot_loss: true
overwrite_output_dir: true
per_device_train_batch_size: 8
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 1
lr_scheduler_type: cosine
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 30

### log
report_to: wandb
run_name: Qwen2-1.5B-Instruct
```
Referenced commit: fix the bug #404 and the bug hiyouga/LLaMA-Factory#4698 (comment)
@hiyouga This has been fixed in unsloth. The fix will ship in the next unsloth release, or users can apply it now by following unslothai/unsloth#874 (comment).
Thanks for the reminder! I will close this issue.
Reminder
System Info

llamafactory version: 0.8.3.dev0

Reproduction

Command:

```shell
llamafactory-cli train \
    --do_train \
    --stage sft \
    --finetuning_type full \
    --use_unsloth \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --flash_attn fa2 \
    --template llama3 \
    --dataset $DATASET \
    --dataset_dir data \
    --cutoff_len 8192 \
    --preprocessing_num_workers 24 \
    --output_dir output/tmp \
    --overwrite_output_dir \
    --save_steps 100 \
    --logging_steps 10 \
    --plot_loss \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 32 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --max_steps 1000 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16
```
Error Log
Expected behavior
Successful training.
Others
The problem occurs in the first step after saving the first checkpoint. I guess it's because `model.config.torch_dtype` changes from `torch.bfloat16` to the string `"float32"` while saving the checkpoint.