
Invalid device string: 'float32' #4698

Closed · 1 task done
OnewayLab opened this issue Jul 6, 2024 · 4 comments

Labels
solved This problem has been already solved

Comments

@OnewayLab
Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.8.3.dev0
  • Platform: Linux-4.19.91-014-kangaroo.2.10.13.5c249cdaf.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • PyTorch version: 2.3.0 (GPU)
  • Transformers version: 4.42.3
  • Datasets version: 2.18.0
  • Accelerate version: 0.31.0
  • PEFT version: 0.11.1
  • TRL version: 0.8.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.43.1
  • vLLM version: 0.5.0.post1

Reproduction

Command

llamafactory-cli train \
    --do_train \
    --stage sft \
    --finetuning_type full \
    --use_unsloth \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --flash_attn fa2 \
    --template llama3 \
    --dataset $DATASET \
    --dataset_dir data \
    --cutoff_len 8192 \
    --preprocessing_num_workers 24 \
    --output_dir output/tmp \
    --overwrite_output_dir \
    --save_steps 100 \
    --logging_steps 10 \
    --plot_loss \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 32 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --max_steps 1000 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16

Error Log

[INFO|trainer.py:3478] 2024-07-06 17:38:36,193 >> Saving model checkpoint to output/tmp/checkpoint-5
[INFO|configuration_utils.py:472] 2024-07-06 17:38:36,207 >> Configuration saved in output/tmp/checkpoint-5/config.json
[INFO|configuration_utils.py:769] 2024-07-06 17:38:36,214 >> Configuration saved in output/tmp/checkpoint-5/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-06 17:40:06,590 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at output/tmp/checkpoint-5/model.safetensors.index.json.
Traceback (most recent call last):
  File "/opt/conda/envs/lf/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "~/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "~/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "~LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 90, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "<string>", line 367, in _fast_inner_training_loop
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 3338, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/unsloth/models/llama.py", line 856, in _CausalLM_fast_forward
    outputs = self.model(
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/unsloth/models/llama.py", line 561, in LlamaModel_fast_forward
    inputs_embeds = inputs_embeds.to(self.config.torch_dtype)
RuntimeError: Invalid device string: 'float32'

Expected behavior

Successful training.

Others

The problem occurs on the first training step after the first checkpoint is saved. I suspect it's because model.config.torch_dtype changes from torch.bfloat16 to the string "float32" while the checkpoint is being saved.
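
A minimal sketch of why that stringified dtype then breaks the forward pass: Tensor.to() parses a bare string as a device, not a dtype, so the value has to be coerced back to a real torch.dtype first.

import torch

t = torch.randn(2, 2)

# A string handed to Tensor.to() is parsed as a *device*, so a stringified
# dtype reproduces the error from the traceback above:
try:
    t.to("float32")
except RuntimeError as e:
    print(e)  # Invalid device string: 'float32'

# A real torch.dtype works, so coercing the string back avoids the crash:
t = t.to(getattr(torch, "float32"))
print(t.dtype)  # torch.float32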

github-actions bot added the pending (This problem is yet to be addressed) label on Jul 6, 2024
@OnewayLab (Author)

Maybe it's an issue in Hugging Face Transformers. I found the following code in transformers/modeling_utils.py:

class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMixin, PeftAdapterMixin):
    def save_pretrained(...):
        ...
        # save the string version of dtype to the config, e.g. convert torch.float32 => "float32"
        # we currently don't use this setting automatically, but may start to use with v5
        dtype = get_parameter_dtype(model_to_save)
        model_to_save.config.torch_dtype = str(dtype).split(".")[1]
        ...

What could we do to fix it...
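
One possible stop-gap, assuming the config mutation above really is the root cause, would be to coerce torch_dtype back to a real torch.dtype right after each save. Below is an untested sketch using a transformers TrainerCallback; it relies on on_save firing after the checkpoint is written and on the callback receiving the model in kwargs, as recent transformers versions do.

import torch
from transformers import TrainerCallback

class RestoreTorchDtypeCallback(TrainerCallback):
    # Undo save_pretrained() turning config.torch_dtype into a string,
    # so later tensor.to(config.torch_dtype) calls keep working.
    def on_save(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        if model is not None and isinstance(model.config.torch_dtype, str):
            # "float32" / "bfloat16" -> torch.float32 / torch.bfloat16
            model.config.torch_dtype = getattr(torch, model.config.torch_dtype)

This would still have to be wired into LLaMA-Factory's trainer construction rather than the CLI, so a fix inside unsloth or transformers would be cleaner.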

@skiiwoo commented on Jul 30, 2024

Maybe it's an issue with unsloth.
I got the same error when use_unsloth: true is set, with the config below.

### model
model_name_or_path: Qwen/Qwen2-1.5B-Instruct
flash_attn: fa2
# use_unsloth: true

### method
stage: sft
do_train: true
finetuning_type: full
bf16: true

### dataset
dataset: mine
template: qwen
cutoff_len: 4000
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: Qwen2-1.5B-Instruct
logging_steps: 10
save_steps: 10
save_strategy: steps
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 8
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 1
lr_scheduler_type: cosine
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 30


### log
report_to: wandb
run_name: Qwen2-1.5B-Instruct

@relic-yuexi (Contributor)

@hiyouga This has been fixed in unsloth. It will be available in the next unsloth release, or users can apply the fix now by following unslothai/unsloth#874 (comment).
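
For anyone reading along, the shape of the fix is to coerce a stringified torch_dtype back to a torch.dtype before the failing .to() call in the traceback above. A rough sketch only; see the linked comment for the actual upstream patch, which may differ:

import torch

def coerce_torch_dtype(dtype):
    # config.torch_dtype can come back as a string ("float32", "bfloat16")
    # after save_pretrained(); map it back to the real torch.dtype.
    return getattr(torch, dtype) if isinstance(dtype, str) else dtype

# e.g. at the failing line in unsloth/models/llama.py:
# inputs_embeds = inputs_embeds.to(coerce_torch_dtype(self.config.torch_dtype))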

@OnewayLab (Author)

> @hiyouga This has been fixed in unsloth. It will be available in the next unsloth release, or users can apply the fix now by following unslothai/unsloth#874 (comment).

Thanks for the reminder! I will close this issue.

@hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Aug 21, 2024