
Invalid device string: 'float32' #4698

Closed · 1 task done
OnewayLab opened this issue Jul 6, 2024 · 4 comments

Labels
solved This problem has been already solved

Comments

@OnewayLab
Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.8.3.dev0
  • Platform: Linux-4.19.91-014-kangaroo.2.10.13.5c249cdaf.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.14
  • PyTorch version: 2.3.0 (GPU)
  • Transformers version: 4.42.3
  • Datasets version: 2.18.0
  • Accelerate version: 0.31.0
  • PEFT version: 0.11.1
  • TRL version: 0.8.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.43.1
  • vLLM version: 0.5.0.post1

Reproduction

Command

llamafactory-cli train \
    --do_train \
    --stage sft \
    --finetuning_type full \
    --use_unsloth \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --flash_attn fa2 \
    --template llama3 \
    --dataset $DATASET \
    --dataset_dir data \
    --cutoff_len 8192 \
    --preprocessing_num_workers 24 \
    --output_dir output/tmp \
    --overwrite_output_dir \
    --save_steps 100 \
    --logging_steps 10 \
    --plot_loss \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 32 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --max_steps 1000 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16

Error Log

[INFO|trainer.py:3478] 2024-07-06 17:38:36,193 >> Saving model checkpoint to output/tmp/checkpoint-5
[INFO|configuration_utils.py:472] 2024-07-06 17:38:36,207 >> Configuration saved in output/tmp/checkpoint-5/config.json
[INFO|configuration_utils.py:769] 2024-07-06 17:38:36,214 >> Configuration saved in output/tmp/checkpoint-5/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-06 17:40:06,590 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at output/tmp/checkpoint-5/model.safetensors.index.json.
Traceback (most recent call last):
  File "/opt/conda/envs/lf/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "~/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "~/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "~LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 90, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "<string>", line 367, in _fast_inner_training_loop
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/transformers/trainer.py", line 3338, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/unsloth/models/llama.py", line 856, in _CausalLM_fast_forward
    outputs = self.model(
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/lf/lib/python3.10/site-packages/unsloth/models/llama.py", line 561, in LlamaModel_fast_forward
    inputs_embeds = inputs_embeds.to(self.config.torch_dtype)
RuntimeError: Invalid device string: 'float32'

Expected behavior

Successful training.

Others

The problem occurs on the first training step after the first checkpoint is saved. I suspect it's because model.config.torch_dtype changes from torch.bfloat16 to the string "float32" while the checkpoint is being saved.
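
A minimal sketch of why that stringified dtype then breaks the forward pass: Tensor.to() parses a bare string as a device, not a dtype, so the value has to be coerced back to a real torch.dtype first.

import torch

t = torch.randn(2, 2)

# A string handed to Tensor.to() is parsed as a *device*, so a stringified
# dtype reproduces the error from the traceback above:
try:
    t.to("float32")
except RuntimeError as e:
    print(e)  # Invalid device string: 'float32'

# A real torch.dtype works, so coercing the string back avoids the crash:
t = t.to(getattr(torch, "float32"))
print(t.dtype)  # torch.float32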

github-actions bot added the pending (This problem is yet to be addressed) label on Jul 6, 2024
@OnewayLab (Author)

Maybe it's an issue in Hugging Face Transformers. I found the following code in transformers/modeling_utils.py:

class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMixin, PeftAdapterMixin):
    def save_pretrained(...):
        ...
        # save the string version of dtype to the config, e.g. convert torch.float32 => "float32"
        # we currently don't use this setting automatically, but may start to use with v5
        dtype = get_parameter_dtype(model_to_save)
        model_to_save.config.torch_dtype = str(dtype).split(".")[1]
        ...

What could we do to fix it...
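
One possible stop-gap, assuming the config mutation above really is the root cause, would be to coerce torch_dtype back to a real torch.dtype right after each save. Below is an untested sketch using a transformers TrainerCallback; it relies on on_save firing after the checkpoint is written and on the callback receiving the model in kwargs, as recent transformers versions do.

import torch
from transformers import TrainerCallback

class RestoreTorchDtypeCallback(TrainerCallback):
    # Undo save_pretrained() turning config.torch_dtype into a string,
    # so later tensor.to(config.torch_dtype) calls keep working.
    def on_save(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        if model is not None and isinstance(model.config.torch_dtype, str):
            # "float32" / "bfloat16" -> torch.float32 / torch.bfloat16
            model.config.torch_dtype = getattr(torch, model.config.torch_dtype)

This would still have to be wired into LLaMA-Factory's trainer construction rather than the CLI, so a fix inside unsloth or transformers would be cleaner.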

@skiiwoo commented on Jul 30, 2024

Maybe it's an issue with unsloth.
I got the same error when use_unsloth: true is set, with the config below.

### model
model_name_or_path: Qwen/Qwen2-1.5B-Instruct
flash_attn: fa2
# use_unsloth: true

### method
stage: sft
do_train: true
finetuning_type: full
bf16: true

### dataset
dataset: mine
template: qwen
cutoff_len: 4000
overwrite_cache: true
preprocessing_num_workers: 8

### output
output_dir: Qwen2-1.5B-Instruct
logging_steps: 10
save_steps: 10
save_strategy: steps
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 8
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 1
lr_scheduler_type: cosine
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 30


### log
report_to: wandb
run_name: Qwen2-1.5B-Instruct

@relic-yuexi (Contributor)

@hiyouga This has been fixed in unsloth. It will be available in the next unsloth release, or users can apply the fix now by following unslothai/unsloth#874 (comment).
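
For anyone reading along, the shape of the fix is to coerce a stringified torch_dtype back to a torch.dtype before the failing .to() call in the traceback above. A rough sketch only; see the linked comment for the actual upstream patch, which may differ:

import torch

def coerce_torch_dtype(dtype):
    # config.torch_dtype can come back as a string ("float32", "bfloat16")
    # after save_pretrained(); map it back to the real torch.dtype.
    return getattr(torch, dtype) if isinstance(dtype, str) else dtype

# e.g. at the failing line in unsloth/models/llama.py:
# inputs_embeds = inputs_embeds.to(coerce_torch_dtype(self.config.torch_dtype))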

@OnewayLab (Author)

> @hiyouga This has been fixed in unsloth. It will be available in the next unsloth release, or users can apply the fix now by following unslothai/unsloth#874 (comment).

Thanks for the reminder! I will close this issue.

@hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Aug 21, 2024