LoRA Fine Tuning Crash at FlashAttention · Issue #1828

Closed · pasindumuth opened this issue Jun 30, 2023 · 8 comments · Fixed by #2126

@pasindumuth commented Jun 30, 2023

I tried running this command:

deepspeed fastchat/train/train_lora.py \
    --model_name_or_path huggyllama/llama-7b  \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir ../output_vicuna-lora \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed playground/deepspeed_config_s2.json

I ran it on runpod.io, but I keep getting:

Time to load utils op: 0.00040268898010253906 seconds
  0%|          | 0/573 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/tmp/FastChat/fastchat/train/train_lora.py", line 151, in <module>
    train()
  File "/tmp/FastChat/fastchat/train/train_lora.py", line 141, in train
    trainer.train()
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1675, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/tmp/FastChat/fastchat/train/llama_flash_attn_monkey_patch.py", line 88, in forward
    output_unpad = flash_attn_unpadded_qkvpacked_func(
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 256, in flash_attn_unpadded_qkvpacked_func
    return FlashAttnQKVPackedFunc.apply(qkv, cu_seqlens, max_seqlen, dropout_p, softmax_scale,
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 58, in forward
    out, softmax_lse, rng_state, S_dmask = _flash_attn_forward(
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 21, in _flash_attn_forward
    softmax_lse, rng_state, *rest = flash_attn_cuda.fwd(
RuntimeError: Expected q_dtype == torch::kFloat16 || ((is_sm8x || is_sm90) && q_dtype == torch::kBFloat16) to be true, but got false.
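
For context, the guard that fails only accepts fp16 inputs (or bf16 on SM80/SM90-class GPUs). A minimal sketch of that condition, not the library's actual code:

import torch

def flash_attn_v1_accepts(q_dtype: torch.dtype, device: int = 0) -> bool:
    # fp16 is always accepted; bf16 only on SM80 (A100) / SM90 (H100) GPUs,
    # mirroring the condition quoted in the RuntimeError above.
    major, _ = torch.cuda.get_device_capability(device)
    is_sm8x_or_sm90 = major >= 8
    return q_dtype == torch.float16 or (is_sm8x_or_sm90 and q_dtype == torch.bfloat16)

print(flash_attn_v1_accepts(torch.bfloat16))  # False on pre-SM80 GPUs
print(flash_attn_v1_accepts(torch.float32))   # always False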

My setup instructions were as follows:

  1. Create an A100 instance on runpod.io
  2. Do this:
# Checkout to specific commit
cd FastChat
git checkout c0523e0

# install packages
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# Install packages that weren't included in requirements.txt for some reason...
pip install einops torch
pip install flash_attn==1.0.5

# For LoRA
pip install deepspeed
# Per https://github.com/artidoro/qlora/issues/154#issuecomment-1593665478
pip install git+https://github.com/huggingface/peft
  3. Run the command at the start of this issue. Please note the use of huggyllama/llama-7b... is this the right HF model to use?
  4. Observe the problem.

Are there any plans to fix this?

@merrymercy (Member)

cc @BabyChouSr

@alphanlp commented Jul 1, 2023

me too

@mvuthegoat (Contributor)

Flash attention might not work for your GPU: the backward pass with head dim > 64 requires an A100 or H100. You could consider commenting out the flash attention part.
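
A quick sketch to check this (assumes only PyTorch; SM80 is A100-class, SM90 is H100-class):

import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, compute capability {major}.{minor}")
    if major < 8:
        # flash-attn v1: backward with head dim > 64 needs SM80 (A100) or SM90 (H100)
        print("  -> flash-attn backward with head dim > 64 is not supported here")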

@pasindumuth (Author)

I tried using an A100 as well and I got the same error.

@adazhng commented Jul 3, 2023

Same error observed. I found it can train after removing --q_lora True, but memory usage goes up to 79 GB per GPU (I currently run on 8 x A100 80GB; the model I use is llama-33b). It looks like LoRA does not work in this case. Could it be related to the package versions?

@pauliZhang
Me too. You can remove flash-attn in train_lora.py, like this:
(screenshot showing the change)
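
Roughly, the change is to comment out the monkey patch (a sketch only; it assumes train_lora.py applies the patch via replace_llama_attn_with_flash_attn(), which may differ at your commit):

# fastchat/train/train_lora.py

# Comment out the FlashAttention monkey patch so LLaMA falls back to the
# stock attention implementation in transformers:
# from fastchat.train.llama_flash_attn_monkey_patch import (
#     replace_llama_attn_with_flash_attn,
# )
# replace_llama_attn_with_flash_attn()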

@Trangle (Contributor) commented Jul 5, 2023

You may try using xformers.
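
Something like this, if you go that route (a sketch only, not FastChat code; it assumes xformers is installed and q/k/v come in as (batch, seq_len, num_heads, head_dim) fp16/bf16 tensors):

import torch
import xformers.ops as xops

# Dummy shapes for illustration.
batch, seq_len, num_heads, head_dim = 2, 128, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Memory-efficient attention with a causal mask, as a stand-in for the
# attention computation inside the LLaMA decoder layers.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print(out.shape)  # (batch, seq_len, num_heads, head_dim)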

@tmm1 (Contributor) commented Aug 1, 2023

Try #2126
