LoRA Fine Tuning Crash at FlashAttention · Issue #1828

Closed · pasindumuth opened this issue Jun 30, 2023 · 8 comments · Fixed by #2126

@pasindumuth commented Jun 30, 2023

I tried running this command:

deepspeed fastchat/train/train_lora.py \
    --model_name_or_path huggyllama/llama-7b  \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir ../output_vicuna-lora \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed playground/deepspeed_config_s2.json

I ran it on runpod.io, but I keep getting:

Time to load utils op: 0.00040268898010253906 seconds
  0%|          | 0/573 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/tmp/FastChat/fastchat/train/train_lora.py", line 151, in <module>
    train()
  File "/tmp/FastChat/fastchat/train/train_lora.py", line 141, in train
    trainer.train()
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1675, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/tmp/FastChat/fastchat/train/llama_flash_attn_monkey_patch.py", line 88, in forward
    output_unpad = flash_attn_unpadded_qkvpacked_func(
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 256, in flash_attn_unpadded_qkvpacked_func
    return FlashAttnQKVPackedFunc.apply(qkv, cu_seqlens, max_seqlen, dropout_p, softmax_scale,
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 58, in forward
    out, softmax_lse, rng_state, S_dmask = _flash_attn_forward(
  File "/usr/local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 21, in _flash_attn_forward
    softmax_lse, rng_state, *rest = flash_attn_cuda.fwd(
RuntimeError: Expected q_dtype == torch::kFloat16 || ((is_sm8x || is_sm90) && q_dtype == torch::kBFloat16) to be true, but got false.
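
For context, the guard that fails only accepts fp16 inputs (or bf16 on SM80/SM90-class GPUs). A minimal sketch of that condition, not the library's actual code:

import torch

def flash_attn_v1_accepts(q_dtype: torch.dtype, device: int = 0) -> bool:
    # fp16 is always accepted; bf16 only on SM80 (A100) / SM90 (H100) GPUs,
    # mirroring the condition quoted in the RuntimeError above.
    major, _ = torch.cuda.get_device_capability(device)
    is_sm8x_or_sm90 = major >= 8
    return q_dtype == torch.float16 or (is_sm8x_or_sm90 and q_dtype == torch.bfloat16)

print(flash_attn_v1_accepts(torch.bfloat16))  # False on pre-SM80 GPUs
print(flash_attn_v1_accepts(torch.float32))   # always False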

My setup instructions were as follows:

  1. Create an A100 instance on runpod.io
  2. Do this:
# Checkout to specific commit
cd FastChat
git checkout c0523e0

# install packages
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# Install packages that weren't included in requirements.txt for some reason...
pip install einops torch
pip install flash_attn==1.0.5

# For LoRA
pip install deepspeed
# Per https://github.com/artidoro/qlora/issues/154#issuecomment-1593665478
pip install git+https://github.com/huggingface/peft
  3. Run the command at the start of this issue. Please note the use of huggyllama/llama-7b... is this the right HF model to use?
  4. Observe the problem.

Are there any plans to fix this?

@merrymercy (Member)

cc @BabyChouSr

@alphanlp commented Jul 1, 2023

me too

@mvuthegoat (Contributor)

Flash attention might not work for your GPU: the backward pass with head dim > 64 requires an A100 or H100. You could consider commenting out the flash attention part.
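
A quick sketch to check this (assumes only PyTorch; SM80 is A100-class, SM90 is H100-class):

import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, compute capability {major}.{minor}")
    if major < 8:
        # flash-attn v1: backward with head dim > 64 needs SM80 (A100) or SM90 (H100)
        print("  -> flash-attn backward with head dim > 64 is not supported here")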

@pasindumuth (Author)

I tried using an A100 as well and I got the same error.

@adazhng commented Jul 3, 2023

Same error observed. I found it can train after removing --q_lora True, but memory usage goes up to 79 GB per GPU (I currently run on 8 x A100 80GB; the model I use is llama-33b). It looks like LoRA does not work in this case. Could it be related to the package versions?

@pauliZhang
Me too. You can remove flash-attn in train_lora.py, like this:
(screenshot showing the change)
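
Roughly, the change is to comment out the monkey patch (a sketch only; it assumes train_lora.py applies the patch via replace_llama_attn_with_flash_attn(), which may differ at your commit):

# fastchat/train/train_lora.py

# Comment out the FlashAttention monkey patch so LLaMA falls back to the
# stock attention implementation in transformers:
# from fastchat.train.llama_flash_attn_monkey_patch import (
#     replace_llama_attn_with_flash_attn,
# )
# replace_llama_attn_with_flash_attn()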

@Trangle (Contributor) commented Jul 5, 2023

You may try using xformers.
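
Something like this, if you go that route (a sketch only, not FastChat code; it assumes xformers is installed and q/k/v come in as (batch, seq_len, num_heads, head_dim) fp16/bf16 tensors):

import torch
import xformers.ops as xops

# Dummy shapes for illustration.
batch, seq_len, num_heads, head_dim = 2, 128, 32, 128
q = torch.randn(batch, seq_len, num_heads, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Memory-efficient attention with a causal mask, as a stand-in for the
# attention computation inside the LLaMA decoder layers.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print(out.shape)  # (batch, seq_len, num_heads, head_dim)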

@tmm1 (Contributor) commented Aug 1, 2023

Try #2126
