Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction
Description
Llama 30B with LoRA adapters cannot fit into 8 x A100 (80 GB).
Demonstration of Problem and Experiment Setups
I will illustrate this issue using four experiment setups on smaller models:
7B + LoRA + ZeRO stage 3
7B + ZeRO stage 3
7B + LoRA + ZeRO stage 2
7B + ZeRO stage 2
All other parameters remain the same across these experiments; a rough sketch of the setup follows the list.
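For reference, here is a minimal sketch of what these runs look like. The checkpoint name, LoRA hyperparameters, dummy dataset, and DeepSpeed config path are illustrative placeholders, not the exact values from our experiments:

```python
# Minimal sketch of one of the runs above (illustrative placeholders only).
# Launched on 8 GPUs, e.g.:  deepspeed --num_gpus=8 train_lora.py
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

model_name = "huggyllama/llama-7b"     # placeholder 7B checkpoint
ds_config = "ds_config_zero3.json"     # placeholder path to a standard ZeRO-3 config; swap in a stage-2 config for the ZeRO-2 runs

# TrainingArguments is created first so the Trainer's DeepSpeed integration is
# already configured when from_pretrained() runs.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=ds_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA: the base model is frozen and only the small adapter matrices are trainable.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)
model.print_trainable_parameters()

# Tiny dummy dataset so the sketch is self-contained.
def tokenize(example):
    out = tokenizer(example["text"], truncation=True, max_length=64)
    out["labels"] = out["input_ids"].copy()
    return out

train_ds = Dataset.from_dict({"text": ["hello world"] * 64}).map(tokenize, remove_columns=["text"])

Trainer(model=model, args=args, train_dataset=train_ds).train()
```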
Suspected Cause
A possible cause is that ZeRO-3 does not partition non-trainable (frozen) weights across GPUs. The basis for this assumption is as follows (a small diagnostic sketch is shown after the list):
The memory consumption matches the predicted values when LoRA is not used.
When training the model with LoRA, ZeRO-2 and ZeRO-3 show nearly the same memory consumption.
An examination of the ZeRO runtime sources also suggests this is the case.
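A rough way to check this from inside a run: the `ds_numel` attribute and the empty local placeholders below reflect my reading of the ZeRO-3 runtime and may differ between DeepSpeed versions, so treat this as a sketch rather than an official API.

```python
# Rough diagnostic sketch (assumes the model has already been wrapped by the
# DeepSpeed engine, e.g. called from a Trainer callback once training has started).
import torch
import torch.distributed as dist

def report_partitioning(model):
    rank = dist.get_rank() if dist.is_initialized() else 0
    full_elems = 0     # parameter elements of the unpartitioned model
    local_elems = 0    # elements still materialized in param.data on this rank
    for p in model.parameters():
        full_elems += getattr(p, "ds_numel", p.numel())
        # Parameters that ZeRO-3 has partitioned keep only an empty local
        # placeholder (their shard is stored separately by DeepSpeed), so
        # p.numel() is 0 for them; unpartitioned parameters keep their full
        # size on every rank.
        local_elems += p.numel()
    print(f"[rank {rank}] local/full param elements: {local_elems:,}/{full_elems:,}, "
          f"cuda allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
```

If the frozen base weights were being partitioned, `local_elems` should be close to zero on every rank; in our LoRA runs the per-rank memory under ZeRO-3 stays essentially the same as under ZeRO-2.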
Expected behavior
Training the model with ZeRO-3 while using LoRA should consume significantly less memory than ZeRO-2 with LoRA.
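For a sense of scale, here is a back-of-the-envelope estimate of the per-GPU footprint of the frozen base weights alone (16-bit weights, 8 GPUs; LoRA adapters and their optimizer states are ignored because only the adapters are trainable):

```python
# Per-GPU memory for the frozen base weights alone, 16-bit, 8 GPUs.
GiB = 2**30
n_gpus = 8
bytes_per_param = 2  # fp16 / bf16

for n_params in (7e9, 30e9):
    replicated = n_params * bytes_per_param / GiB   # ZeRO-2: every GPU holds a full copy
    sharded = replicated / n_gpus                   # ZeRO-3, if frozen weights were partitioned
    print(f"{n_params / 1e9:.0f}B: replicated ~ {replicated:.1f} GiB/GPU, sharded ~ {sharded:.1f} GiB/GPU")

# 7B:  ~13.0 GiB/GPU replicated vs ~1.6 GiB/GPU sharded
# 30B: ~55.9 GiB/GPU replicated vs ~7.0 GiB/GPU sharded
```

If the frozen weights are not sharded, the 30B model already needs roughly 56 GiB per GPU before gradients, adapter optimizer states, and activations, which would explain why it does not fit on 8 x A100 (80 GB).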
We also opened an issue in the DeepSpeed repository, but have not received any help there so far. Additionally, you may have more experience with the PEFT and DeepSpeed integration in the Transformers Trainer.