Officially support naive PP for quantized models + PEFT #1523

younesbelkada · 2023-06-05T10:48:46Z

What does this PR do?

Naive Pipeline Parallelism (NPP) should be supported by accelerate, and it should work as long as we properly educate users on how to use it.

What is NPP?

It is the simplest paradigm for running a model across multiple GPUs: the model is spread as evenly as possible across all available GPUs (e.g. with device_map="auto").
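To make the idea concrete, here is a toy sketch of "evenly fitting" layers across GPUs. The function name `naive_even_device_map` and the layer names are hypothetical, and this is not how accelerate computes a device map: the placement actually produced by device_map="auto" depends on module sizes and available memory. The sketch only illustrates the contiguous, layer-by-layer split that NPP relies on.

```python
from typing import Dict, List


def naive_even_device_map(layer_names: List[str], num_devices: int) -> Dict[str, int]:
    """Assign consecutive layers to devices in roughly equal contiguous chunks.

    Hypothetical helper for illustration only; not part of accelerate.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    # Ceiling division so that no device receives more than `per_device` layers.
    per_device = -(-len(layer_names) // num_devices)
    return {name: i // per_device for i, name in enumerate(layer_names)}


# Eight made-up layer names split over two devices.
layers = [f"decoder.layers.{i}" for i in range(8)]
device_map = naive_even_device_map(layers, num_devices=2)
print(sorted(set(device_map.values())))  # [0, 1]
```

With such a map, a forward pass runs the first half of the layers on device 0 and the second half on device 1, one after the other, which is why only one GPU is busy at a time.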

When to use it and when not to use it?

Initially I added that check because I was afraid users would train 8-bit models that are loaded across multiple GPUs while also running in a multi-GPU distributed regime. In that case the model is converted to DDP, which is fine when the model fits on a single GPU and is duplicated across multiple GPUs (Data Parallelism), but can lead to many breaking behaviours when the model is split across GPUs, such as huggingface/peft#269 (comment).
The fix is to relax the check so that it also takes into account whether we are in a multi-GPU distributed regime (where DDP is expected).
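The relaxed condition can be sketched as a small standalone function. This is a minimal illustration under stated assumptions, not the code in src/accelerate/accelerator.py: the function name `check_quantized_placement`, the `is_distributed` flag, and the error message are all hypothetical.

```python
from typing import Dict, Optional, Union

Device = Union[int, str]


def check_quantized_placement(
    hf_device_map: Optional[Dict[str, Device]],
    is_distributed: bool,
) -> None:
    """Raise only when a model split across devices is used in a distributed run.

    Hypothetical helper: a model spread over several GPUs (naive PP) is allowed
    in a single process, and a model on one device is allowed under DDP.
    """
    if not hf_device_map:
        return  # no device map recorded, nothing to check
    spans_multiple_devices = len(set(hf_device_map.values())) > 1
    if spans_multiple_devices and is_distributed:
        raise ValueError(
            "A quantized model split across several devices cannot be "
            "wrapped for distributed (DDP) training."
        )


# Naive PP in a single process: accepted.
check_quantized_placement({"layers.0": 0, "layers.1": 1}, is_distributed=False)
# One device per process under DDP: accepted.
check_quantized_placement({"layers.0": 0, "layers.1": 0}, is_distributed=True)
```

Only the combination of a multi-device map and a distributed launch is rejected; either one on its own passes.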

In the TRL library, the PPOTrainer (which calls accelerator.prepare under the hood) can be used with Naive Pipeline Parallelism to train 60B+ scale models with RLHF: https://huggingface.co/docs/trl/main/en/lora_tuning_peft#naive-pipeline-parallelism-npp-for-large-models-60b-models . The error was never raised there because I forgot to store the hf_device_map attribute inside the model class we use in TRL.
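The role of the hf_device_map attribute can be illustrated with a toy wrapper. Both classes below are hypothetical stand-ins (they are not TRL or transformers classes); the sketch only shows that code inspecting the wrapper sees the placement of the wrapped model only if the wrapper stores the attribute itself.

```python
class ModelWrapper:
    """Hypothetical wrapper, standing in for a trainer-side model class."""

    def __init__(self, pretrained_model):
        self.pretrained_model = pretrained_model
        # Forward the device map so downstream checks can see how the
        # wrapped model is placed; fall back to None if it was never set.
        self.hf_device_map = getattr(pretrained_model, "hf_device_map", None)


class FakeShardedModel:
    """Stand-in for a model loaded with device_map="auto" on two GPUs."""

    hf_device_map = {"model.embed": 0, "model.layers.0": 0, "model.layers.1": 1}


wrapped = ModelWrapper(FakeShardedModel())
print(set(wrapped.hf_device_map.values()))
```

Without the line that copies the attribute, `getattr(wrapped, "hf_device_map", None)` would return None, and any check based on it would be skipped silently.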

To reproduce (you need PEFT, and the script must be run in a multi-GPU environment):

from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_id = "facebook/opt-350m"
accelerator = Accelerator()

# LoRA adapters on the attention query and value projections
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Load in 8-bit and let device_map="auto" place the model on the available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)
model = prepare_model_for_int8_training(model)

# Show the set of devices the model was placed on
print(set(model.hf_device_map.values()))

model = get_peft_model(model, config)

# Hand the PEFT model over to accelerate
model = accelerator.prepare(model)

cc @sgugger @muellerzr

src/accelerate/accelerator.py

sgugger

Thanks!

src/accelerate/accelerator.py

muellerzr

Thanks! LG2M :)

robinsonmhj · 2023-12-06T05:22:27Z

This feature is only on main. Is there any plan to include it in a new release so that I can install it with pip?

officially support naive PP

474a22e

- relax check
- add test

younesbelkada requested a review from sgugger June 5, 2023 10:48

younesbelkada mentioned this pull request Jun 5, 2023

ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices. #1515

Closed

younesbelkada requested a review from muellerzr June 5, 2023 10:53

Apply suggestions from code review

1897b81

sgugger approved these changes Jun 5, 2023

younesbelkada and others added 2 commits June 6, 2023 09:40

more tests

3b7ac4e

Update src/accelerate/accelerator.py

dcfe8e4

muellerzr approved these changes Jun 6, 2023

younesbelkada merged commit ef0c4bf into huggingface:main Jun 6, 2023

younesbelkada deleted the fix-multi-gpu branch June 6, 2023 12:42

yangjianxin1 mentioned this pull request Jun 25, 2023

Problem with GPU usage in multi-GPU training yangjianxin1/Firefly#38

Closed

cassianlewis mentioned this pull request Aug 3, 2023

Quantized models + PEFT + multi-gpu setup failing during training huggingface/transformers#25289

Closed

younesbelkada mentioned this pull request Aug 17, 2023

finetuning with PEFT int-8bit + LoRA on single node multiGPU was working, now doesn't any more #1840

Closed

xinmengZ mentioned this pull request Oct 5, 2023

Error running 06_fine_tune_qlora.py on multi-gpu databricks/databricks-ml-examples#77

Open

tom-ph mentioned this pull request Feb 25, 2024

"You can't train a model that has been loaded with device_map='auto' in any distributed mode" error when running on multi-GPU VM #2486

Closed

JubilantJerry mentioned this pull request Jun 6, 2024

Cannot train quantized model with both model and data parallelism #2832

Closed
