[🐯+GRPO] Support FSDP + Fix bug when using LigerGRPO with DDP #3260
Testing using:

```python
import torch
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
import torch.distributed as dist
from torch.profiler import profile, record_function, ProfilerActivity
from transformers import TrainerCallback
import os
# from torch.distributed.fsdp import FSDPConfig, AutoWrapPolicy

# dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only", split="train")
dataset = load_dataset("trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness", split="train")
# only keep the prompt column
dataset = dataset.map(lambda x: {"prompt": x["prompt"]}, remove_columns=dataset.column_names)

training_args = GRPOConfig(
    output_dir="./scratch_dir",
    learning_rate=0.001,  # increase the learning rate to speed up the test
    per_device_train_batch_size=3,  # reduce the batch size to reduce memory usage
    num_generations=3,  # reduce the number of generations to reduce memory usage
    report_to=["tensorboard"],
    max_completion_length=256,  # reduce the completion length to reduce memory usage
    logging_steps=1,
    save_strategy="no",
    max_steps=50,
    use_liger_loss=True,
)

trainer = GRPOTrainer(
    model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
    reward_funcs="trl-internal-testing/tiny-Qwen2ForSequenceClassification-2.5",
    args=training_args,
    train_dataset=dataset,
)


class ProfCallback(TrainerCallback):
    def __init__(self, prof):
        self.prof = prof

    def on_step_end(self, args, state, control, **kwargs):
        self.prof.step()


# Create directory for profiling outputs
os.makedirs("profiling_results", exist_ok=True)


# Define profiling context manager
def train_with_profiling(enable_profiling=True):
    if enable_profiling:
        with profile(
            activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
            record_shapes=True,
            profile_memory=True,
            with_stack=True,
            with_flops=True,
            on_trace_ready=torch.profiler.tensorboard_trace_handler("profiling_results")
            if trainer.accelerator.is_main_process
            else None,
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=2, repeat=1),
        ) as prof:
            trainer.add_callback(ProfCallback(prof))
            trainer.train()
            # Print profiling results summary
            # print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
    else:
        trainer.train()


# trainer.train()
train_with_profiling(enable_profiling=False)

# destroy process group
if dist.is_initialized():
    dist.destroy_process_group()
```
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -407,7 +410,7 @@ def __init__(
         if self.beta == 0.0:
             # If beta is 0.0, the reference model is not needed
             self.ref_model = None
-        elif is_deepspeed_zero3_enabled():
+        elif is_deepspeed_zero3_enabled() or args.fsdp_config is not None:
```
`args.fsdp_config` defaults to `{'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}` when FSDP is not enabled. Probably want `args.fsdp` instead?
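A minimal sketch of the suggested check, assuming the standard `TrainingArguments.fsdp` field (which stays empty unless FSDP options are passed); this illustrates the review suggestion and is not the merged code:

```python
from transformers import TrainingArguments


def fsdp_enabled(args: TrainingArguments) -> bool:
    """Hypothetical helper: True only when FSDP was actually requested.

    `args.fsdp_config` is filled with defaults even without FSDP, so the check
    uses `args.fsdp`, which stays empty unless FSDP options were passed.
    """
    return bool(args.fsdp)


# Plain (DDP-style) args report FSDP as disabled; FSDP args report it as enabled.
ddp_args = TrainingArguments(output_dir="./out")
fsdp_args = TrainingArguments(output_dir="./out", fsdp="full_shard auto_wrap")
print(fsdp_enabled(ddp_args), fsdp_enabled(fsdp_args))  # False True
```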
checking
Ran with:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer


def main():
    # Load dataset
    train_dataset = load_dataset("trl-lib/tldr", split="train[:128]")

    def reward_len(completions, **kwargs):
        return [-abs(20 - len(completion)) for completion in completions]

    # Train model
    training_args = GRPOConfig(
        output_dir="./output",
        logging_steps=10,
        bf16=True,
        max_prompt_length=250,
        max_completion_length=250,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        num_generations=2,
        num_train_epochs=1,
        do_eval=True,
        optim="paged_adamw_8bit",
        max_steps=10,
        report_to="none",
    )
    print(training_args.fsdp_config)

    trainer = GRPOTrainer(
        args=training_args,
        model="Qwen/Qwen2.5-0.5B-Instruct",
        train_dataset=train_dataset,
        eval_dataset=train_dataset,
        reward_funcs=reward_len,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```
Thanks for the nice PR @shivam15s! LGTM with a small change on how we determine if FSDP is enabled for the ref model.
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
@LeonEricsson I have integrated the commit from @jglaser here.
Are there plans to support FSDP2?
What does this PR do?
This PR aims to do two things:
1. Add FSDP support for the Liger GRPO loss.
2. Fix a bug when using LigerGRPO with DDP.
Experiment Script: https://gist.github.com/shivam15s/08a9bccd0d72dd0d29bdb912cb9885be
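For reference, a minimal sketch of the kind of setup this PR enables (hypothetical, not the exact experiment script): the Liger GRPO loss (`use_liger_loss=True`) combined with FSDP sharding of the policy via the standard `TrainingArguments` FSDP options, launched with `accelerate launch` or `torchrun`.

```python
# Hypothetical sketch, not the exact benchmark script: Liger GRPO loss + FSDP.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train[:128]")


def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 20 characters.
    return [-abs(20 - len(completion)) for completion in completions]


training_args = GRPOConfig(
    output_dir="./grpo_liger_fsdp",
    use_liger_loss=True,          # chunked Liger GRPO loss (this PR)
    fsdp="full_shard auto_wrap",  # standard TrainingArguments FSDP options
    per_device_train_batch_size=2,
    num_generations=2,
    max_completion_length=128,
    report_to="none",
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()  # e.g. `accelerate launch --num_processes 8 this_script.py`
```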
DDP: Liger (blue) vs non-Liger (black)

FSDP: Liger (green) vs non-Liger (purple)

Known Limitations with FSDP (can add support in subsequent PR(s))
Benchmarking:

Dist Strategy: DDP
7 policy workers, 1 vllm worker (8 h100)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.