[FSDP] Support lr scheduler by ChangyiYang · Pull Request #1040 · THUDM/slime

ChangyiYang · 2025-12-05T20:26:10Z

PR Description

[FSDP] Support LR scheduler

Implement FSDPLRScheduler for FSDP backend to align with Megatron-LM's learning rate scheduling behavior.

Key Changes

- Add FSDPLRScheduler class inheriting from PyTorch LRScheduler
- Support warmup, decay, and weight decay scheduling (linear, cosine, etc.)
- Update actor logging and checkpoint functionality

Related issue #962

Reference

Based on Megatron-LM OptimizerParamScheduler

This ensures consistent LR scheduling behavior between FSDP and Megatron backends.
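
As a rough illustration of the warmup and decay behavior described above, here is a minimal sketch of a schedule with linear warmup followed by linear or cosine decay. It is loosely modeled on the semantics of Megatron-LM's OptimizerParamScheduler and is not the code from this PR; the function name `scheduled_lr` and all of its parameters are hypothetical, and weight decay scheduling is left out.

```python
import math


def scheduled_lr(step, *, max_lr, min_lr=0.0, init_lr=0.0,
                 warmup_steps=0, decay_steps=1, style="cosine"):
    """Return the learning rate at `step` (hypothetical helper, not the PR's API)."""
    # Linear warmup from init_lr to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return init_lr + (max_lr - init_lr) * step / warmup_steps
    # A constant schedule stays at max_lr after warmup.
    if style == "constant":
        return max_lr
    # Past the decay horizon the rate is clamped at min_lr.
    if step > decay_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1].
    span = max(decay_steps - warmup_steps, 1)
    ratio = (step - warmup_steps) / span
    if style == "linear":
        coeff = 1.0 - ratio
    elif style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    else:
        raise ValueError(f"unsupported decay style: {style}")
    return min_lr + coeff * (max_lr - min_lr)
```

With `warmup_steps=10` and `decay_steps=100`, the rate rises linearly to `max_lr` over the first 10 steps, then falls toward `min_lr` and stays there after step 100.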

Example

Load and Save Test

- Modify FSDPLRScheduler.get_lr() to accept param_group parameter like Megatron
- Update FSDP actor logging to use lr_scheduler.get_lr(group) instead of group['lr']
- Ensure consistent LR logging behavior between FSDP and Megatron backends

- Update step() method to call get_lr(param_group) for each param_group individually
- Allow different param_groups to have different learning rates based on their max_lr/min_lr settings
- Previously all param_groups got the same LR regardless of their individual settings
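
The per-group behavior described in this commit message can be sketched with plain dictionaries standing in for optimizer param_groups. This is an assumption-laden illustration, not the PR's implementation: the helper names `linear_lr` and `step_param_groups` are hypothetical, and the convention that a group's `max_lr`/`min_lr` keys override scheduler-wide defaults is assumed from the description.

```python
def linear_lr(step, max_lr, min_lr, warmup_steps, decay_steps):
    """Linear warmup from 0 to max_lr, then linear decay to min_lr (hypothetical)."""
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    if step >= decay_steps:
        return min_lr
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    return min_lr + (1.0 - ratio) * (max_lr - min_lr)


def step_param_groups(param_groups, step, *, max_lr, min_lr, warmup_steps, decay_steps):
    """Compute the LR for each group separately and write it back into the group."""
    for group in param_groups:
        # Assumed convention: per-group settings override the scheduler defaults.
        group_max = group.get("max_lr", max_lr)
        group_min = group.get("min_lr", min_lr)
        group["lr"] = linear_lr(step, group_max, group_min, warmup_steps, decay_steps)
    return [group["lr"] for group in param_groups]


# Two groups: one uses the defaults, the other carries its own bounds.
groups = [{"name": "backbone"}, {"name": "head", "max_lr": 1e-3, "min_lr": 1e-5}]
lrs = step_param_groups(groups, step=10, max_lr=1e-4, min_lr=1e-6,
                        warmup_steps=10, decay_steps=100)
```

At the end of warmup each group sits at its own `max_lr`, so the two groups end up with different rates, whereas applying one shared value to every group would ignore the `head` settings.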

- Fix lr_scheduler.step() call in actor.py to use standard PyTorch interface
- Ensure proper inheritance from torch.optim.lr_scheduler.LRScheduler
- Verify get_lr() returns list[float] as required by PyTorch standard
- Confirm state_dict/load_state_dict compatibility
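
To make the points in this commit message concrete, the sketch below shows a small custom scheduler that inherits from `torch.optim.lr_scheduler.LRScheduler`, returns a `list[float]` from `get_lr()`, and survives a `state_dict`/`load_state_dict` round trip using only the inherited methods. The class `WarmupLinearDecayLR` and the helpers around it are hypothetical and much simpler than the PR's FSDPLRScheduler; they only illustrate the standard PyTorch interface.

```python
import torch
from torch.optim.lr_scheduler import LRScheduler


class WarmupLinearDecayLR(LRScheduler):
    """Hypothetical scheduler: linear warmup from 0, then linear decay to min_lr."""

    def __init__(self, optimizer, warmup_steps, total_steps, min_lr=0.0, last_epoch=-1):
        # Set our attributes first: the base __init__ performs an initial
        # step() that already calls get_lr().
        self.warmup_steps = warmup_steps
        self.total_steps = total_steps
        self.min_lr = min_lr
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        step = self.last_epoch
        lrs = []
        for base_lr in self.base_lrs:  # one entry per param_group
            if step < self.warmup_steps:
                lrs.append(base_lr * step / self.warmup_steps)
            else:
                span = max(self.total_steps - self.warmup_steps, 1)
                progress = min((step - self.warmup_steps) / span, 1.0)
                lrs.append(self.min_lr + (base_lr - self.min_lr) * (1.0 - progress))
        return lrs


def make_pair():
    """Build a fresh optimizer and scheduler over a single dummy parameter."""
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = torch.optim.SGD([param], lr=1.0)
    scheduler = WarmupLinearDecayLR(optimizer, warmup_steps=2, total_steps=10, min_lr=0.1)
    return optimizer, scheduler


def demo_round_trip():
    """Advance three steps, checkpoint, and restore into fresh objects."""
    optimizer, scheduler = make_pair()
    for _ in range(3):
        optimizer.step()   # no gradients here, so this is a no-op update
        scheduler.step()
    opt_state, sched_state = optimizer.state_dict(), scheduler.state_dict()

    new_optimizer, new_scheduler = make_pair()
    new_optimizer.load_state_dict(opt_state)
    new_scheduler.load_state_dict(sched_state)
    return scheduler.get_last_lr(), new_scheduler.get_last_lr(), new_scheduler.last_epoch
```

Because the base class serializes every attribute except the optimizer, the extra fields (`warmup_steps`, `total_steps`, `min_lr`) are saved and restored without a custom `state_dict` implementation.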

ChangyiYang · 2025-12-06T00:18:44Z

@PopSoda2002 Hi! I think this PR is ready for review.

slime/backends/fsdp_utils/lr_scheduler.py

Hecate0821 · 2025-12-06T05:29:36Z

LGTM

Co-authored-by: andy <andy271828@163.com>

Co-authored-by: AI Assistant <ai@example.com> Co-authored-by: andy <andy271828@163.com>

AI Assistant and others added 8 commits December 5, 2025 19:57

fix init lr

5c7a128

Merge branch 'THUDM:main' into main

5e0c18b

argument parser & pre-commit

0a7b467

add override and remove unnecessary rollout nums

c7d3115

correct metric collection

201095b

Hecate0821 reviewed Dec 6, 2025


slime/backends/fsdp_utils/lr_scheduler.py (outdated)

remove unnecessary state dict function

aa5799e

AI Assistant and others added 3 commits December 6, 2025 15:57

chore: credit andy271828@163.com

3f04b7c

chore: credit andy271828@163.com

b7d1653

add credit

41099c7

Co-authored-by: andy <andy271828@163.com>

zhuzilin approved these changes Dec 7, 2025


zhuzilin merged commit 41b5932 into THUDM:main Dec 7, 2025

This was referenced Dec 7, 2025

[FSDP] Support scheduled learning rate #962

Closed

Feat: add usage docs for fsdp #1092

Merged

Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025

[FSDP] Support lr scheduler (THUDM#1040)

bb836a6

Co-authored-by: AI Assistant <ai@example.com> Co-authored-by: andy <andy271828@163.com>

zhuzilin merged 12 commits into THUDM:main from ChangyiYang:main


3 participants
