Feat: add usage docs for fsdp #1092
Conversation
@Hecate0821 Let's review this together!
PopSoda2002
left a comment
Thanks!
docs/en/get_started/usage.md
Outdated
| **Context Parallel** | `--context-parallel-size` | `--context-parallel-size` | Both support CP |
| **Initial Learning Rate** | `--lr` | `--lr` | Same parameter |
| **Learning Rate Decay** | `--lr-decay-style` (linear/cosine) | `--lr-decay-style` (only constant) | |
| **Warmup** | `--lr-warmup-iters` (steps) | Coming Soon | |
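For concreteness, a minimal FSDP-backend launch sketch using the flags quoted above; the entry-point script name (`train.py`) and the flag values are placeholders, not taken from this PR:

```bash
# Hypothetical slime launch with the FSDP backend.
# Only the flag names come from the table above; the script name and
# values are assumptions for illustration.
python train.py \
  --context-parallel-size 2 \
  --lr 1e-5 \
  --lr-decay-style constant
```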
Hi, I think the learning-rate-related features are already supported; check this out: #1040
docs/en/get_started/usage.md
Outdated
| **CPU Backend** | Implemented via distributed optimizer | `--fsdp-cpu-backend` | **FSDP**: Specify CPU backend and use hybrid backend when CPU offload is enabled |
| **Attention Backend** | Decided by Megatron Core | `--attn-implementation` (flash_attention_2/sdpa/eager) | **FSDP**: Directly passed to HuggingFace |
| **Mixed Precision** | `--fp16` or `--bf16` | `--fp16` (bf16 inferred automatically) | Basically same |
| **Offload on Save** | | `--fsdp-state-dict-cpu-offload` (Default True) | **FSDP**: Offload to CPU when saving checkpoint |
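As a rough illustration of the FSDP-specific flags quoted above; the script name and the chosen values (`sdpa`, `gloo`) are assumptions, not from the PR:

```bash
# Hypothetical FSDP-backend launch exercising the backend/precision flags
# from the table above. "sdpa" and "gloo" are assumed example values.
python train.py \
  --attn-implementation sdpa \
  --fp16 \
  --fsdp-cpu-backend gloo
```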
I think we don't need to mention `--fsdp-state-dict-cpu-offload` in the user doc; it should always be True.
docs/en/get_started/usage.md
Outdated
| **Expert Parallel** | `--expert-model-parallel-size` | Coming Soon | |
| **Context Parallel** | `--context-parallel-size` | `--context-parallel-size` | Both support CP |
| **Initial Learning Rate** | `--lr` | `--lr` | Same parameter |
| **Learning Rate Decay** | `--lr-decay-style` (linear/cosine) | `--lr-decay-style` (only constant) | |
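For comparison, a hypothetical Megatron-backend launch using the Megatron-side flags from this table; the script name and values are illustrative assumptions:

```bash
# Hypothetical Megatron-backend launch; flag names come from the table,
# the script name and values are illustrative only.
python train.py \
  --expert-model-parallel-size 2 \
  --context-parallel-size 2 \
  --lr 1e-5 \
  --lr-decay-style cosine
```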
I think lr decay supports more than just constant? You can check out that PR.
PopSoda2002
left a comment
LGTM!
Feat: add usage docs for fsdp
Update the docs for the recently supported FSDP backend in slime.