Enabling split_dI_dW and split_fsdp_collectives passes #231
Conversation
split_fsdp_prefetch lgtm
    gm: torch.fx.GraphModule,
    num_params: int,
) -> tuple[torch.fx.GraphModule, torch.fx.GraphModule]:
    g = deepcopy(gm.graph)
Curious, why do you want to keep the original graph unchanged? Will it be further used?
Since we are using its container graph module to initialize the two new graphs, I just thought it would be safer this way; we can remove this later.
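A minimal sketch of the pattern being discussed, assuming the pass carves the copied graph into two pieces and wraps each in a GraphModule rooted at the original gm (the placeholder split and the helper name _split_sketch are illustrative, not the actual pass):

    import torch.fx
    from copy import deepcopy

    def _split_sketch(gm: torch.fx.GraphModule) -> tuple[torch.fx.GraphModule, torch.fx.GraphModule]:
        # Work on a copy so gm.graph itself is never mutated.
        g = deepcopy(gm.graph)
        # The real pass splits g into a prefetch graph and a fwd graph here;
        # the duplicate below is only a placeholder for illustration.
        prefetch_graph, fwd_graph = g, deepcopy(g)
        # gm stays the attribute container ("root") for both new GraphModules,
        # so keeping its original graph untouched avoids accidental aliasing.
        return torch.fx.GraphModule(gm, prefetch_graph), torch.fx.GraphModule(gm, fwd_graph)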
    g_ins = g.find_nodes(op="placeholder")

def split_fsdp_prefetch(
    gm: torch.fx.GraphModule,
    num_params: int,
nit: If we use export with descriptors, potentially num_params could be taken from metadata.
Yeah, for now this is easily obtainable from the graph meta. Same as above, this can be removed later.
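As a rough illustration of the nit above, the count could be derived from placeholder metadata instead of being passed in explicitly. The "desc" key and "param" tag below are assumptions about what export-with-descriptors might record, not an actual schema:

    # Hypothetical sketch: count parameter placeholders via node metadata.
    num_params = sum(
        1
        for n in gm.graph.find_nodes(op="placeholder")
        if "param" in str(n.meta.get("desc", ""))  # assumed metadata key/tag
    )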
Force-pushed 710e1a6 to 40ad911, then 40ad911 to 19ac5cb.
Based off #227, will rebase once it lands.
Test file: examples/example_pp_graph_passes.py

Made some modifications to Ivan’s pass (#201) as follows:
The split_fsdp_prefetch pass originally required all the inputs (params + buffers + microbatch) to be passed to the prefetch graph, and all the outputs of the prefetch graph to be passed to the fwd_graph. We wanted something slightly different: only the sharded_params should go into the prefetch_graph, which produces the unsharded_params. The fwd_graph then takes these unsharded_params (obtained from the prefetch_graph) plus buffers and the microbatch. This is important because the prefetch graph must be independent of the micro-batch, since the unshard action is not associated with any particular micro-batch. A rough sketch of the intended dataflow follows.
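Illustrative usage sketch, assuming split_fsdp_prefetch returns (prefetch_gm, fwd_gm) and that sharded_params, buffers, and microbatches are provided by the caller (all of these names are hypothetical, not the pass's actual variables):

    # Split the forward graph into an unshard (prefetch) part and a compute part.
    prefetch_gm, fwd_gm = split_fsdp_prefetch(gm, num_params=len(sharded_params))

    # The prefetch graph sees only sharded params, so it carries no micro-batch
    # dependence and maps onto a single unshard action per step.
    unsharded_params = prefetch_gm(*sharded_params)

    # The fwd graph consumes the unsharded params plus buffers and one micro-batch.
    for microbatch in microbatches:
        out = fwd_gm(*unsharded_params, *buffers, microbatch)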
Similarly, the reduce_grad graph should take only the unsharded_grads from the bwd_graph and produce the sharded_grads. We cannot pass all of the bwd_graph's outputs to the reduce_grad graph, since that would make it micro-batch dependent, and we want to call the reduce_grad action only once, after accumulating the unsharded_grads across micro-batches. A sketch of that side follows.
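Illustrative sketch of the reduce_grad side; bwd_gm, reduce_grad_gm, the bwd_inputs helper, and the accumulation loop are hypothetical names for this example, not the pass's actual API:

    # Accumulate unsharded grads across micro-batches.
    acc_grads = None
    for microbatch in microbatches:
        unsharded_grads = bwd_gm(*bwd_inputs(microbatch))  # bwd_inputs: hypothetical helper
        if acc_grads is None:
            acc_grads = list(unsharded_grads)
        else:
            acc_grads = [a + g for a, g in zip(acc_grads, unsharded_grads)]

    # reduce_grad runs exactly once per step: its inputs are the accumulated
    # unsharded grads, which no longer depend on any particular micro-batch.
    sharded_grads = reduce_grad_gm(*acc_grads)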
Brian’s pass (#232) worked perfectly fine; I just integrated it differently into the end-to-end workflow.