Account for compute cost in collectives during redistribution #94
Conversation
This removes a long-standing hack to tell the solver that S(1) -> R is more expensive than S(0) -> R because of an additional data movement. Indeed, when performing S(1) -> R, we currently perform an all-gather on dim 0, followed by a full copy of the data. This wasn't modelled properly before (we just multiplied the comm cost by an arbitrary factor of 4); now it is taken into account properly.
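For context, a minimal sketch of the current S(1) -> R path (names and group handling are illustrative, not this PR's code). The all-gather stacks shards on dim 0, so a dim-1 shard needs an extra full-size chunk + cat afterwards; that copy is the data movement this PR adds to the cost model:

```python
import torch
import torch.distributed._functional_collectives as funcol

def shard1_to_replicate(local: torch.Tensor, world_size: int, group) -> torch.Tensor:
    # local has shape (m, n // world_size), sharded on dim 1
    gathered = funcol.all_gather_tensor(local, gather_dim=0, group=group)
    # gathered has shape (m * world_size, n // world_size): shards stacked on dim 0
    shards = torch.chunk(gathered, world_size, dim=0)
    # this cat materializes a second full-size tensor: the extra data movement
    return torch.cat(shards, dim=1)
```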
```python
elif src_plc.is_shard() and src_plc.dim != 0 and tgt_plc.is_replicate():
    # add cost of additional cat on full size
    # *2 because we need to count input and output reads
    read_write_bytes = (
```
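The snippet above is truncated in the page capture. A hedged reconstruction of what such a bandwidth-based cost term might look like (the constant names and the efficiency factor are assumptions, not the PR's actual code):

```python
# Assumed constants, not from the PR: peak memory bandwidth in bytes/s and
# the fraction of it a plain copy actually achieves.
MEM_BANDWIDTH_BYTES_PER_S = 2.0e12
COPY_EFFICIENCY = 0.5

def extra_copy_cost_s(full_tensor_bytes: int) -> float:
    # *2 because the cat reads the gathered input and writes the output
    read_write_bytes = 2 * full_tensor_bytes
    return read_write_bytes / (MEM_BANDWIDTH_BYTES_PER_S * COPY_EFFICIENCY)
```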
Ah, got it, thanks. The copy is clearly bandwidth-bound, so we can just use memory bandwidth to estimate its cost.
The problem with this PR is that it increases the solver runtime on a model with a single transformer block from 2.95 s to 12.43 s, because the …
OK, so the solver time goes back to a reasonable level if we assume 50% efficiency for the IO of the copy. I need to check whether that's a realistic assumption.
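As a rough sanity check on that assumption (numbers illustrative, not from the PR): a 1 GB tensor copied at 50% of a 2 TB/s peak bandwidth moves 2 GB (read + write) at an effective 1 TB/s, i.e. about 2 ms, versus roughly 1 ms at the 100% efficiency the unscaled model implies.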
Well, it's nice to clean up that hack, but to your earlier point, making the cost align perfectly with reality matters less than making the system work well on real models. I'm wondering: does this change produce a better solution in some case?
Yes, I actually worked on this because the view->mm->view PR exposed the issue this PR is trying to solve. The symptom was that, in principle, we should get the same solution for the view->mm->view and einsum formulations, but that wasn't the case; this PR is an attempt to fix that. I still want to test it more thoroughly before merging, and I'll only merge if I find it beneficial.
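For reference, a hedged illustration of the two formulations that should receive the same sharding solution (shapes made up; this is not code from either PR):

```python
import torch

x = torch.randn(8, 128, 512)   # (batch, seq, k)
w = torch.randn(512, 1024)     # (k, n)

# view -> mm -> view formulation
y1 = (x.view(-1, 512) @ w).view(8, 128, 1024)

# einsum formulation
y2 = torch.einsum("bsk,kn->bsn", x, w)

torch.testing.assert_close(y1, y2)
```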
See also pytorch/pytorch#161882
Subsumed by #125