@fmassa fmassa commented Aug 29, 2025

This removes a long-standing hack that told the solver that S(1) -> R is more expensive than S(0) -> R because of the additional data movement.

Indeed, when performing S(1) -> R, we currently perform an all-gather on dim 0, followed by a full copy of the data. This wasn't modelled properly before (we just multiplied the comm cost by an arbitrary factor of 4); it is now properly taken into account.
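
To illustrate the idea, here is a minimal sketch of such a cost model. It is not the code from this PR: the function names, the ring all-gather formula, and the assumption that the extra copy reads and writes the full tensor once are all hypothetical choices made for the example.

```python
def all_gather_time(shard_bytes, world_size, link_bw, latency=0.0):
    """Assumed ring all-gather model: each rank receives (world_size - 1) shards."""
    return (world_size - 1) * (latency + shard_bytes / link_bw)


def shard_to_replicate_time(shard_dim, shard_bytes, world_size, link_bw, mem_bw, latency=0.0):
    """Estimated time for S(shard_dim) -> R under the assumptions above."""
    comm = all_gather_time(shard_bytes, world_size, link_bw, latency)
    if shard_dim == 0:
        # The all-gather on dim 0 already yields the final layout.
        return comm
    # For any other dim, the gathered data has to be copied into place.
    # Assumption: the copy reads and writes the full tensor once.
    full_bytes = shard_bytes * world_size
    copy = 2 * full_bytes / mem_bw
    return comm + copy


s0 = shard_to_replicate_time(0, 1000, 4, link_bw=100.0, mem_bw=1000.0)
s1 = shard_to_replicate_time(1, 1000, 4, link_bw=100.0, mem_bw=1000.0)
print(s0, s1)
```

With a model of this shape, S(1) -> R costs more than S(0) -> R by exactly the copy term, so no arbitrary multiplier on the comm cost is needed.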

We also model the all-to-all cost more accurately now. Note that it carries a temporary 5x scaling factor, which I added only to get this merged and which still needs to be improved.
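
For reference, a scaled all-to-all estimate could look roughly like the sketch below. This is only an illustration: the function name and the bandwidth-only formula are assumptions, and only the factor of 5 comes from the description above.

```python
def all_to_all_time(local_bytes, world_size, link_bw, scale=5.0):
    """Hypothetical all-to-all estimate with a temporary scaling factor.

    Assumption: each rank keeps 1/world_size of its local data and sends
    the rest, and latency is ignored.
    """
    sent_bytes = local_bytes * (world_size - 1) / world_size
    return scale * sent_bytes / link_bw


scaled = all_to_all_time(4000, 4, link_bw=100.0)
unscaled = all_to_all_time(4000, 4, link_bw=100.0, scale=1.0)
print(scaled, unscaled)
```

Keeping the factor as an explicit parameter makes it easy to drop once the model is improved.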

This PR subsumes #94, as we now have our own redistribution function.

@fmassa fmassa marked this pull request as ready for review September 10, 2025 14:06
@fmassa fmassa merged commit cd27579 into main Sep 10, 2025
6 checks passed
@fmassa fmassa deleted the fmassa/compute_cost_in_comms_v2 branch September 10, 2025 17:47