Account for compute cost in collectives during redistribution #125
This removes a long-standing hack that told the solver that S(1) -> R is more expensive than S(0) -> R because of an additional data movement.

When performing S(1) -> R, we currently perform an all-gather on dim 0, followed by a full copy of the data. This wasn't modelled properly before (we just multiplied the comm cost by an arbitrary factor of 4); it is now accounted for explicitly.

We also model the all-to-all cost more accurately now, although it still includes a *5 scaling factor that I added temporarily to get this merged and that needs to be improved.

This PR subsumes #94, as we now have our own redistribution function.
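
For illustration, here is a minimal sketch of the idea behind the S(0) -> R versus S(1) -> R distinction. It is not the code in this PR: the names `Hardware`, `all_gather_cost`, `copy_cost` and `shard_to_replicate_cost` are hypothetical, the bandwidth fields are assumed inputs, and the all-gather term uses the usual ring bandwidth estimate rather than whatever the solver actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical hardware description; both fields are assumed inputs."""
    link_bandwidth: float  # bytes/s between devices
    mem_bandwidth: float   # bytes/s for an on-device copy


def all_gather_cost(full_bytes: float, n_devices: int, hw: Hardware) -> float:
    # Ring all-gather bandwidth term: each device receives (n - 1) / n
    # of the full tensor over the interconnect.
    return full_bytes * (n_devices - 1) / n_devices / hw.link_bandwidth


def copy_cost(full_bytes: float, hw: Hardware) -> float:
    # A full copy reads and writes every byte of the gathered tensor once.
    return 2 * full_bytes / hw.mem_bandwidth


def shard_to_replicate_cost(shard_dim: int, full_bytes: float,
                            n_devices: int, hw: Hardware) -> float:
    # The all-gather is performed on dim 0 in both cases.
    cost = all_gather_cost(full_bytes, n_devices, hw)
    if shard_dim != 0:
        # Assumption: a shard on any other dim needs one extra full copy
        # to put the gathered data into the right layout.
        cost += copy_cost(full_bytes, hw)
    return cost
```

Under this sketch, S(1) -> R costs exactly one `copy_cost` more than S(0) -> R, so the gap between the two depends on tensor size and memory bandwidth rather than on a fixed multiplier applied to the communication cost.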