
Conversation

@joecummings (Member) commented Mar 31, 2025

Co-authored-by: Nathan Azrak nathan.azrak@gmail.com

Context

See #2415
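
For readers without access to #2415: the technique named in this PR's title ("Minimize all reduces when accumulating gradients") is the standard pattern of skipping gradient synchronization on every micro-batch of an accumulation window except the last one. Below is a minimal, hypothetical sketch of that pattern written against plain DDP's `no_sync()` context manager; it is an illustration only, not torchtune's actual recipe code (torchtune's recipes use FSDP2, where the equivalent toggle is `set_requires_gradient_sync`).

```python
# Illustrative sketch only (not this PR's implementation): accumulate
# gradients locally over N micro-batches and pay for a single
# all-reduce on the final backward pass.
import contextlib

def accumulate_and_step(ddp_model, micro_batches, loss_fn, optimizer):
    for i, (inputs, targets) in enumerate(micro_batches):
        is_last = i == len(micro_batches) - 1
        # DDP's no_sync() suppresses the gradient all-reduce; grads
        # simply accumulate into .grad on each rank until the last
        # micro-batch re-enables synchronization.
        ctx = contextlib.nullcontext() if is_last else ddp_model.no_sync()
        with ctx:
            loss = loss_fn(ddp_model(inputs), targets) / len(micro_batches)
            loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```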

Changelog

Test plan

UX

@pytorch-bot bot commented Mar 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2539

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Cancelled Jobs

As of commit 32fbfa7 with merge base d3ab3b7:

NEW FAILURE - The following job has failed:

  • GPU tests / gpu_test (3.11, stable) (gh)
    tests/recipes/test_full_finetune_distributed.py::TestFullFinetuneDistributedRecipe::test_training_state_on_resume_from_distributed_checkpoint_single_rank[llama3/8B_full-llama3-tune-4-1-True]

CANCELLED JOBS - The following jobs were cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label on Mar 31, 2025. (This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.)
@joecummings changed the title from "Hackily port hybrid sharding from torchtitan" to "[WIP] Minimize all reduces when accumulating gradients" on Mar 31, 2025
@joecummings force-pushed the minimize-all-reduce branch from 1931f46 to 32fbfa7 on March 31, 2025 at 17:56

