Skip to content

Conversation

@fmassa
Copy link
Contributor

@fmassa fmassa commented Sep 5, 2025

No description provided.

ruisizhang123 and others added 5 commits September 4, 2025 12:40
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@fmassa fmassa requested a review from ruisizhang123 September 5, 2025 12:12
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 5, 2025
Copy link
Contributor

@ruisizhang123 ruisizhang123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! thank you for adding tests.

if True:
backend = "nccl"
fake_store = None
kwargs = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: is there a reason we want to use a real pg instead of fake pg? iiuc, the test job in github only has 4 gpus. 🤔 we have to update world_size thing to 4 if we decide to use real pg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason was because I thought the benchmarking might need a real PG, although I haven't really tested to see if it works with a FakeStore

Copy link
Contributor

@ruisizhang123 ruisizhang123 Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhh, I see. I tried using fake PG and the code went through. The current llama3 tests are use fake PG, but there are new issues with memory estimation caused by a recent change in pytorch. I plan to land my autobucketing PR today, and will land this one after the pytorch issue is fixed.

@ruisizhang123 ruisizhang123 force-pushed the fmassa/test_ruisi_branch branch from f2dd1ef to a368fbc Compare September 5, 2025 17:51
@ruisizhang123 ruisizhang123 force-pushed the fmassa/test_ruisi_branch branch from a368fbc to f1cc5fc Compare September 5, 2025 18:08
Base automatically changed from gh/ruisizhang123/5/head to gh/ruisizhang123/5/base September 6, 2025 07:54
Base automatically changed from gh/ruisizhang123/5/base to main September 6, 2025 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants