
Conversation

@fmassa (Contributor) commented on Aug 4, 2025

This is particularly important for constructor nodes and for flop estimation; otherwise they could materialize massive tensors in memory, leading to OOM. This shows up in DeepSeek. I'm splitting this out from #29.
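To see why fake mode matters here, a minimal sketch (not code from this PR; it assumes only PyTorch's public FakeTensorMode): under fake mode, constructor ops propagate shape and dtype metadata without allocating storage, so even an enormous tensor is free to "create".

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Under FakeTensorMode, constructor ops only produce metadata
# (shape, dtype, device); no real storage is allocated.
with FakeTensorMode():
    t = torch.empty(2**20, 2**20)  # ~4 TiB of fp32 if materialized
    print(t.shape, t.dtype)        # metadata is still fully tracked

# Running the same constructor outside fake mode would try to
# allocate the full tensor and OOM on most machines.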
@fmassa requested review from ezyang and wconstab on Aug 4, 2025, 11:09
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Aug 4, 2025

from torch._subclasses.fake_tensor import unset_fake_temporarily

with unset_fake_temporarily():
Contributor commented on the snippet above:
Do you happen to know why this was here originally? Seems like it would have been logical to use fake mode all along. Maybe it was just to work around the one case with reshape below?

@fmassa (author) replied:

Yeah, it was there because in one of the shardings we were calling copy.deepcopy, which is not allowed under fake_mode.
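For context, a hedged sketch of what the removed unset_fake_temporarily guard enables (this is illustrative, not the PR's sharding code; it assumes only the public context managers): it pops the ambient fake mode so that operations needing real tensors, such as copy.deepcopy, can run, then restores it on exit.

import copy
import torch
from torch._subclasses.fake_tensor import FakeTensorMode, unset_fake_temporarily

with FakeTensorMode():
    fake = torch.ones(4, 4)  # a FakeTensor: metadata only, no storage

    # deepcopy of real tensor state is not supported while fake mode
    # is active, so the old code escaped it temporarily.
    with unset_fake_temporarily():
        real = torch.ones(2, 2)          # a real tensor again
        real_copy = copy.deepcopy(real)  # deepcopy works out here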

@fmassa merged commit 3fba727 into main on Aug 4, 2025 (5 of 6 checks passed).
@fmassa deleted the fmassa/fake_mode_everywhere branch on Aug 4, 2025 at 14:09.