
Conversation

@fmassa (Contributor) commented on Aug 4, 2025

This is particularly important for constructor nodes and for flop estimation; otherwise they could materialize massive tensors in memory, leading to OOM. This shows up in DeepSeek. I'm splitting this out from #29.
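To see why fake mode matters here, a minimal sketch (not code from this PR; it assumes only PyTorch's public FakeTensorMode): under fake mode, constructor ops propagate shape and dtype metadata without allocating storage, so even an enormous tensor is free to "create".

import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Under FakeTensorMode, constructor ops only produce metadata
# (shape, dtype, device); no real storage is allocated.
with FakeTensorMode():
    t = torch.empty(2**20, 2**20)  # ~4 TiB of fp32 if materialized
    print(t.shape, t.dtype)        # metadata is still fully tracked

# Running the same constructor outside fake mode would try to
# allocate the full tensor and OOM on most machines.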
@fmassa requested review from ezyang and wconstab on Aug 4, 2025, 11:09
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Aug 4, 2025

from torch._subclasses.fake_tensor import unset_fake_temporarily

with unset_fake_temporarily():
Contributor commented on the snippet above:
Do you happen to know why this was here originally? Seems like it would have been logical to use fake mode all along. Maybe it was just to work around the one case with reshape below?

@fmassa (author) replied:

Yeah, it was there because in one of the shardings we were calling copy.deepcopy, which is not allowed under fake_mode.
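For context, a hedged sketch of what the removed unset_fake_temporarily guard enables (this is illustrative, not the PR's sharding code; it assumes only the public context managers): it pops the ambient fake mode so that operations needing real tensors, such as copy.deepcopy, can run, then restores it on exit.

import copy
import torch
from torch._subclasses.fake_tensor import FakeTensorMode, unset_fake_temporarily

with FakeTensorMode():
    fake = torch.ones(4, 4)  # a FakeTensor: metadata only, no storage

    # deepcopy of real tensor state is not supported while fake mode
    # is active, so the old code escaped it temporarily.
    with unset_fake_temporarily():
        real = torch.ones(2, 2)          # a real tensor again
        real_copy = copy.deepcopy(real)  # deepcopy works out here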

@fmassa merged commit 3fba727 into main on Aug 4, 2025 (5 of 6 checks passed).
@fmassa deleted the fmassa/fake_mode_everywhere branch on Aug 4, 2025 at 14:09.