
[subclasses] Use __slots__ for micro optim of flatten/unflatten #1211

Open

wants to merge 3 commits into base: gh/IvanKobzarev/3/base

Conversation


IvanKobzarev commented Nov 1, 2024

Stack from ghstack (oldest at bottom):

Profiling the case from pytorch/pytorch#129457 showed that using __slots__ reduces the cost of flatten a bit (14us -> 11us). As a result, flattening 20 fp8 quantized weights is ~40us faster (300us -> 260us).
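For context on where the savings come from: __slots__ turns per-instance __dict__ lookups into fixed slot-descriptor lookups, which is slightly cheaper per attribute access. Below is a minimal sketch of the effect using plain Python objects; it is not the PR's actual benchmark or the tensor subclass in question, just an illustration of the mechanism.

```python
import timeit

class WithDict:
    def __init__(self):
        self._tensor, self._scale, self._config = 1, 2, 3

class WithSlots:
    __slots__ = ("_tensor", "_scale", "_config")

    def __init__(self):
        self._tensor, self._scale, self._config = 1, 2, 3

def flatten(obj):
    # Mimics a __tensor_flatten__-style hot path: a few attribute reads per call.
    return obj._tensor, obj._scale, obj._config

d, s = WithDict(), WithSlots()
print("dict :", timeit.timeit(lambda: flatten(d), number=1_000_000))
print("slots:", timeit.timeit(lambda: flatten(s), number=1_000_000))
```

Per call the difference is tiny, which is why it only becomes visible when flatten/unflatten runs many times per step, e.g. once per quantized weight.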


pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1211

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit f274127 with merge base 88d604f:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
ghstack-source-id: 78700dffdc8daf49e43a80e68e2a7888e681503c
Pull Request resolved: #1211
facebook-github-bot added the CLA Signed label Nov 1, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
…atten"


Profiling the case from pytorch/pytorch#129457 found that using __slots__ a bit helps to reduce the cost of flatten (14us -> 11us). As a result 20 fp8 quantized weights flattening gets -40us  (300us -> 260us).



[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
ghstack-source-id: 29e856540122dd6d0a8d3a522617234af70a6ca3
Pull Request resolved: #1211
@@ -258,6 +261,16 @@ def fsdp_post_all_gather(

class WeightWithDelayedFloat8CastTensor(torch.Tensor):

    __slots__ = [
Contributor

Curious: is __slots__ derived from __tensor_flatten__?

Author

Yes, here I just collected all the attributes used in __tensor_flatten__ and __tensor_unflatten__.
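To make that concrete, here is a hedged sketch of the pattern; the attribute names are illustrative, not the exact fields of WeightWithDelayedFloat8CastTensor. The __slots__ list simply enumerates every attribute that __tensor_flatten__/__tensor_unflatten__ touch, so each access on that hot path goes through a slot descriptor.

```python
import torch

class SlottedWeightSketch(torch.Tensor):
    # Illustrative field names only; the real class lists the attributes
    # its flatten/unflatten actually reads.
    __slots__ = [
        "_tensor",
        "_amax_buffer",
        "_linear_mm_config",
    ]

    def __tensor_flatten__(self):
        # Inner tensor attribute names, plus non-tensor context.
        return ["_tensor", "_amax_buffer"], {"_linear_mm_config": self._linear_mm_config}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, metadata, outer_size, outer_stride):
        # Reconstruction omitted; it reads exactly the names listed above.
        ...
```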

…atten"


Profiling the case from pytorch/pytorch#129457 found that using __slots__ a bit helps to reduce the cost of flatten (14us -> 11us). As a result 20 fp8 quantized weights flattening gets -40us  (300us -> 260us).



[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 4, 2024
ghstack-source-id: 09af60f4537f21a491d21a02d99ad792185d32aa
Pull Request resolved: #1211
Labels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
3 participants