[subclasses] Use __slots__ for micro optim of flatten/unflatten #1211
base: gh/IvanKobzarev/3/base
Conversation
@@ -258,6 +261,16 @@ def fsdp_post_all_gather(

class WeightWithDelayedFloat8CastTensor(torch.Tensor):

    __slots__ = [
curious, is __slots__ derived from __tensor_flatten__?
Yes, here I just collected all the attributes used in __tensor_flatten__ / __tensor_unflatten__.
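For illustration, here is a minimal sketch of that pattern: a wrapper subclass whose `__slots__` list mirrors exactly the attributes touched by `__tensor_flatten__` / `__tensor_unflatten__`. This is not the actual `WeightWithDelayedFloat8CastTensor` implementation; the attribute names `_tensor` and `_extra_state` are made up for the example.

```python
import torch

class SlottedWeightSketch(torch.Tensor):
    # __slots__ lists exactly the attributes that __tensor_flatten__ /
    # __tensor_unflatten__ touch, so those lookups go through slot
    # descriptors instead of the instance __dict__.
    __slots__ = ["_tensor", "_extra_state"]

    @staticmethod
    def __new__(cls, tensor, extra_state=None):
        # Wrapper subclass: the outer tensor holds no storage of its own.
        return torch.Tensor._make_wrapper_subclass(cls, tensor.shape, dtype=tensor.dtype)

    def __init__(self, tensor, extra_state=None):
        self._tensor = tensor
        self._extra_state = extra_state

    def __tensor_flatten__(self):
        # Inner tensor attribute names plus non-tensor context.
        return ["_tensor"], {"_extra_state": self._extra_state}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, metadata, outer_size, outer_stride):
        return SlottedWeightSketch(inner_tensors["_tensor"], metadata["_extra_state"])
```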
…atten" Profiling the case from pytorch/pytorch#129457 found that using __slots__ a bit helps to reduce the cost of flatten (14us -> 11us). As a result 20 fp8 quantized weights flattening gets -40us (300us -> 260us). [ghstack-poisoned]
ghstack-source-id: 09af60f4537f21a491d21a02d99ad792185d32aa Pull Request resolved: #1211
Stack from ghstack (oldest at bottom):
Profiling the case from pytorch/pytorch#129457 found that using __slots__ helps reduce the cost of flatten (14us -> 11us). As a result, flattening 20 fp8 quantized weights is about 40us faster (300us -> 260us).
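As a rough illustration of how such a per-call cost could be measured, here is a hypothetical `timeit` loop over `__tensor_flatten__`. It reuses the illustrative `SlottedWeightSketch` class from the sketch above; the numbers in this PR came from profiling the workload in pytorch/pytorch#129457, not from a loop like this.

```python
import timeit

import torch

# Assumes SlottedWeightSketch from the sketch above is in scope.
w = SlottedWeightSketch(torch.randn(32, 32))

def flatten_once():
    # Mirrors what the flattening path does: read the attribute names,
    # then fetch each inner tensor off the instance.
    attrs, ctx = w.__tensor_flatten__()
    return [getattr(w, a) for a in attrs], ctx

n = 100_000
per_call_us = timeit.timeit(flatten_once, number=n) / n * 1e6
print(f"__tensor_flatten__: {per_call_us:.2f} us per call")
```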