
[subclasses] Use __slots__ for micro optim of flatten/unflatten #1211

Open

wants to merge 3 commits into base: gh/IvanKobzarev/3/base

Conversation


IvanKobzarev commented Nov 1, 2024

Stack from ghstack (oldest at bottom):

Profiling the case from pytorch/pytorch#129457 showed that using __slots__ reduces the cost of flatten a bit (14us -> 11us). As a result, flattening 20 fp8 quantized weights is ~40us faster (300us -> 260us).
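For context on where the savings come from: __slots__ turns per-instance __dict__ lookups into fixed slot-descriptor lookups, which is slightly cheaper per attribute access. Below is a minimal sketch of the effect using plain Python objects; it is not the PR's actual benchmark or the tensor subclass in question, just an illustration of the mechanism.

```python
import timeit

class WithDict:
    def __init__(self):
        self._tensor, self._scale, self._config = 1, 2, 3

class WithSlots:
    __slots__ = ("_tensor", "_scale", "_config")

    def __init__(self):
        self._tensor, self._scale, self._config = 1, 2, 3

def flatten(obj):
    # Mimics a __tensor_flatten__-style hot path: a few attribute reads per call.
    return obj._tensor, obj._scale, obj._config

d, s = WithDict(), WithSlots()
print("dict :", timeit.timeit(lambda: flatten(d), number=1_000_000))
print("slots:", timeit.timeit(lambda: flatten(s), number=1_000_000))
```

Per call the difference is tiny, which is why it only becomes visible when flatten/unflatten runs many times per step, e.g. once per quantized weight.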


pytorch-bot bot commented Nov 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1211

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit f274127 with merge base 88d604f:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
ghstack-source-id: 78700dffdc8daf49e43a80e68e2a7888e681503c
Pull Request resolved: #1211
facebook-github-bot added the CLA Signed label Nov 1, 2024 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
…atten"


Profiling the case from pytorch/pytorch#129457 found that using __slots__ a bit helps to reduce the cost of flatten (14us -> 11us). As a result 20 fp8 quantized weights flattening gets -40us  (300us -> 260us).



[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 1, 2024
ghstack-source-id: 29e856540122dd6d0a8d3a522617234af70a6ca3
Pull Request resolved: #1211
@@ -258,6 +261,16 @@ def fsdp_post_all_gather(

class WeightWithDelayedFloat8CastTensor(torch.Tensor):

    __slots__ = [
Contributor

Curious: is __slots__ derived from __tensor_flatten__?

Author

Yes, here I just collected all the attributes used in __tensor_flatten__ and __tensor_unflatten__.
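To make that concrete, here is a hedged sketch of the pattern; the attribute names are illustrative, not the exact fields of WeightWithDelayedFloat8CastTensor. The __slots__ list simply enumerates every attribute that __tensor_flatten__/__tensor_unflatten__ touch, so each access on that hot path goes through a slot descriptor.

```python
import torch

class SlottedWeightSketch(torch.Tensor):
    # Illustrative field names only; the real class lists the attributes
    # its flatten/unflatten actually reads.
    __slots__ = [
        "_tensor",
        "_amax_buffer",
        "_linear_mm_config",
    ]

    def __tensor_flatten__(self):
        # Inner tensor attribute names, plus non-tensor context.
        return ["_tensor", "_amax_buffer"], {"_linear_mm_config": self._linear_mm_config}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, metadata, outer_size, outer_stride):
        # Reconstruction omitted; it reads exactly the names listed above.
        ...
```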

…atten"


Profiling the case from pytorch/pytorch#129457 found that using __slots__ a bit helps to reduce the cost of flatten (14us -> 11us). As a result 20 fp8 quantized weights flattening gets -40us  (300us -> 260us).



[ghstack-poisoned]
IvanKobzarev added a commit that referenced this pull request Nov 4, 2024
ghstack-source-id: 09af60f4537f21a491d21a02d99ad792185d32aa
Pull Request resolved: #1211
Labels
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
3 participants