Record comms input and output tensor information #1014
This pull request was exported from Phabricator. Differential Revision: D65785010
Accepting to unblock; let's make sure to add a trace, though.
Summary: Copy the tensor information from the NCCL metadata output. This allows easier analysis of kernel memory access patterns in downstream tools.
Reviewed By: sraikund16
Differential Revision: D65785010
7d78560 to 40b9db2
40b9db2 to 1539c9a
1539c9a to 5d83c43
5d83c43 to fa2d485
fa2d485 to 50ae474
This pull request has been merged in d9673d2.
This pull request has been reverted by 158d409.
Summary: Reverts the rollback D66458621.
Revert the Kineto rollback. The rollback would only have partially solved the issue, since this part controls transmission of the metadata to the corresponding kernel. The real issue is record_param_comms in PyTorch, which was still recording this metadata and would still produce an invalid trace JSON when working with more than 30 GPUs (our truncation case).
Differential Revision: D66475394
Summary: Pull Request resolved: #1017
Reverts the rollback D66458621.
Revert the Kineto rollback. The rollback would only have partially solved the issue, since this part controls transmission of the metadata to the corresponding kernel. The real issue is record_param_comms in PyTorch, which was still recording this metadata and would still produce an invalid trace JSON when working with more than 30 GPUs (our truncation case).
Reviewed By: sraikund16
Differential Revision: D66475394
fbshipit-source-id: 5b781ccb27fa898a1a6496c72733f72fd31c822e
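For readers unfamiliar with the truncation case, here is a minimal sketch of how a truncated list could make a trace JSON unparseable. Everything in it is an assumption for illustration: the formatter `format_list_truncated`, the threshold of 30 (borrowed from the "GPUs>30" remark), and the event fields are hypothetical and are not taken from the Kineto or PyTorch sources.

```python
import json

# Assumed threshold, borrowed from the "GPUs>30" remark; not read from any source file.
TRUNCATE_AT = 30


def format_list_truncated(values, limit=TRUNCATE_AT):
    """Hypothetical formatter: keep the first `limit` items, then append a bare ellipsis."""
    if len(values) <= limit:
        return "[" + ", ".join(str(v) for v in values) + "]"
    head = ", ".join(str(v) for v in values[:limit])
    return "[" + head + ", ...]"


def trace_event(ranks):
    """Build an illustrative event string with the list embedded as a raw JSON value."""
    # The event name and field names are made up for this sketch.
    return '{"name": "example_collective", "args": {"ranks": ' + format_list_truncated(ranks) + "}}"


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    for n in (8, 30, 31):
        print(n, is_valid_json(trace_event(list(range(n)))))
```

In this sketch the bare `...` token is what the parser rejects, so lists at or below the threshold stay valid while longer ones do not. Emitting the truncated list as a quoted string would be one possible way to avoid that, but whether the actual fix takes that form is not stated here.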