Record comms input and output tensor information #1014

Closed
wants to merge 1 commit into from


sanrise
Contributor

@sanrise commented Nov 14, 2024

Summary: Copy the input and output tensor information from the NCCL metadata into the trace. This makes it easier for downstream tools to analyze kernel memory access patterns.

Differential Revision: D65785010
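The downstream analysis mentioned in the summary can be illustrated with a minimal Python sketch. It assumes a Chrome-trace-style JSON in which each NCCL kernel event carries the copied comms metadata in its `args` dict, and it turns a kernel's recorded tensor start address, element count and dtype into the byte range it nominally touches. All arg key names, the dtype spellings and `kernel_ranges` itself are hypothetical; check the keys in an actual trace before relying on them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed arg key names, for illustration only; an actual trace may differ.
IN_START = "Input Tensors start"    # hypothetical: input buffer base address
OUT_START = "Output Tensors start"  # hypothetical: output buffer base address
IN_NELEMS = "In msg nelems"         # assumed: input element count
OUT_NELEMS = "Out msg nelems"       # assumed: output element count
DTYPE = "dtype"

# Bytes per element for a few dtype spellings (assumed naming).
DTYPE_BYTES = {"Float": 4, "Double": 8, "Half": 2, "BFloat16": 2, "Int": 4, "Long": 8}


@dataclass(frozen=True)
class AccessRange:
    kind: str   # "in" or "out"
    start: int  # first byte address
    end: int    # one past the last byte address


def _as_int(value) -> int:
    # Values may be recorded as ints or as strings such as "0x7f00" or "256".
    return value if isinstance(value, int) else int(str(value), 0)


def kernel_ranges(event: dict) -> list[AccessRange]:
    """Byte ranges a comms kernel nominally reads (in) and writes (out)."""
    args = event.get("args", {})
    elem_bytes = DTYPE_BYTES.get(args.get(DTYPE))
    if elem_bytes is None:
        return []  # unknown or missing dtype: cannot size the buffers
    ranges = []
    for kind, start_key, count_key in (
        ("in", IN_START, IN_NELEMS),
        ("out", OUT_START, OUT_NELEMS),
    ):
        if start_key in args and count_key in args:
            start = _as_int(args[start_key])
            size = _as_int(args[count_key]) * elem_bytes
            ranges.append(AccessRange(kind, start, start + size))
    return ranges
```

For example, an event with `"Input Tensors start": "0x1000"`, `"In msg nelems": 256` and `"dtype": "Float"` yields an input range from byte 4096 up to (but not including) 5120. Overlapping or adjacent ranges across kernels can then be compared directly.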

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D65785010

Contributor

@sraikund16 left a comment


Accepting to unblock; let's make sure to add a trace, though.

sanrise added a commit to sanrise/kineto that referenced this pull request Nov 14, 2024
Summary:

Copy the input and output tensor information from the NCCL metadata into the trace. This makes it easier for downstream tools to analyze kernel memory access patterns.

Reviewed By: sraikund16

Differential Revision: D65785010
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D65785010

sanrise added a commit to sanrise/kineto that referenced this pull request Nov 15, 2024
@facebook-github-bot
Contributor

This pull request has been merged in d9673d2.

@facebook-github-bot
Contributor

This pull request has been reverted by 158d409.

sanrise added a commit to sanrise/kineto that referenced this pull request Nov 25, 2024
Summary:
Reverts a rollback D66458621

Revert the Kineto rollback. Rolling back this change would only have partially solved the issue, since this part controls transmission of the metadata to the corresponding kernel. The real issue is record_param_comms in PyTorch, which was still recording this metadata and would still produce an invalid trace JSON when running with more than 30 GPUs (our truncation case).

Differential Revision: D66475394
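The commit message does not say exactly how the truncated metadata produced invalid JSON. As a purely hypothetical illustration of that class of bug, the Python sketch below contrasts building a truncated rank list by string concatenation (an unquoted `...` is not valid JSON) with truncating the data first and letting `json.dumps` produce the text. The 30-entry cap mirrors the "more than 30 GPUs" case; `naive_ranks` and `safe_ranks` are illustrative names, not PyTorch or Kineto functions.

```python
import json

RANK_LIMIT = 30  # assumed cap, mirroring the ">30 GPUs" truncation case


def naive_ranks(ranks):
    # Builds the text by hand; appending a bare "..." yields invalid JSON
    # whenever the list is actually truncated.
    shown = ", ".join(str(r) for r in ranks[:RANK_LIMIT])
    return "[" + shown + (", ...]" if len(ranks) > RANK_LIMIT else "]")


def safe_ranks(ranks):
    # Truncates the data first, then serializes it, so the truncation marker
    # is a quoted JSON string and the result always parses.
    shown = list(ranks[:RANK_LIMIT])
    if len(ranks) > RANK_LIMIT:
        shown.append("...")
    return json.dumps(shown)
```

With 64 ranks, `json.loads(safe_ranks(list(range(64))))` returns ranks 0 through 29 followed by the string `"..."`, while `json.loads(naive_ranks(list(range(64))))` raises `json.JSONDecodeError`. Below the cap, both functions emit the same valid list.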
facebook-github-bot pushed a commit that referenced this pull request Nov 26, 2024
Summary:
Pull Request resolved: #1017

Reviewed By: sraikund16

Differential Revision: D66475394

fbshipit-source-id: 5b781ccb27fa898a1a6496c72733f72fd31c822e

3 participants