
Create the return value on device to avoid unnecessary copying from CPU #26151

Merged: 1 commit into huggingface:main, Sep 18, 2023

Conversation

@mksit (Contributor) commented Sep 13, 2023

What does this PR do?

router_tuple = (torch.tensor([0], device=hidden_states.device),) introduces an unnecessary data copy from the CPU. I have changed it to create the return tensor directly on the device to avoid this potential performance issue.
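
For context, a minimal sketch of the change (hidden_states is a hypothetical stand-in for the activations at this point in the model; the exact shape and dtype in the actual commit may differ):

```python
import torch

# Hypothetical stand-in for the activations available at this point.
hidden_states = torch.randn(2, 4, device="cuda" if torch.cuda.is_available() else "cpu")

# Before: the Python list [0] is materialized on the CPU and then copied
# to hidden_states.device during construction.
router_tuple = (torch.tensor([0], device=hidden_states.device),)

# After: the tensor is allocated directly on the target device,
# skipping the host-to-device copy.
router_tuple = (torch.zeros(1, device=hidden_states.device),)
```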

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@ArthurZucker (Collaborator) left a comment

Are you implying that torch.tensor([0], device=hidden_states.device) does not create the tensor on device=hidden_states.device? The [0] is copied, but there is no existing tensor it is copied from, no?

Do you have a reference for this? The doc does not seem to indicate it:

device (torch.device, optional) – the device of the constructed tensor. If None and data is a tensor then the device of data is used. If None and data is not a tensor then the result tensor is constructed on the CPU.
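
(For clarity, a small sketch of the semantics in question, assuming a CUDA device for illustration: the tensor does end up on the requested device; the copy under discussion happens during construction, because the source data [0] lives on the host.)

```python
import torch

device = torch.device("cuda")  # assumed for illustration

t = torch.tensor([0], device=device)
print(t.device)  # cuda:0 -- the tensor does live on the requested device

# The list [0] is host data, however, so construction still involves a
# host-to-device transfer; that transfer is the copy discussed below.
```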

@mksit (Contributor, Author) commented Sep 14, 2023

I am sorry that I did not explain this clearly. The tensor is created on device=hidden_states.device, but this construction caused non-negligible overhead in my program due to the data copy, which can be seen in the following trace.

[Profiler trace screenshot (2023-09-14) showing the data-copy overhead]

This overhead seems unnecessary, so I have suggested this change in this commit.

@fxmarty (Contributor) commented Sep 14, 2023

@mksit Are you sure the aten::to op is no longer there when replacing torch.tensor with torch.zeros? I generally don't trust the PyTorch profiler for the timing of aten::to. @NouamaneTazi may have more insights.

@mksit (Contributor, Author) commented Sep 15, 2023

@fxmarty The aten::to operation disappeared after the replacement in my case. What do you suggest for profiling aten::to?

@fxmarty (Contributor) left a comment

After profiling, LGTM. Indeed torch.zeros avoids an aten::to.

[Two profiler screenshots comparing the traces: the torch.tensor construction shows aten::to, the torch.zeros construction does not]
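
As a sketch of how such a check can be reproduced (using torch.profiler; the exact op breakdown will vary by PyTorch version and hardware):

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = torch.device("cuda")  # assumes a CUDA device is available

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    torch.tensor([0], device=device)  # expected to show a copy op such as aten::to
    torch.zeros(1, device=device)     # expected to allocate directly on the device

print(prof.key_averages().table(sort_by="cpu_time_total"))
```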

@ArthurZucker (Collaborator) left a comment

Thanks

@ArthurZucker merged commit 97f439a into huggingface:main on Sep 18, 2023
@HuggingFaceDocBuilderDev commented: The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

MKhalusova pushed a commit to MKhalusova/transformers that referenced this pull request Sep 19, 2023
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023