
Create the return value on device to avoid unnecessary copying from CPU #26151

Merged: 1 commit into huggingface:main, Sep 18, 2023

Conversation

@mksit (Contributor) commented Sep 13, 2023

What does this PR do?

router_tuple = (torch.tensor([0], device=hidden_states.device),) introduces an unnecessary data copy from the CPU. I have changed it to create the return tensor directly on the device to avoid this potential performance issue.
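
For context, a minimal sketch of the change (hidden_states is a hypothetical stand-in for the activations at this point in the model; the exact shape and dtype in the actual commit may differ):

```python
import torch

# Hypothetical stand-in for the activations available at this point.
hidden_states = torch.randn(2, 4, device="cuda" if torch.cuda.is_available() else "cpu")

# Before: the Python list [0] is materialized on the CPU and then copied
# to hidden_states.device during construction.
router_tuple = (torch.tensor([0], device=hidden_states.device),)

# After: the tensor is allocated directly on the target device,
# skipping the host-to-device copy.
router_tuple = (torch.zeros(1, device=hidden_states.device),)
```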

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@ArthurZucker (Collaborator) left a comment

Are you implying that torch.tensor([0], device=hidden_states.device) does not create the tensor on device=hidden_states.device? The [0] is copied, but there is no existing tensor it is copied from, no?

Do you have a reference for this? The doc does not seem to indicate it:

device (torch.device, optional) – the device of the constructed tensor. If None and data is a tensor then the device of data is used. If None and data is not a tensor then the result tensor is constructed on the CPU.
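
(For clarity, a small sketch of the semantics in question, assuming a CUDA device for illustration: the tensor does end up on the requested device; the copy under discussion happens during construction, because the source data [0] lives on the host.)

```python
import torch

device = torch.device("cuda")  # assumed for illustration

t = torch.tensor([0], device=device)
print(t.device)  # cuda:0 -- the tensor does live on the requested device

# The list [0] is host data, however, so construction still involves a
# host-to-device transfer; that transfer is the copy discussed below.
```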

@mksit (Contributor, Author) commented Sep 14, 2023

I am sorry that I did not explain this clearly. The tensor is created on device=hidden_states.device, but this construction caused non-negligible overhead in my program due to the data copy, which can be seen in the following trace.

[Profiler trace screenshot (2023-09-14) showing the data-copy overhead]

This overhead seems unnecessary, so I have suggested this change in this commit.

@fxmarty (Contributor) commented Sep 14, 2023

@mksit Are you sure the aten::to op is no longer there when replacing torch.tensor with torch.zeros? I generally don't trust the PyTorch profiler for the timing of aten::to. @NouamaneTazi may have more insights.

@mksit (Contributor, Author) commented Sep 15, 2023

@fxmarty The aten::to operation disappeared after the replacement in my case. What do you suggest for profiling aten::to?

@fxmarty (Contributor) left a comment

After profiling, LGTM. Indeed torch.zeros avoids an aten::to.

[Two profiler screenshots comparing the traces: the torch.tensor construction shows aten::to, the torch.zeros construction does not]
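
As a sketch of how such a check can be reproduced (using torch.profiler; the exact op breakdown will vary by PyTorch version and hardware):

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = torch.device("cuda")  # assumes a CUDA device is available

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    torch.tensor([0], device=device)  # expected to show a copy op such as aten::to
    torch.zeros(1, device=device)     # expected to allocate directly on the device

print(prof.key_averages().table(sort_by="cpu_time_total"))
```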

@ArthurZucker (Collaborator) left a comment

Thanks

@ArthurZucker merged commit 97f439a into huggingface:main on Sep 18, 2023
@HuggingFaceDocBuilderDev commented: The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

MKhalusova pushed a commit to MKhalusova/transformers that referenced this pull request Sep 19, 2023
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023