Non-blocking GPU memory transfer #620
Comments
It seems like a micro-optimization, but at the same time https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/9 suggests it has no known bad side effects, since synchronization is handled lazily. Is there any case where a project did this and logged a notable speedup? @usehand My instinct is to be conservative on something this low-level: if this is truly the best-practice setting, then PyTorch can make the decision to set it as the default. But logically I also see no reason for us not to do this :) WDYT @williamFalcon?
I usually always set it to true, since (as you noticed) there doesn't seem to be any reason not to and it can speed things up. I think one of the reasons PyTorch doesn't adopt it as a default is that it requires memory to be pinned, and pinning is probably never going to be a default for a number of reasons (see https://discuss.pytorch.org/t/what-is-the-disadvantage-of-using-pin-memory/1702/2 and https://discuss.pytorch.org/t/when-to-set-pin-memory-to-true/19723/2). However, once the memory has been pinned, I don't see any reason not to do asynchronous transfers, hence my suggestion. That said, I can understand not wanting to set this automatically "without letting the user know", so an option might be a better choice (though Lightning does seem to have the philosophy of making "good choices" automatically on behalf of the user). Of course this then comes with the issue of option bloat, etc. Another possibility, to avoid inflating the number of arguments passed to Trainer, is to instead have a trainer method
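As a rough illustration of the opt-in idea, a sketch of how a user-facing flag could be applied at the single point where batches are moved (the `move_batch` helper and its `non_blocking` argument are hypothetical, not Lightning's actual API):

```python
import torch

def move_batch(batch, device, non_blocking: bool = False):
    # Hypothetical helper, not Lightning's real transfer code: recursively
    # move tensors to the target device, honouring the user's opt-in flag.
    # With pinned host memory and non_blocking=True, the copies return
    # immediately and can overlap with CPU-side work.
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=non_blocking)
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_batch(b, device, non_blocking) for b in batch)
    return batch
```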
Looks very similar to #1316 in my opinion. As someone who uses 3D data and other non-standard images quite a bit, I can personally attest that this can be very useful to a niche community.
added here #1843 |
🚀 Feature
As far as I can tell, there is currently no way to make the memory transfers non-blocking, that is, to use tensor.to(device, non_blocking=True) for the internal data transfers that Lightning does.
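For reference, this is what the requested behaviour looks like in plain PyTorch (a minimal sketch, assuming a CUDA device and a DataLoader with `pin_memory=True`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda")
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=64, pin_memory=True)

for x, y in loader:
    # With pinned host memory these copies are asynchronous: .to() returns
    # before the transfer finishes, and the first kernel that uses x or y
    # waits for it implicitly, so no explicit synchronization is needed.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
```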
Motivation
Asynchronous data transfers can speed up execution in some cases.
Pitch
Either always set it to True by default (I don't think that has any negative consequences), or, if not, add an option somewhere that lets the user choose.
Another option is to check whether the memory has been pinned (pin_memory in the DataLoader) and, if so, do the non_blocking transfers, as that's essentially the only reason to pin memory, I believe; a sketch of this check follows below.
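A minimal sketch of that check (the `to_device` helper is hypothetical, not Lightning's implementation): since `non_blocking=True` only has an effect when the source tensor lives in pinned host memory, the flag can simply be gated on `Tensor.is_pinned()`:

```python
import torch

def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Hypothetical helper: request an asynchronous copy only when the source
    # is a pinned CPU tensor; for pageable memory non_blocking has no effect.
    non_blocking = tensor.device.type == "cpu" and tensor.is_pinned()
    return tensor.to(device, non_blocking=non_blocking)
```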