Non-blocking GPU memory transfer #620
Comments
It seems like a micro-optimization, but at the same time https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/9 suggests it has no known bad side effects, since synchronization is handled lazily. Is there any case where a project did this and logged a notable speedup? @usehand My instinct is to be conservative on something this low-level: if this is truly the best-practice setting, then PyTorch can make the decision to set it as the default. But logically I also see no reason for us not to do this :) WDYT @williamFalcon?
I usually always set it to true, since (as you noticed) there doesn't seem to be any reason not to and it can speed things up. I think one of the reasons PyTorch doesn't adopt it as a default is that it requires memory to be pinned, and pinning is probably never going to be a default for a number of reasons (see https://discuss.pytorch.org/t/what-is-the-disadvantage-of-using-pin-memory/1702/2 and https://discuss.pytorch.org/t/when-to-set-pin-memory-to-true/19723/2). However, once the memory has been pinned, I don't see any reason not to do asynchronous transfers, hence my suggestion. That said, I can understand not wanting to set this automatically "without letting the user know", so an option might be a better choice (though Lightning does seem to have the philosophy of making "good choices" automatically on behalf of the user). Of course this then comes with the issue of option bloat, etc. Another possibility, to avoid inflating the number of arguments passed to Trainer, is to instead have a trainer method
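As a rough illustration of the opt-in idea, a sketch of how a user-facing flag could be applied at the single point where batches are moved (the `move_batch` helper and its `non_blocking` argument are hypothetical, not Lightning's actual API):

```python
import torch

def move_batch(batch, device, non_blocking: bool = False):
    # Hypothetical helper, not Lightning's real transfer code: recursively
    # move tensors to the target device, honouring the user's opt-in flag.
    # With pinned host memory and non_blocking=True, the copies return
    # immediately and can overlap with CPU-side work.
    if isinstance(batch, torch.Tensor):
        return batch.to(device, non_blocking=non_blocking)
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_batch(b, device, non_blocking) for b in batch)
    return batch
```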
Looks very similar to #1316 in my opinion. As someone who uses 3D data and other non-standard images quite a bit, I can personally attest that this can be very useful to a niche community.
added here #1843 |
🚀 Feature
As far as I can tell, there is currently no way to make the memory transfers non-blocking, that is, to use tensor.to(device, non_blocking=True) for the internal data transfers that Lightning does.
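For reference, this is what the requested behaviour looks like in plain PyTorch (a minimal sketch, assuming a CUDA device and a DataLoader with `pin_memory=True`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda")
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=64, pin_memory=True)

for x, y in loader:
    # With pinned host memory these copies are asynchronous: .to() returns
    # before the transfer finishes, and the first kernel that uses x or y
    # waits for it implicitly, so no explicit synchronization is needed.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
```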
Motivation
Asynchronous data transfers can speed up execution in some cases.
Pitch
Either always set it to True by default (I don't think that has any negative consequences), or, if not, add an option somewhere that lets the user choose.
Another option is to check whether the memory has been pinned (pin_memory in the DataLoader) and, if so, do the non_blocking transfers, as that's essentially the only reason to pin memory, I believe; a sketch of this check follows below.
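A minimal sketch of that check (the `to_device` helper is hypothetical, not Lightning's implementation): since `non_blocking=True` only has an effect when the source tensor lives in pinned host memory, the flag can simply be gated on `Tensor.is_pinned()`:

```python
import torch

def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Hypothetical helper: request an asynchronous copy only when the source
    # is a pinned CPU tensor; for pageable memory non_blocking has no effect.
    non_blocking = tensor.device.type == "cpu" and tensor.is_pinned()
    return tensor.to(device, non_blocking=non_blocking)
```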