Sharing mData between device and staging tensors #130
Conversation
Interesting @alexander-g - ok, so just as a heads up, currently the only reason why the […]. This is something that I have been thinking about: really the tensor data should only be modifiable through setters and getters, instead of directly as a reference, mainly as this would allow users to know when the Tensor has been modified. In this case, we would be able to actually have the […].

I do understand that your proposal in a parallel discussion is to explore removing the vector altogether from inside the Tensor object, and only have the GPU memory as the main memory location. This is something that we could explore, but my only concern is what the cost of accessing host-visible memory is compared to storing it in a vector - do you know what I mean? In theory, if the cost of returning just the value that is currently located in host-only memory is acceptable, then this could be a way in which we could get rid of the std::vector, and when […]. What are your thoughts on this?

We could explore merging this initially, as I do understand the need for sharing the std::vector across staging and device tensors as a starter - it's key to address the memory leak you identified - but I would actually be very keen to explore this discussion further. Let me know your thoughts and then we'll proceed from there.
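To make the setter/getter idea above more concrete, here is a minimal sketch (not the actual Kompute API - the class shape, member names, and the dirty flag are assumptions) of how routing all mutation through accessors would let a Tensor track when its host data changes:

```cpp
#include <vector>

// Minimal sketch only: mData is hidden behind accessors so the Tensor can
// record when the host-side data was modified and a device sync is needed.
class Tensor {
public:
    explicit Tensor(std::vector<float> data) : mData(std::move(data)) {}

    // Read-only access does not mark the tensor as modified.
    const std::vector<float>& data() const { return mData; }

    // All mutation goes through the setter, so the library knows about it.
    void setData(const std::vector<float>& data) {
        mData = data;
        mHostDataModified = true;
    }

    // A sync-to-device style operation could check (and clear) this flag and
    // only copy to GPU memory when the host copy actually changed.
    bool isHostDataModified() const { return mHostDataModified; }
    void clearHostDataModified() { mHostDataModified = false; }

private:
    std::vector<float> mData;
    bool mHostDataModified = false;
};
```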
No, I'm not completely sure what you mean. I propose to maybe even rename […]; Buffer C would be device-only (what currently eStorage is).
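For context on the device-only vs. host-visible distinction being discussed here, below is a rough Vulkan-Hpp sketch; the constant names are hypothetical, and only the "device-only, like the current eStorage type" part comes from the comment above.

```cpp
#include <vulkan/vulkan.hpp>

// Device-only memory: fastest for shader access, but not mappable by the
// host, so data can only reach it through a staging copy.
const vk::MemoryPropertyFlags kDeviceOnlyMemory =
    vk::MemoryPropertyFlagBits::eDeviceLocal;

// Host-visible memory: mappable from the CPU, typically used for staging
// buffers that shuttle data to and from the device-only buffer.
const vk::MemoryPropertyFlags kHostVisibleMemory =
    vk::MemoryPropertyFlagBits::eHostVisible |
    vk::MemoryPropertyFlagBits::eHostCoherent;
```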
Sure
@alexander-g I've done some further research this weekend and have opened an issue with the details, which would basically encompass a redesigned approach in favour of this PR. It would be good to hear your thoughts: #136
@alexander-g closing this issue in favour of #130. It would be good to hear your thoughts on the PR as well as on the proposal #130
Currently, when creating a tensor the data is copied several times (sketched below):

- in the `Tensor` constructor
- in `OpTensorCreate`, when creating a staged tensor
- optionally, for input/output tensors: in `OpTensorSyncDevice`/`OpTensorSyncLocal`, when creating another staged tensor

This is 5x the required memory. Additionally, those staging tensors are not destroyed because sequences are never deleted (#36), which creates a memory leak. This is unacceptable when working with large amounts of data; I have already run out of host memory several times.
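As a rough illustration of where these copies come from (a sketch only - the `kp::Manager` usage, the exact constructor and `evalOpDefault` signatures, and the header path are assumptions; only the op names come from the list above):

```cpp
#include <kompute/Kompute.hpp>

#include <memory>
#include <vector>

int main() {
    // Copy 1: the user's own buffer holding the data.
    std::vector<float> data(100'000'000, 0.0f);

    kp::Manager mgr;

    // Copy 2: the Tensor constructor duplicates `data` into Tensor::mData.
    auto tensor = std::make_shared<kp::Tensor>(data);

    // Copy 3: OpTensorCreate creates a staging tensor with its own copy.
    mgr.evalOpDefault<kp::OpTensorCreate>({ tensor });

    // Copies 4 and 5 (optional, for input/output tensors): the sync ops
    // create yet another staging tensor, again duplicating the host data.
    mgr.evalOpDefault<kp::OpTensorSyncDevice>({ tensor });
    mgr.evalOpDefault<kp::OpTensorSyncLocal>({ tensor });

    return 0;
}
```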
To mitigate this, I've converted `Tensor::mData` from `std::vector` to a `std::shared_ptr<std::vector>`, to enable sharing the data between device tensors and staging tensors and reduce the memory footprint a little. In the long term, larger refactoring is needed (esp. #36 and #14)