Sharing mData between device and staging tensors #130

alexander-g · 2021-02-05T07:55:23Z

Currently, when creating a tensor the data is copied several times:

in the Tensor constructor
in OpTensorCreate when creating a staged tensor
the staged tensor additionally allocates a host-side vulkan buffer to copy the data into

Optionally for input/output tensors:

in OpTensorSyncDevice/OpTensorSyncLocal when creating another staged tensor
which creates another host-side vulkan buffer

This is 5x the required memory. Additionally, those staging tensors are not destroyed because sequences are never deleted (#36) which creates a memory leak. This is unacceptable when working with large amounts of data. It has already happened that I've run out of host memory several times.

To mitigate this, I've converted Tensor::mData from std::vector to a std::shared_ptr<std::vector> to enable sharing the data between device tensors and staging tensors to reduce memory footprint a little.
In the long term larger refactoring is needed (esp. #36 and #14)

axsaucedo · 2021-02-06T17:06:44Z

Interesting @alexander-g - ok so just as a heads up, currently the only reason why the data function was changed from a return by value into a return by reference and not a "return by const ref" is to simplify the Python examples which then allow you to perform operations like tensor[0]++ or tensor[0] = .... Without this it would become harder.

This something that I have been thinking that really the tensor data should only be modifiable through setters and getters, instead of directly as a reference, mainly as this would allow users to know when the Tensor has been modified. In this case, we would be able to actually have the data() function returning a const ref, and then the data_sp() as a friend function so it could be initialised by passing another tensor for example.

I do understand that your proposal in a parallel discussion is to explore removing the vector all-together from inside the Tensor object, and only have the GPU memory as the main memory location. This is something that we could explore but my only concern is what the cost of accessing Host visible memory is compared to storing it in a vector - do you know what I mean?

In theory if the cost of returning just the value that is currently located in host only memory then this could be a way in which we could get rid of the std::vector, and when data() is called, the vector could be created via the underlying data pointer and memory size (offset).

What are your thoughts on this? We could explore merging this initially as I do understand the need for sharing the std::vector across staging and device tensors as a starter as it's key to address this memory leak you identified but I would actually be very keen to explore this discussion further.

Let me know your thoughts and then we'll proceed from there.

alexander-g · 2021-02-07T08:56:46Z

This is something that we could explore but my only concern is what the cost of accessing Host visible memory is compared to storing it in a vector - do you know what I mean?

No, I'm not completely sure what you mean.
I want to go even further than you think I want: I suggest to not only remove the vector from tensors but also the operations like tensor[0]++ or tensor[0] = ..., because (no offense but) this is a toy functionality compared to what numpy (or Eigen or xtensor) offers. I've never used them so far, because I'd use numpy on CPU anyway, loading data from disk: numpy, displaying an image: numpy, slicing images: numpy. And if you want to extend these operations this would be basically reinventing the wheel.

I propose to maybe even rename Tensor to something like Buffer which is only a data transfer interface to/from GPU. The API might look like this:

A = mgr.create_buffer(256) #256 bytes
B = mgr.create_buffer(256) #immediately allocated on the device
C = mgr.create_buffer(256)
D = mgr.create_buffer(256)

#no need for OpCreateTensor

sequence.record_transfer_to_device([A,B])
sequence.record_algo(mult_shader, [C,A,B]) #C = A*B
sequence.record_algo(add_shader,  [D,C,B]) #D = C+B
sequence.record_transfer_to_host([D])

#assign np.ones to buffer A, np.arange to buffer B, execute, receive D
d = sequence.eval(np.ones(256), np.arange(256))

assert isinstance(d, np.ndarray)

Buffer C would be device-only (what currently eStorage is)

We could explore merging this initially

Sure

axsaucedo · 2021-02-07T18:05:53Z

@alexander-g I've done some further research this weekend, and I have opened an issue with the details, which basically would encompass a redesigned approach in favour of this PR, it would be good to hear your thoughts #136

axsaucedo · 2021-02-08T07:22:13Z

@alexander-g closing this issue in favour of #130, would be good to hear your thoughts on the PR as well as on the proposal #130

sharing mData between device and staging tensors

a6b7a6d

This was referenced Feb 6, 2021

Delete methods for sequences inside managers #113

Closed

Create delete function in manager to free / destroy sequence #36

Closed

axsaucedo mentioned this pull request Feb 7, 2021

Tensor memory management and memory hierarchy redesign #136

Closed

axsaucedo mentioned this pull request Feb 8, 2021

Removed Staging Tensors in favour of having two buffer & memory in a Tensor to minimise data transfer #137

Merged

axsaucedo closed this Feb 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sharing mData between device and staging tensors #130

Sharing mData between device and staging tensors #130

alexander-g commented Feb 5, 2021 •

edited

Loading

axsaucedo commented Feb 6, 2021

alexander-g commented Feb 7, 2021 •

edited

Loading

axsaucedo commented Feb 7, 2021

axsaucedo commented Feb 8, 2021

Sharing mData between device and staging tensors #130

Sharing mData between device and staging tensors #130

Conversation

alexander-g commented Feb 5, 2021 • edited Loading

axsaucedo commented Feb 6, 2021

alexander-g commented Feb 7, 2021 • edited Loading

axsaucedo commented Feb 7, 2021

axsaucedo commented Feb 8, 2021

alexander-g commented Feb 5, 2021 •

edited

Loading

alexander-g commented Feb 7, 2021 •

edited

Loading