Tensor memory management and memory hierarchy redesign #136
Comments
If I'm not mistaken, one buffer will be allocated on the CPU and one on the GPU, is that correct? Does this also apply to storage tensors? For them the CPU buffer seems redundant, and with something like DNNs one would have a lot of storage tensors.
@alexander-g No, the vkBuffer and vkMemory are always in vRAM (GPU memory); the only difference is whether that memory is CPU/host visible or GPU-device-visible only. You can have a look at PR #137, which implements it.
The main thing here is that instead of having the concept of staging tensors, there would now be the concept of host-visible-gpu-memory tensors and device-only-memory tensors. For host-visible tensors it would only use a single buffer in host-visible memory. For device-only tensors, in the case of TensorType::eDevice it would create an extra buffer for staging the data into the device, and in the case of TensorType::eStorage it would only use one buffer. Functionality that only creates the staging buffer for data transfers into the device could also be explored - this would add an overhead when reading / writing to the device, but could be a handy option (which we can explore once the PRs are merged).
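To make the distinction concrete, here is a rough sketch of how each tensor type could map onto a buffer/memory configuration. The `TensorType` values match the ones mentioned above, but the struct and helper below are purely illustrative, not the actual Kompute implementation:

```cpp
#include <vulkan/vulkan.hpp>

// Illustrative only: which buffers a tensor would own per type, and with
// which memory properties. Not the actual Kompute code.
enum class TensorType { eDevice, eHost, eStorage };

struct BufferConfig {
    bool primaryBuffer;                     // buffer the shaders read/write
    bool stagingBuffer;                     // extra buffer used only for transfers
    vk::MemoryPropertyFlags primaryMemory;  // memory properties of the primary buffer
};

inline BufferConfig configFor(TensorType type) {
    switch (type) {
    case TensorType::eHost:      // host-visible: single buffer, mappable from the CPU
        return {true, false,
                vk::MemoryPropertyFlagBits::eHostVisible |
                vk::MemoryPropertyFlagBits::eHostCoherent};
    case TensorType::eStorage:   // device-only, no src/dst copies: single buffer
        return {true, false, vk::MemoryPropertyFlagBits::eDeviceLocal};
    case TensorType::eDevice:    // device-local primary + host-visible staging buffer
    default:
        return {true, true, vk::MemoryPropertyFlagBits::eDeviceLocal};
    }
}
```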
Both of the main PRs, covering phases 1, 2 and 3, have been merged. Closing this issue and adding an extra issue to explore #4 separately.
From various discussions, initial research was carried out to explore better and more optimized ways to handle memory management both on the CPU (heap) and on the GPU (host/device memory), mainly based on the points summarised in the discussions in #130 and #36.
This issue means that #113 and #36 will be closed.
The proposed design encompasses the following changes (which would be carried out one after the other):

1) Removing the concept of staging type tensors in favour of hostVisible type tensors and using two buffers per tensor
2) Removing the concept of persistent anonymous sequences (in conjunction with OpTensor not owning tensor memory)
3) Amending the memory hierarchy such that Manager owns tensors directly instead of OpCreateTensor
4) Removing `std::vector mData;` completely from Tensor in favour of always storing data in hostVisible buffer memory (TBC)

Below each of the points is covered in more detail:
1) Removing the concept of staging type tensors in favour of hostVisible type tensors and using two buffers per tensor
This includes removing the concept of a "staging tensor". Instead we will now just have hostVisible, deviceOnlyVisible and storage (which is deviceVisible without src / dst copy). This also encompasses amending tensors to always have two vkBuffers and two vkMemories. This will optimise memory requirements and will reduce complexity, as OpTensorCreate will no longer need to hold a separate set of staging tensors just to perform the transfer.
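A minimal sketch of what the two-buffer layout could look like inside the tensor, and how the host-to-device transfer reduces to a single buffer-to-buffer copy recorded on a command buffer (member and function names here are hypothetical, not the real Tensor class):

```cpp
#include <vulkan/vulkan.hpp>

// Hypothetical member layout; the real Tensor class will differ.
struct TensorBuffers {
    vk::Buffer primaryBuffer;        // device-local for eDevice / eStorage tensors
    vk::DeviceMemory primaryMemory;
    vk::Buffer stagingBuffer;        // only allocated for eDevice tensors
    vk::DeviceMemory stagingMemory;
    vk::DeviceSize size = 0;
};

// Record the staging -> device copy for an eDevice tensor; with the staging
// buffer owned by the tensor itself, OpTensorCreate no longer needs to keep
// a separate set of staging tensors around just for this transfer.
void recordCopyToDevice(vk::CommandBuffer cmd, const TensorBuffers& t) {
    vk::BufferCopy region{0, 0, t.size};
    cmd.copyBuffer(t.stagingBuffer, t.primaryBuffer, region);
}
```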
2) Removing the concept of persistent anonymous sequences (in conjunction with optensor not owning tensor memory)
This basically encompasses simply removing the concept of anonymous sequences to avoid implicit accumulation of memory every time a new manager operation is run. Anonymous functions will re-use the same DEFAULT sequence instead. The main challenge will be that OpCreateTensor will always need to be created in a named sequence (although this will be addressed in the next phase).
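As a rough usage sketch of the intended behaviour (the method and sequence names below are assumptions for illustration, not the exact API):

```cpp
// Illustrative only: anonymous evals re-use one DEFAULT sequence instead of
// silently creating and retaining a new sequence per call, while the op that
// creates tensors is recorded into an explicitly named sequence.
kp::Manager mgr;

// Named sequence that will own the tensor-creation op (required in this phase).
auto createSeq = mgr.createManagedSequence("CREATE_TENSORS"); // name is illustrative

// Other anonymous / default evaluations would keep re-using the same DEFAULT
// sequence, so memory no longer accumulates across repeated manager calls.
```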
3) Amending the memory hierarchy such that Manager owns tensors directly instead of OpCreateTensor
This step moves memory ownership from OpBase towards Tensor. This will require a potentially large breaking change, as the most intuitive way this can be added is by ensuring that Tensors always have to be created via the manager, which would be via one of two options:
1) Creating it through the `mgr.buildManagedTensor(....)` function, or;
2) Creating it yourself via `t = std::shared_ptr<Tensor>(...)` and then "initialising/registering" it with the manager via `mgr.buildManagedTensor(t)`
This will ensure that the Manager keeps track of all the relevant Tensors created, and ensures they are destroyed when the manager is destroyed. Of course, this wouldn't be required if you initialise the tensor with your own vulkan components, such as:
t = Tensor(...)
t.init(vkdevice, vkinstance)
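Putting the options above together, a consolidated sketch might look like the following (the signatures and exact ownership semantics are assumptions at this stage, not the final API):

```cpp
// Illustrative only; the final signatures may differ.
std::vector<float> data = {0.0f, 1.0f, 2.0f};
kp::Manager mgr;

// Option 1: the manager builds and owns the tensor directly.
auto t1 = mgr.buildManagedTensor(data);

// Option 2: build the tensor yourself and register it so the manager tracks
// it and destroys it together with its own resources.
auto t2 = std::make_shared<kp::Tensor>(data);
mgr.buildManagedTensor(t2);

// Unmanaged alternative: initialise the tensor with your own Vulkan components,
// in which case its lifetime is the caller's responsibility.
kp::Tensor t3(data);
t3.init(vkdevice, vkinstance); // vkdevice / vkinstance are the caller's own handles
```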
4) Remove `std::vector mData;` completely from Tensor in favour of always storing data in hostVisible buffer memory (TBC)

After doing some research I was able to confirm:
Due to these points, it may be possible to completely remove the `std::vector mData` attribute whilst still keeping the best of both worlds - namely 1) access to the memory that was added to the GPU, and 2) optimum memory usage (mainly by not storing an extra copy of the data).

This point would still need to be explored further as there are some nuances in regards to how this would be implemented in a memory-safe way, as well as ensuring that the python interface can be exposed efficiently; ideally it can be built in a way in which data can be passed to and from python with a minimal number of copies (as it's not clear what the support for non-owning containers like span is).
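For illustration, one way this could work is a minimal sketch along the following lines, assuming the tensor keeps its host-visible allocation persistently mapped (class and member names are hypothetical, not the actual implementation):

```cpp
#include <vulkan/vulkan.hpp>
#include <cstddef>

// Sketch: expose tensor data directly from mapped host-visible memory, so no
// separate std::vector copy of the data needs to be kept on the CPU heap.
class MappedTensorData {
public:
    MappedTensorData(vk::Device device, vk::DeviceMemory hostVisibleMemory, size_t count)
        : mDevice(device), mMemory(hostVisibleMemory), mCount(count) {
        // Map once and keep the pointer; reads and writes go straight to the
        // host-visible GPU allocation.
        mMapped = static_cast<float*>(
            mDevice.mapMemory(mMemory, 0, mCount * sizeof(float)));
    }
    ~MappedTensorData() { mDevice.unmapMemory(mMemory); }

    float* data() { return mMapped; }     // could back the Python buffer exposure
    size_t size() const { return mCount; }

private:
    vk::Device mDevice;
    vk::DeviceMemory mMemory;
    size_t mCount;
    float* mMapped = nullptr;
};
```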