Is your feature request related to a problem? Please describe.
We are developing a distributed dataframe processing engine using CuDF (https://github.com/cylondata/cylon).
We would like to gather multiple tables from multiple workers, each running on a separate GPU.
Each worker is working on the same table template with different data.
We collect the tables by using MPI_Gather.
First we gather the first columns of the tables from all workers, then the second columns, and so on.
On the receiving worker, a device_buffer is created to receive the buffers from all workers. This buffer holds the received data consecutively in GPU memory.
So, when the column 0 data buffers are received from all workers, the receiving device_buffer holds the data in the following layout:
<worker0 data><worker1 data>...<worker(n-1) data>
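For concreteness, a minimal sketch of that gather step, assuming a CUDA-aware MPI build and equal-sized column buffers per worker (the names `gather_column` and `bytes_per_worker` are illustrative, not from our codebase):

```cpp
#include <mpi.h>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>

// Gathers one column's device buffer from every worker into a single
// contiguous rmm::device_buffer on the root rank (empty elsewhere).
// Requires CUDA-aware MPI so that device pointers can be passed directly.
rmm::device_buffer gather_column(void const* send_col,  // this worker's column data (device memory)
                                 std::size_t bytes_per_worker,
                                 int root,
                                 MPI_Comm comm)
{
  int rank = 0, world_size = 0;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &world_size);

  // On the root: <worker0 data><worker1 data>...<worker(n-1) data>
  rmm::device_buffer recv_buf(rank == root ? bytes_per_worker * world_size : 0,
                              rmm::cuda_stream_default);

  MPI_Gather(send_col, static_cast<int>(bytes_per_worker), MPI_BYTE,
             recv_buf.data(), static_cast<int>(bytes_per_worker), MPI_BYTE,
             root, comm);
  return recv_buf;
}
```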
On the receiving worker, we would like to create a CuDF column for each table separately, and constructing a column requires an rmm::device_buffer.
Now, we are using the deep-copying constructor of rmm::device_buffer, which looks roughly as shown below (the exact stream and memory-resource parameters vary across RMM versions):
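```cpp
// rmm/device_buffer.hpp -- allocates `size` bytes of device memory via `mr`
// and copies `size` bytes from `source_data` into the new allocation.
device_buffer(void const* source_data,
              std::size_t size,
              cuda_stream_view stream,
              mr::device_memory_resource* mr = mr::get_current_device_resource());
```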
However, this constructor copies the data. The copy is unnecessary in our case, since the data is already on the GPU and nothing else will use it.
Describe the solution you'd like
We would like a device_buffer constructor that takes a GPU memory address and the data size, assumes the data is already resident on the device, and does not perform a copy. Basically, the signature of the constructor above is fine as long as it does not copy the source data.
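For illustration only, such an adopting overload might be declared along these lines; this constructor does not exist in RMM, and the `take_ownership_t` tag and parameter names are invented:

```cpp
namespace rmm {

struct take_ownership_t {};  // invented tag to distinguish from the copying overload

class device_buffer {
 public:
  // Hypothetical: adopt `device_data` (`size` bytes, already resident on the
  // GPU) without copying, and free it through `mr` on destruction.
  device_buffer(void* device_data,
                std::size_t size,
                cuda_stream_view stream,
                mr::device_memory_resource* mr,
                take_ownership_t);
  // ... existing device_buffer members ...
};

}  // namespace rmm
```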
Describe alternatives you've considered
We are currently using the copying constructor shown above, but that performs a memory copy of the data.
So, we can construct one large table after the MPI gather that holds the data of all transmitted tables, and then create a table_view for each table received from each worker. That should actually work.
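A rough sketch of that view-based approach, assuming for illustration an INT32 column with `rows_per_worker` rows from each worker, no null masks, and the gathered bytes in `recv_buf` (the names are ours, not cudf's):

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>
#include <rmm/device_buffer.hpp>

#include <cstdint>
#include <vector>

// Build one non-owning cudf::column_view per worker over the gathered buffer;
// no data is copied. `recv_buf` must outlive the views (and any table_view
// assembled from them).
std::vector<cudf::column_view> column_views_per_worker(rmm::device_buffer const& recv_buf,
                                                       cudf::size_type rows_per_worker,
                                                       int world_size)
{
  std::vector<cudf::column_view> views;
  auto const* base = static_cast<int32_t const*>(recv_buf.data());
  for (int w = 0; w < world_size; ++w) {
    views.emplace_back(cudf::data_type{cudf::type_id::INT32},
                       rows_per_worker,
                       base + static_cast<std::size_t>(w) * rows_per_worker,
                       nullptr,  // no null mask
                       0);       // null count
  }
  return views;
}
```

Repeating this per column and grouping the resulting views by worker would give one cudf::table_view per worker without copying the gathered data.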
Regarding the request for device_buffer on existing memory, I don't think we would add this as device_buffer is an owning container, not a view. For general use, what we would use here is likely some form of std::span, which we would like to add to either libcu++ or Thrust in the future.
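For reference, recent cudf versions already ship a non-owning `cudf::device_span` (in `cudf/utilities/span.hpp`), which is roughly the kind of span-style view described here. A minimal sketch of using it over the gathered buffer (our names, illustrative only):

```cpp
#include <cudf/utilities/span.hpp>
#include <rmm/device_buffer.hpp>

#include <cstdint>

// Non-owning view of worker `w`'s slice of the gathered INT32 data.
// No copy and no ownership transfer: `recv_buf` keeps owning the memory.
cudf::device_span<int32_t const> worker_slice(rmm::device_buffer const& recv_buf,
                                              std::size_t rows_per_worker,
                                              int w)
{
  auto const* base = static_cast<int32_t const*>(recv_buf.data());
  return cudf::device_span<int32_t const>{base + static_cast<std::size_t>(w) * rows_per_worker,
                                          rows_per_worker};
}
```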