-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40078: [C++] Import/Export ArrowDeviceArrayStream #40807
GH-40078: [C++] Import/Export ArrowDeviceArrayStream #40807
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm excited this will work for ChunkedArray
as well!
deb9fa3
to
37e93b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
cpp/src/arrow/record_batch.h
Outdated
/// If all of the data for this record batch is in host memory, then this | ||
/// should return null (the default impl). If the data for this batch is | ||
/// on a device, then if synchronization is needed before accessing the | ||
/// data the returned sync event will allow for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this part. AFAIK a RecordBatch currently is agnostic the device of its buffers. So for example you can have a RecordBatch backed by buffers that live in CUDA memory, but then this method will always hardcoded return NULL, which is not correct in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, at the moment I don't have an implementation there. I put a note on the ImportDeviceRecordBatchReader
documentation if you look at bridge.h:
/// We are not yet bubbling the sync events from the buffers up to
/// the `GetSyncEvent` method of an imported RecordBatch. This will be added in a future
/// update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that longer term there would be a generic implementation here in RecordBatch that checks the sync events of its underlying buffers? Because in practice, we don't subclass RecordBatch in Arrow C++ to add CUDA support, so that method cannot be overriden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was either that or introducing a subclass, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've propagated the event and device_type throughout the record batch and the Make
functions (defaulting it to nullptr) which should allow us to ensure this is all correct now. I'll update the documentation comments accordingly
/// \param[in] reader RecordBatchReader object to export | ||
/// \param[out] out C struct to export the stream to | ||
ARROW_EXPORT | ||
Status ExportDeviceRecordBatchReader(std::shared_ptr<RecordBatchReader> reader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially related to my other comment, but the existing ExportDeviceArray
has a sync
keyword that the user of this API needs to provide. Is there a reason those methods don't have that?
The returned ArrowDeviceArrayStream
itself doesn't have a sync event member, but the ArrowDeviceArray
s that it will return still have that. The user shouldn't pass the sync event to set in those arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why I put the GetSyncEvent
on the RecordBatch
object since there could potentially be a different sync event for each record batch returned by the reader. Right now it currently is hardcoded to return null, but the GetSyncEvent
function is virtual, so someone could potentially have a RecordBatchReader
that returns a class which inherits from RecordBatch
that implements GetSyncEvent
to return the correct event object etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add GetSyncEvent()
when it actually does something? It seems like this is perhaps guessing at a future API that we don't know will exist yet or that we are not sure will be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a bunch of changes propagating the sync event and the device type now, so GetSyncEvent
now does something :)
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Seems to match the structure of the single array x non-stream case.
/// \brief Get the device type for record batches this reader produces | ||
/// | ||
/// default implementation is to return ARROW_DEVICE_CPU | ||
virtual DeviceAllocationType device_type() const { return DeviceAllocationType::kCPU; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be wrong if we import a RecordBatchReader from a non-CPU ArrowDeviceArrayStream? Or is that right now not yet possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of yet, I haven't implemented figuring out what type to send here. Since we aren't storing the device type at the record batch level, this would need to spin through the buffers of its columns which could potentially be from multiple devices if a user did something weird. So I'm not sure what the best solution here is. I'd rather not add a RecordBatch
level device_type member, but I don't yet know how to best handle the situation if a consumer does something bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like for now this will have to be supplied by the caller on construction? That seems like a better intermediate state than returning a value that could lead to segfaults?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is an abstract class, so there's no constructor here :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the suggestion is we should add a device_type
member to the RecordBatch
class and have it be provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've propagated the device_type throughout the readers now and to the static Make
methods for constructing the readers, so now this should be correct.
Needs some tests? |
@@ -23,6 +23,7 @@ | |||
#include <vector> | |||
|
|||
#include "arrow/compare.h" | |||
#include "arrow/device.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead add the necessary declarations to type_fwd.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would have to move the nested Device::SyncEvent
outside of the Device
class to do that since you can't forward declare an inner class like that. It would also require pushing the entire DeviceAllocationType
declaration to type_fwd.h
which I'm not sure if we want to do. So I don't think we can do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least moving DeviceAllocationType
to type_fwd.h
would make sense.
As for Device::SyncEvent
, that's indeed a problem with nested classes (and a good reason to avoid them :-)). That could be fixed by moving SyncEvent
out of Device
and adding a compatibility alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least moving DeviceAllocationType to type_fwd.h would make sense.
I've done that here #43853
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
5cf3bc4
to
9b14645
Compare
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8169d6e. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them. |
int type = 0; | ||
for (const auto& buf : buffers) { | ||
if (!buf) continue; | ||
if (type == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition implies that, conversely, in non-debug mode we could immediately return when we encounter a buffer? Instead of continue looping on all buffers and children...
@@ -358,6 +369,16 @@ struct ARROW_EXPORT ArrayData { | |||
/// \see GetNullCount | |||
int64_t ComputeLogicalNullCount() const; | |||
|
|||
/// \brief Returns the device_type of the underlying buffers and children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but we tend to use infinitives in docstring ("return", not "returns")
) ### Rationale for this change The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both. ### What changes are included in this PR? Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`. ### Are these changes tested? Test writing in progress, wanted to get this up for review while I write tests. ### Are there any user-facing changes? No, only new functions have been added. * GitHub Issue: apache#40078 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Responding to feedback on #40807: This condition implies that, conversely, in non-debug mode we could immediately return when we encounter a buffer? Instead of continue looping on all buffers and children... _Originally posted in #40807 (comment) Authored-by: Matt Topol <zotthewizard@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Rationale for this change
The original PRs for adding support for importing and exporting the new C Device interface (#36488 / #36489) only added support for the Arrays themselves, not for the stream structure. We should support both.
What changes are included in this PR?
Adding parallel functions for Import/Export of streams that accept
ArrowDeviceArrayStream
.Are these changes tested?
Test writing in progress, wanted to get this up for review while I write tests.
Are there any user-facing changes?
No, only new functions have been added.