-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36488: [C++] Import/Export ArrowDeviceArray #36489
Conversation
CC @kkraus14 |
// but you can specify CUDA_HOST as the device type to override it. | ||
if (device_type != DeviceType::UNKNOWN) { | ||
device_type_ = device_type; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a DCHECK be useful here to confirm that device_type_
ended-up being compatible with the memory manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, we'd have to add a Validate
or Valid
method to the memory manager to do so, to allow the memory manager to specify if the device_type is valid for it. Not sure how worthwhile that is though.
cpp/src/arrow/buffer.h
Outdated
@@ -57,18 +57,25 @@ class ARROW_EXPORT Buffer { | |||
/// | |||
/// \note The passed memory must be kept alive through some other means | |||
Buffer(const uint8_t* data, int64_t size) | |||
: is_mutable_(false), is_cpu_(true), data_(data), size_(size), capacity_(size) { | |||
: is_mutable_(false), is_cpu_(true), data_(data), size_(size), capacity_(size), device_type_(DeviceType::CPU) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need the is_cpu_
boolean? Is there a situation in which is_cpu_
is false
and device_type_ != CPU
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_cpu_
boolean is still useful because of pinned host memory. Both CUDA_HOST
and ROCM_HOST
are pinned CPU memory which would have the corresponding device type (since the memory is managed by the device drivers) but is accessible to the CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the pointers are accessible to the CPU, there's no guarantee that there isn't stream ordered work happening on them where it could cause a race condition to CPU code accessing the buffer. Given is_cpu_
is existing ABI, I'm wondering if it wouldn't be a safer option to have it return False
in these cases where if someone wants to do CPU things against CUDA_HOST
or ROCM_HOST
buffers they should explicitly check the device type instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think the intent for the is_cpu_
flag was solely to know whether or not the data is accessible via CPU rather than caring about the stream ordered work synchronizations. I intend on expanding the existing classes/objects to make it easier to synchronize for stream ordered work as a separate abstraction, so I lean towards leaving it marked as true
in this case, but I'm open to changing it.
@pitrou @felipeblazing @Rhonda85 what do you all think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not envisioned this issue, but I think your approach is correct: is_cpu() == true
means that address()
is a CPU-accessible pointer to the data. Whether or not the "expected" data is already there will depend on the producer.
And, yes, a generalized version of Buffer
could have an optional sync event or something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I will let @kkraus14 do the final review.
cpp/src/arrow/c/bridge.cc
Outdated
Status ValidateDeviceInfo(const ArrayData& data, DeviceType* device_type, | ||
int64_t* device_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal ATM, but a DeviceInfo
struct might be good have (not just here, but in general), e.g:
struct DeviceInfo {
explicit DeviceInfo(DeviceType type = DeviceType::UNKNOWN, int64_t id = -1)
: type(type), id(id) {}
DeviceType type;
int64_t id;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least return Result<std::pair<DeviceType, int64_t>>
here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That brings me back to my original implementation vs this tail recursion version that @felipecrv suggested haha.
This is the internal recursive call, below it is the intended function to call which does return Result<std::pair<DeviceType, int64_t>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't like passing a Result
recursively, you can always build a struct with a method in it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Result
isn't recursive, the recursion just passes the pointers to the device type and id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's quite a comprehensive set of tests. I think it's probably more exhaustive than strictly needed but that's probably ok. I'll let others weigh in too but this looks good as far as I can tell.
It might also be good to exercise the sync event but I think, if you have plans to exercise it in practice in a follow-up PR, that is probably ok too.
Does anyone have any further comments here? If not I'll potentially merge this at the end of the day |
Looks like there's a memory leak identified by ASAN, will merge after I fix that problem. |
@@ -294,6 +310,7 @@ class ARROW_EXPORT Buffer { | |||
const uint8_t* data_; | |||
int64_t size_; | |||
int64_t capacity_; | |||
std::optional<DeviceAllocationType> device_type_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this optional, are there any cases where the information is not available?
|
||
if (buf->device_type() != *device_type) { | ||
return Status::Invalid( | ||
"Exporting device array with buffers on more than one device."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"with more than one device allocation type" perhaps?
for (const auto& child : data.child_data) { | ||
RETURN_NOT_OK(ValidateDeviceInfo(*child, device_type, device_id)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also validate data.dictionary
, if any.
////////////////////////////////////////////////////////////////////////// | ||
// C device arrays | ||
|
||
Status ValidateDeviceInfo(const ArrayData& data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put helper functions inside the anonymous namespace?
|
||
Status ExportDeviceArray(const Array& array, RawSyncEvent sync_event, | ||
struct ArrowDeviceArray* out, struct ArrowSchema* out_schema) { | ||
if (sync_event.sync_event != nullptr && sync_event.release_func) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, did you test this? It seems you're missing a logical negation here, i.e. !sync_event.release_func
.
@@ -258,5 +260,9 @@ Result<uintptr_t> GetDeviceAddress(const uint8_t* cpu_data, | |||
ARROW_EXPORT | |||
Result<uint8_t*> GetHostAddress(uintptr_t device_ptr); | |||
|
|||
ARROW_EXPORT | |||
Result<std::shared_ptr<MemoryManager>> DefaultMemoryMapper(ArrowDeviceType device_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to have this in arrow/gpu
, why not under arrow/c
instead?
|
||
class MyDevice : public Device { | ||
public: | ||
explicit MyDevice(int value) : Device(true), value_(value) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to be more explicit: Device(/*xxx=*/ true)
for (const auto& child : data.child_data) { | ||
ARROW_ASSIGN_OR_RAISE(auto dest, ToDeviceData(mm, *child)); | ||
children.push_back(dest); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should take care of the dictionary array as well.
checker(&c_array, *batch, kMyDeviceType, 1, nullptr); | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have tests for importing device arrays.
Also tests that the sync event and its release func are taken into account.
TestPrimitive(mm, decimal(16, 4), R"(["1234.5670", null])"); | ||
TestPrimitive(mm, decimal256(16, 4), R"(["1234.5670", null])"); | ||
|
||
TestPrimitive(mm, month_day_nano_interval(), R"([[-1, 5, 20], null])"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I'm not sure it's useful to copy/paste all those tests, besides the readability burden of duplicate code.
We just want to check that primitive, nested, extension and dictionary are properly handled in general (detailed export behaviour is already tested previously).
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 969f4b4. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change With the addition of the `ArrowDeviceArray` we should provide import/export functions just like we have for `ArrowArray`. ### What changes are included in this PR? Adding Import/Export functions to the `bridge.h/.cc` files for C Data Non-CPU data. This requires adding a device type to buffers and memory managers to propagate through. ### Are these changes tested? Yes. (Still need to add tests to `bridge_test.cc` but I wanted to get eyes on this first) ### Are there any user-facing changes? No. * Closes: apache#36488 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
### Rationale for this change With the addition of the `ArrowDeviceArray` we should provide import/export functions just like we have for `ArrowArray`. ### What changes are included in this PR? Adding Import/Export functions to the `bridge.h/.cc` files for C Data Non-CPU data. This requires adding a device type to buffers and memory managers to propagate through. ### Are these changes tested? Yes. (Still need to add tests to `bridge_test.cc` but I wanted to get eyes on this first) ### Are there any user-facing changes? No. * Closes: apache#36488 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
### Rationale for this change The original PRs for adding support for importing and exporting the new C Device interface (#36488 / #36489) only added support for the Arrays themselves, not for the stream structure. We should support both. ### What changes are included in this PR? Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`. ### Are these changes tested? Test writing in progress, wanted to get this up for review while I write tests. ### Are there any user-facing changes? No, only new functions have been added. * GitHub Issue: #40078 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
) ### Rationale for this change The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both. ### What changes are included in this PR? Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`. ### Are these changes tested? Test writing in progress, wanted to get this up for review while I write tests. ### Are there any user-facing changes? No, only new functions have been added. * GitHub Issue: apache#40078 Lead-authored-by: Matt Topol <zotthewizard@gmail.com> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com> Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com> Co-authored-by: Antoine Pitrou <pitrou@free.fr> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Rationale for this change
With the addition of the
ArrowDeviceArray
we should provide import/export functions just like we have forArrowArray
.What changes are included in this PR?
Adding Import/Export functions to the
bridge.h/.cc
files for C Data Non-CPU data.This requires adding a device type to buffers and memory managers to propagate through.
Are these changes tested?
Yes.
(Still need to add tests to
bridge_test.cc
but I wanted to get eyes on this first)Are there any user-facing changes?
No.