Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36488: [C++] Import/Export ArrowDeviceArray #36489

Merged
merged 16 commits into from
Jul 25, 2023

Conversation

zeroshade
Copy link
Member

@zeroshade zeroshade commented Jul 5, 2023

Rationale for this change

With the addition of the ArrowDeviceArray we should provide import/export functions just like we have for ArrowArray.

What changes are included in this PR?

Adding Import/Export functions to the bridge.h/.cc files for C Data Non-CPU data.

This requires adding a device type to buffers and memory managers to propagate through.

Are these changes tested?

Yes.

(Still need to add tests to bridge_test.cc but I wanted to get eyes on this first)

Are there any user-facing changes?

No.

@zeroshade
Copy link
Member Author

CC @kkraus14

// but you can specify CUDA_HOST as the device type to override it.
if (device_type != DeviceType::UNKNOWN) {
device_type_ = device_type;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a DCHECK be useful here to confirm that device_type_ ended-up being compatible with the memory manager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, we'd have to add a Validate or Valid method to the memory manager to do so, to allow the memory manager to specify if the device_type is valid for it. Not sure how worthwhile that is though.

@@ -57,18 +57,25 @@ class ARROW_EXPORT Buffer {
///
/// \note The passed memory must be kept alive through some other means
Buffer(const uint8_t* data, int64_t size)
: is_mutable_(false), is_cpu_(true), data_(data), size_(size), capacity_(size) {
: is_mutable_(false), is_cpu_(true), data_(data), size_(size), capacity_(size), device_type_(DeviceType::CPU) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the is_cpu_ boolean? Is there a situation in which is_cpu_ is false and device_type_ != CPU?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_cpu_ boolean is still useful because of pinned host memory. Both CUDA_HOST and ROCM_HOST are pinned CPU memory which would have the corresponding device type (since the memory is managed by the device drivers) but is accessible to the CPU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the pointers are accessible to the CPU, there's no guarantee that there isn't stream ordered work happening on them where it could cause a race condition to CPU code accessing the buffer. Given is_cpu_ is existing ABI, I'm wondering if it wouldn't be a safer option to have it return False in these cases where if someone wants to do CPU things against CUDA_HOST or ROCM_HOST buffers they should explicitly check the device type instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think the intent for the is_cpu_ flag was solely to know whether or not the data is accessible via CPU rather than caring about the stream ordered work synchronizations. I intend on expanding the existing classes/objects to make it easier to synchronize for stream ordered work as a separate abstraction, so I lean towards leaving it marked as true in this case, but I'm open to changing it.

@pitrou @felipeblazing @Rhonda85 what do you all think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not envisioned this issue, but I think your approach is correct: is_cpu() == true means that address() is a CPU-accessible pointer to the data. Whether or not the "expected" data is already there will depend on the producer.

And, yes, a generalized version of Buffer could have an optional sync event or something...

cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Show resolved Hide resolved
cpp/src/arrow/c/bridge.h Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 6, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jul 6, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 10, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jul 11, 2023
Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I will let @kkraus14 do the final review.

Comment on lines 678 to 679
Status ValidateDeviceInfo(const ArrayData& data, DeviceType* device_type,
int64_t* device_id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal ATM, but a DeviceInfo struct might be good have (not just here, but in general), e.g:

struct DeviceInfo {
  explicit DeviceInfo(DeviceType type = DeviceType::UNKNOWN, int64_t id = -1)
      : type(type), id(id) {}
  DeviceType type;
  int64_t id;
};

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least return Result<std::pair<DeviceType, int64_t>> here...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That brings me back to my original implementation vs this tail recursion version that @felipecrv suggested haha.

This is the internal recursive call, below it is the intended function to call which does return Result<std::pair<DeviceType, int64_t>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't like passing a Result recursively, you can always build a struct with a method in it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the Result isn't recursive, the recursion just passes the pointers to the device type and id

cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/device.h Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 20, 2023
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a comprehensive set of tests. I think it's probably more exhaustive than strictly needed but that's probably ok. I'll let others weigh in too but this looks good as far as I can tell.

It might also be good to exercise the sync event but I think, if you have plans to exercise it in practice in a follow-up PR, that is probably ok too.

cpp/src/arrow/c/bridge_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge awaiting changes Awaiting changes and removed awaiting change review Awaiting change review awaiting merge Awaiting merge labels Jul 20, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 21, 2023
@zeroshade
Copy link
Member Author

Does anyone have any further comments here? If not I'll potentially merge this at the end of the day

@zeroshade
Copy link
Member Author

Looks like there's a memory leak identified by ASAN, will merge after I fix that problem.

@zeroshade zeroshade merged commit 969f4b4 into apache:main Jul 25, 2023
@zeroshade zeroshade removed the awaiting change review Awaiting change review label Jul 25, 2023
@zeroshade zeroshade deleted the import-export-devicearray branch July 25, 2023 14:34
@@ -294,6 +310,7 @@ class ARROW_EXPORT Buffer {
const uint8_t* data_;
int64_t size_;
int64_t capacity_;
std::optional<DeviceAllocationType> device_type_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this optional, are there any cases where the information is not available?


if (buf->device_type() != *device_type) {
return Status::Invalid(
"Exporting device array with buffers on more than one device.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"with more than one device allocation type" perhaps?

for (const auto& child : data.child_data) {
RETURN_NOT_OK(ValidateDeviceInfo(*child, device_type, device_id));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also validate data.dictionary, if any.

//////////////////////////////////////////////////////////////////////////
// C device arrays

Status ValidateDeviceInfo(const ArrayData& data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put helper functions inside the anonymous namespace?


Status ExportDeviceArray(const Array& array, RawSyncEvent sync_event,
struct ArrowDeviceArray* out, struct ArrowSchema* out_schema) {
if (sync_event.sync_event != nullptr && sync_event.release_func) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, did you test this? It seems you're missing a logical negation here, i.e. !sync_event.release_func.

@@ -258,5 +260,9 @@ Result<uintptr_t> GetDeviceAddress(const uint8_t* cpu_data,
ARROW_EXPORT
Result<uint8_t*> GetHostAddress(uintptr_t device_ptr);

ARROW_EXPORT
Result<std::shared_ptr<MemoryManager>> DefaultMemoryMapper(ArrowDeviceType device_type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems weird to have this in arrow/gpu, why not under arrow/c instead?


class MyDevice : public Device {
public:
explicit MyDevice(int value) : Device(true), value_(value) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to be more explicit: Device(/*xxx=*/ true)

for (const auto& child : data.child_data) {
ARROW_ASSIGN_OR_RAISE(auto dest, ToDeviceData(mm, *child));
children.push_back(dest);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should take care of the dictionary array as well.

checker(&c_array, *batch, kMyDeviceType, 1, nullptr);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have tests for importing device arrays.
Also tests that the sync event and its release func are taken into account.

TestPrimitive(mm, decimal(16, 4), R"(["1234.5670", null])");
TestPrimitive(mm, decimal256(16, 4), R"(["1234.5670", null])");

TestPrimitive(mm, month_day_nano_interval(), R"([[-1, 5, 20], null])");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm not sure it's useful to copy/paste all those tests, besides the readability burden of duplicate code.

We just want to check that primitive, nested, extension and dictionary are properly handled in general (detailed export behaviour is already tested previously).

@github-actions github-actions bot added awaiting review Awaiting review awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jul 25, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 969f4b4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
### Rationale for this change
With the addition of the `ArrowDeviceArray` we should provide import/export functions just like we have for `ArrowArray`.

### What changes are included in this PR?
Adding Import/Export functions to the `bridge.h/.cc` files for C Data Non-CPU data.

This requires adding a device type to buffers and memory managers to propagate through. 

### Are these changes tested?
Yes.

(Still need to add tests to `bridge_test.cc` but I wanted to get eyes on this first)

### Are there any user-facing changes?
No.

* Closes: apache#36488

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
### Rationale for this change
With the addition of the `ArrowDeviceArray` we should provide import/export functions just like we have for `ArrowArray`.

### What changes are included in this PR?
Adding Import/Export functions to the `bridge.h/.cc` files for C Data Non-CPU data.

This requires adding a device type to buffers and memory managers to propagate through. 

### Are these changes tested?
Yes.

(Still need to add tests to `bridge_test.cc` but I wanted to get eyes on this first)

### Are there any user-facing changes?
No.

* Closes: apache#36488

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
zeroshade added a commit that referenced this pull request May 21, 2024
### Rationale for this change
The original PRs for adding support for importing and exporting the new C Device interface (#36488 / #36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

### What changes are included in this PR?
Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`.

### Are these changes tested?
Test writing in progress, wanted to get this up for review while I write tests.

### Are there any user-facing changes?
No, only new functions have been added.

* GitHub Issue: #40078

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
)

### Rationale for this change
The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

### What changes are included in this PR?
Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`.

### Are these changes tested?
Test writing in progress, wanted to get this up for review while I write tests.

### Are there any user-facing changes?
No, only new functions have been added.

* GitHub Issue: apache#40078

Lead-authored-by: Matt Topol <zotthewizard@gmail.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Import/Export ArrowDeviceArray
9 participants