Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34971: [Format] Add non-CPU version of C Data Interface #34972

Merged
merged 23 commits into from
Jun 6, 2023
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
202 changes: 202 additions & 0 deletions cpp/src/arrow/c/abi.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,30 @@
// specific language governing permissions and limitations
// under the License.

/// \file abi.h Arrow C-Data Interface
///
/// The Arrow C-Data interface defines a very small, stable set
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
/// of C definitions which can be easily copied into any project's
/// source code and vendored to be used for columnar data interchange
/// in the Arrow format. For non-C/C++ languages and runtimes,
/// it should be almost as easy to translate the C definitions into
/// the corresponding C FFI declarations.
///
/// Applications and libraries can therefore work with Arrow memory
/// without necessarily using the Arrow libraries or reinventing
/// the wheel. Developers can choose between tight integration
/// with the Arrow software project or minimal integration with
/// the Arrow format only.

#pragma once

#include <stdint.h>

/// \defgroup Arrow C-Data Interface
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
/// Definitions for the C-Data Interface/C-Stream Interface.
///
/// @{

#ifdef __cplusplus
extern "C" {
#endif
Expand Down Expand Up @@ -65,6 +85,119 @@ struct ArrowArray {

#endif // ARROW_C_DATA_INTERFACE

#ifndef ARROW_C_DEVICE_DATA_INTERFACE
#define ARROW_C_DEVICE_DATA_INTERFACE

/// \defgroup arrow-device-types Device Types
/// These macros are compatible with the dlpack DLDeviceType values,
/// using the same value for each enum as the equivalent kDL<type>
/// from dlpack.h. This list should continue to be kept in sync with
/// the equivalent dlpack.h enum values over time to ensure
/// compatibility, rather than potentially diverging.
///
/// To ensure predictability with the ABI we use macros instead of
/// an enum so the storage type is not compiler dependent.
///
/// @{

/// \brief DeviceType for the allocated memory
typedef int32_t ArrowDeviceType;

/// \brief CPU device, same as using ArrowArray directly
#define ARROW_DEVICE_CPU 1
/// \brief CUDA GPU Device
#define ARROW_DEVICE_CUDA 2
/// \brief Pinned CUDA CPU memory by cudaMallocHost
#define ARROW_DEVICE_CUDA_HOST 3
/// \brief OpenCL Device
#define ARROW_DEVICE_OPENCL 4
/// \brief Vulkan buffer for next-gen graphics
#define ARROW_DEVICE_VULKAN 7
/// \brief Metal for Apple GPU
#define ARROW_DEVICE_METAL 8
/// \brief Verilog simulator buffer
#define ARROW_DEVICE_VPI 9
/// \brief ROCm GPUs for AMD GPUs
#define ARROW_DEVICE_ROCM 10
/// \brief Pinned ROCm CPU memory allocated by hipMallocHost
#define ARROW_DEVICE_ROCM_HOST 11
/// \brief Reserved for extension
///
/// used to quickly test extension devices, semantics
/// can differ based on the implementation
#define ARROW_DEVICE_EXT_DEV 12
/// \brief CUDA managed/unified memory allocated by cudaMallocManaged
#define ARROW_DEVICE_CUDA_MANAGED 13
/// \brief unified shared memory allocated on a oneAPI
/// non-partitioned device.
///
/// A call to the oneAPI runtime is required to determine the device
/// type, the USM allocation type, and the sycl context it is bound to.
#define ARROW_DEVICE_ONEAPI 14
/// \brief GPU support for next-gen WebGPU standard
#define ARROW_DEVICE_WEBGPU 15
/// \brief Qualcomm Hexagon DSP
#define ARROW_DEVICE_HEXAGON 16

/// @}

/// \brief Struct for passing an Arrow Array alongside
/// device memory information.
struct ArrowDeviceArray {
/// \brief the Allocated Array
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
///
/// the buffers in the array (along with the buffers of any
/// children) are what is allocated on the device.
///
/// the private_data and release callback of the arrow array
/// should contain any necessary information and structures
/// related to freeing the array according to the device it
/// is allocated on, rather than having a separate release
/// callback embedded here.
struct ArrowArray array;
/// \brief The device id to identify a specific device
/// if multiple of this type are on the system.
///
/// the semantics of the id will be hardware dependant.
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
int64_t device_id;
/// \brief The type of device which can access this memory.
ArrowDeviceType device_type;
/// \brief An event-like object to synchronize on if needed.
///
/// Many devices, like GPUs, are primarily asynchronous with
/// respect to CPU processing. As such in order to safely access
/// memory, it is often necessary to have an object to synchronize
/// processing on. Since different devices will use different types
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
/// to specify this we use a void* that can be coerced into
/// whatever the device appropriate type is (e.g. cudaEvent_t for
/// CUDA and hipEvent_t for HIP).
///
/// If synchronization is not needed this can be null. If this is
/// non-null, then it should be used to call the appropriate sync
/// method for the device (e.g. cudaStreamWaitEvent / hipStreamWaitEvent).
///
/// Expected type to coerce this void* to depending on device type:
/// cuda: cudaEvent_t*
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
/// ROCm: hipEvent_t*
/// OpenCL: cl_event*
/// Vulkan: VkEvent*
/// Metal: MTLEvent*
/// OneAPI: sycl::event*
///
void* sync_event;
Copy link

@GregoryKimball GregoryKimball May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please help me understand the void* sync_event pointer a bit better in the context of CUDA/C++? Is this a function pointer that is expected to be called with a cudaStream_t parameter provided by the application? Would there be a benefit from storing a cudaStream_t in ArrowDeviceArray (in the reserved bytes or elsewhere)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of CUDA, this would be a pointer to a cudaEvent_t. It would be the producer's responsibility to create the event and call cudaEventRecord to ensure the relevant work in the stream(s) are captured in the event. The consumer can then call cudaStreamWaitEvent which is typically a device-side more efficient stream synchronization mechanism than cudaStreamSynchronize (if they need to wait for host code they can still use cudaEventSynchronize instead).

If both sides are on the same stream, then the cudaStreamWaitEvent call should have negligible overhead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in the context of CUDA/C++ the void* sync_event would be expected to be a cudaEvent_t* that the producer created and will trigger based on when the data is available.

A consumer would then do something like:

auto event = reinterpret_cast<cudaEvent_t*>(device_array->sync_event);
cudaStreamWaitEvent(stream, event);
// add processing for device_array data to stream to process *after* waiting on the event.

As per the previous discussions, most frameworks for cuda don't actually make their internal streams easily externally available so we aren't expecting a cudaStream_t to get passed. In the future if usage deems it necessary, we could absolutely leverage the reserved bytes to add a stream/queue pointer or something. But the initial pass here is intended to pass an event via the void* sync_event that a queue can wait on from the producer and then just operate on the data from there.

/// \brief Reserved bytes for future expansion.
///
/// As non-CPU development expands we can update this struct
/// without ABI breaking changes. This also rounds out the
/// total size of this struct to be 128 bytes (power of 2)
/// on 64-bit systems. These bytes should be zero'd out after
/// allocation in order to ensure safe evolution of the ABI in
/// the future.
int64_t reserved[3];
};

#endif // ARROW_C_DEVICE_DATA_INTERFACE

#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE

Expand Down Expand Up @@ -106,6 +239,75 @@ struct ArrowArrayStream {

#endif // ARROW_C_STREAM_INTERFACE

#ifndef ARROW_C_DEVICE_STREAM_INTERFACE
#define ARROW_C_DEVICE_STREAM_INTERFACE

/// \brief Equivalent to ArrowArrayStream, but for ArrowDeviceArrays.
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
///
/// This stream is intended to provide a stream of data on a single
/// device, if a producer wants data to be produced on multiple devices
/// then multiple streams should be provided. One per device.
struct ArrowDeviceArrayStream {
/// \brief The device that this stream produces data on.
///
/// All ArrowDeviceArrays that are produced by this
/// stream should have the same device_type as set
/// here. Including it here in the stream object is
/// a convenience to allow consumers simpler processing
/// since they can assume all arrays that result from
/// this stream to be on this device type.
ArrowDeviceType device_type;

/// \brief Callback to get the stream schema
/// (will be the same for all arrays in the stream).
///
/// If successful, the ArrowSchema must be released independantly from the stream.
zeroshade marked this conversation as resolved.
Show resolved Hide resolved
/// The schema should be accessible via CPU memory.
///
/// \param[in] self The ArrowDeviceArrayStream object itself
/// \param[out] out C struct to export the schema to
/// \return 0 if successful, an `errno`-compatible error code otherwise.
int (*get_schema)(struct ArrowDeviceArrayStream* self, struct ArrowSchema* out);

/// \brief Callback to get the next array
///
/// If there is no error and the returned array has been released, the stream
/// has ended. If successful, the ArrowArray must be released independently
/// from the stream.
///
/// \param[in] self The ArrowDeviceArrayStream object itself
/// \param[out] out C struct where to export the Array and device info
/// \return 0 if successful, an `errno`-compatible error code otherwise.
int (*get_next)(struct ArrowDeviceArrayStream* self, struct ArrowDeviceArray* out);

/// \brief Callback to get optional detailed error information.
///
/// This must only be called if the last stream operation failed
/// with a non-0 return code.
///
/// The returned pointer is only valid until the next operation on this stream
/// (including release).
///
/// \param[in] self The ArrowDeviceArrayStream object itself
/// \return pointer to a null-terminated character array describing
/// the last error, or NULL if no description is available.
const char* (*get_last_error)(struct ArrowDeviceArrayStream* self);

/// \brief Release callback: release the stream's own resources.
///
/// Note that arrays returned by `get_next` must be individually released.
///
/// \param[in] self The ArrowDeviceArrayStream object itself
void (*release)(struct ArrowDeviceArrayStream* self);

/// \brief Opaque producer-specific data
void* private_data;
};

#endif // ARROW_C_DEVICE_STREAM_INTERFACE

#ifdef __cplusplus
}
#endif

/// @}
4 changes: 4 additions & 0 deletions docs/source/format/CDataInterface.rst
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ Examples
has format string ``+us:4,5``; its two children have names ``ints`` and
``floats``, and format strings ``i`` and ``f`` respectively.

.. _c-data-interface-struct-defs:

Structure definitions
=====================
Expand Down Expand Up @@ -531,6 +532,7 @@ parameterized extension types).
The ``ArrowArray`` structure exported from an extension array simply points
to the storage data of the extension array.

.. _c-data-interface-semantics:

Semantics
=========
Expand Down Expand Up @@ -703,6 +705,8 @@ C producer examples
Exporting a simple ``int32`` array
----------------------------------

.. _c-data-interface-export-int32-schema:

Export a non-nullable ``int32`` type with empty metadata. In this case,
all ``ArrowSchema`` members point to statically-allocated data, so the
release callback is trivial.
Expand Down
Loading