Commit 2175043

[RFC] Support DLPACK C Functions for Speed Exchange and Stream Handling
This PR adds support for three C functions that speed up DLPack exchange. As of now, DLPack exchange relies on Python functions such as tensor.__dlpack__(). While they work well for common cases, the overhead of such an exchange is about 0.2-0.3 us for a very well optimized implementation, and can go up to 0.4-1 us for a less optimized one. For a function that takes three arguments, f(a, b, c), running a DLPack exchange for each argument usually adds around 1 us of conversion overhead, and sometimes up to 3 us. While such overhead can be acceptable in many settings, in GPU applications the extra 1-3 us can still be significant.

This PR proposes three functions for fast exchange of DLPack tensors without going through the Python interpreter:

- DLPackFromPyObject: exports a PyObject Tensor to a DLManagedTensorVersioned.
- DLPackToPyObject: converts a DLManagedTensorVersioned to a PyObject Tensor.
- DLPackTensorAllocator: exposes one package's tensor allocator to another package. This allows us, for example, to implement libraries that allocate intermediate tensors using the allocator specified by the caller.

Our preliminary results show that these functions, when incorporated correctly via native extensions such as C/C++, can bring the exchange cost down to 30-80 ns, a speedup of about one order of magnitude. That means functions like f(a, b, c) can finish at the 0.2-0.4 us level, which is close to the overhead of a native C++ extension call without any exchange.
1 parent 3ea601b commit 2175043

1 file changed: +82 -3 lines changed

include/dlpack/dlpack.h

Lines changed: 82 additions & 3 deletions
@@ -1,5 +1,5 @@
 /*!
- * Copyright (c) 2017 by Contributors
+ * Copyright (c) 2017 - by Contributors
  * \file dlpack.h
  * \brief The common header of DLPack.
  */
@@ -324,7 +324,7 @@ typedef struct DLManagedTensor {
  *
  * \note This is the current standard DLPack exchange data structure.
  */
-struct DLManagedTensorVersioned {
+typedef struct DLManagedTensorVersioned {
   /*!
    * \brief The API and ABI version of the current managed Tensor
    */
@@ -358,7 +358,86 @@ struct DLManagedTensorVersioned {
   uint64_t flags;
   /*! \brief DLTensor which is being memory managed */
   DLTensor dl_tensor;
-};
+} DLManagedTensorVersioned;
+
+//--------------------------------------------------------------------
+// DLPack C functions for speed exchange
+//--------------------------------------------------------------------
+/*
+ * \brief A generic C-style allocator that exposes allocation of a Tensor/Array.
+ *
+ * Array/Tensor libraries can store this field as an int in the type of the Tensor/Array.
+ *
+ * mypackage.Tensor.__c_dlpack_tensor_allocator__ = MyPackageDLPackTensorAllocator
+ *
+ * This information can then be used to set allocators of a callee to run allocations.
+ *
+ * This particular function does not assume a Python environment; as a result,
+ * the error handling mechanism is different from Python-related functions.
+ *
+ * \param prototype The prototype DLTensor to offer details about the device and shape.
+ * Other field information will be ignored during allocation.
+ * \param out The output DLManagedTensorVersioned.
+ * \param error_ctx The context to set the error.
+ * \param SetError The function to set the error.
+ * \return 0 on success, -1 on failure.
+ * The callee should call SetError(error_ctx, kind, message) to set the error kind and message.
+ * \note Error propagation via SetError.
+ */
+typedef int (*DLPackTensorAllocator)(  //
+    DLTensor* prototype, DLManagedTensorVersioned** out, void* error_ctx,  //
+    void (*SetError)(void* error_ctx, const char* kind, const char* message)  //
+);
+
+/*!
+ * \brief Exports a PyObject* Tensor/NDArray to a DLManagedTensorVersioned.
+ *
+ * This function is a C-style function pointer to quickly convert a PyObject* Tensor/NDArray
+ * to a DLManagedTensorVersioned without going through the Python Interpreter.
+ *
+ * It also provides an option to query the current context stream of the device provided
+ * by the tensor.
+ *
+ * Array/Tensor libraries can store this field as an int in the type of the Tensor/Array.
+ *
+ * mypackage.Tensor.__c_dlpack_from_pyobject__ = MyPackageDLPackFromPyObject
+ *
+ * This information can then be picked up by importers and libraries to run the speed conversion.
+ * This function should not throw any exceptions; if it fails, it should return -1 and
+ * set the error message via PyErr_SetXXX.
+ *
+ * \param py_object The Python object to convert; this should be PyObject*.
+ * We use void* to avoid dependency on Python.h.
+ * \param out The output DLManagedTensorVersioned.
+ * \param optional_out_env_stream Outputs the current context stream of the device provided
+ * by the tensor; it can be NULL, in which case the stream will not be queried.
+ * \return 0 on success, -1 on failure. PyError should be set if -1 is returned.
+ * \note We use void* to avoid dependency on Python.h, so this specific type is
+ * not dependent on Python.h and can be copied to dlpack.h.
+ */
+typedef int (*DLPackFromPyObject)(  //
+    void* py_object,  //
+    DLManagedTensorVersioned** out,  //
+    void** optional_out_env_stream  //
+);
+
+/*!
+ * \brief Imports a DLManagedTensorVersioned to a PyObject* Tensor/NDArray.
+ *
+ * This function is a C-style function pointer to quickly convert a DLManagedTensorVersioned
+ * to a PyObject* without going through the Python Interpreter.
+ *
+ * Array/Tensor libraries can store this field as an int in the type of the Tensor/Array.
+ *
+ * mypackage.Tensor.__c_dlpack_to_pyobject__ = MyPackageDLPackToPyObject
+ *
+ * \param tensor The DLManagedTensorVersioned to convert.
+ * \param out_py_object The output Python object.
+ * \return 0 on success, -1 on failure. PyError should be set if -1 is returned.
+ * \note We use void* to avoid dependency on Python.h, so this specific type is
+ * not dependent on Python.h and can be copied to dlpack.h.
+ */
+typedef int (*DLPackToPyObject)(DLManagedTensorVersioned* tensor, void** out_py_object);
 
 #ifdef __cplusplus
 }  // DLPACK_EXTERN_C
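
The DLPackTensorAllocator contract above (return 0 or -1, report failures through SetError rather than Python exceptions) could be implemented along the following lines. This is only a sketch under stated assumptions: the struct definitions are trimmed stand-ins for the real dlpack.h types (the version field is omitted, and a real build would include dlpack.h instead), and ExampleError, ExampleSetError, ExampleDeleter, ExampleCpuAllocator and ExampleAllocate are hypothetical names that are not part of this commit.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed stand-ins for the dlpack.h types (assumption: a real build would
 * include dlpack.h instead; the version field is omitted here). */
typedef struct { int32_t device_type; int32_t device_id; } DLDevice;
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;
typedef struct {
  void* data;
  DLDevice device;
  int32_t ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;
  uint64_t byte_offset;
} DLTensor;
typedef struct DLManagedTensorVersioned {
  void* manager_ctx;
  void (*deleter)(struct DLManagedTensorVersioned* self);
  uint64_t flags;
  DLTensor dl_tensor;
} DLManagedTensorVersioned;

/* Signature as introduced by the diff. */
typedef int (*DLPackTensorAllocator)(
    DLTensor* prototype, DLManagedTensorVersioned** out, void* error_ctx,
    void (*SetError)(void* error_ctx, const char* kind, const char* message));

/* Hypothetical error context owned by the caller of the allocator. */
typedef struct { char kind[32]; char message[128]; } ExampleError;

static void ExampleSetError(void* error_ctx, const char* kind, const char* message) {
  ExampleError* err = (ExampleError*)error_ctx;
  snprintf(err->kind, sizeof(err->kind), "%s", kind);
  snprintf(err->message, sizeof(err->message), "%s", message);
}

/* Frees the data buffer, the shape/strides block, and the struct itself. */
static void ExampleDeleter(DLManagedTensorVersioned* self) {
  free(self->dl_tensor.data);
  free(self->dl_tensor.shape);
  free(self);
}

/* Hypothetical malloc-based CPU allocator matching DLPackTensorAllocator. */
static int ExampleCpuAllocator(DLTensor* prototype, DLManagedTensorVersioned** out,
                               void* error_ctx,
                               void (*SetError)(void*, const char*, const char*)) {
  if (prototype->device.device_type != 1) { /* 1 is kDLCPU in dlpack.h */
    SetError(error_ctx, "ValueError", "only CPU tensors are supported");
    return -1;
  }
  size_t ndim = (size_t)prototype->ndim;
  DLManagedTensorVersioned* t = calloc(1, sizeof(*t));
  /* One block holds the shape followed by the strides. */
  int64_t* dims = malloc(sizeof(int64_t) * 2 * (ndim ? ndim : 1));
  int64_t numel = 1;
  if (dims != NULL) {
    for (size_t i = ndim; i-- > 0;) { /* compact row-major strides */
      dims[i] = prototype->shape[i];
      dims[ndim + i] = numel;
      numel *= prototype->shape[i];
    }
  }
  size_t elem_bytes = ((size_t)prototype->dtype.bits * prototype->dtype.lanes + 7) / 8;
  size_t nbytes = (size_t)numel * elem_bytes;
  void* data = malloc(nbytes ? nbytes : 1);
  if (t == NULL || dims == NULL || data == NULL) {
    free(t); free(dims); free(data);
    SetError(error_ctx, "MemoryError", "allocation failed");
    return -1;
  }
  t->dl_tensor = *prototype; /* copies device, dtype and ndim */
  t->dl_tensor.data = data;
  t->dl_tensor.shape = dims;
  t->dl_tensor.strides = dims + ndim;
  t->dl_tensor.byte_offset = 0;
  t->deleter = ExampleDeleter;
  *out = t;
  return 0;
}

/* Callee side: request a 2x3 float32 tensor through the function pointer. */
int ExampleAllocate(int32_t device_type, ExampleError* err) {
  int64_t shape[2] = {2, 3};
  DLTensor prototype;
  memset(&prototype, 0, sizeof(prototype));
  prototype.device.device_type = device_type;
  prototype.ndim = 2;
  prototype.dtype.code = 2; /* kDLFloat */
  prototype.dtype.bits = 32;
  prototype.dtype.lanes = 1;
  prototype.shape = shape;
  DLPackTensorAllocator alloc = ExampleCpuAllocator;
  DLManagedTensorVersioned* out = NULL;
  int rc = alloc(&prototype, &out, err, ExampleSetError);
  if (rc == 0) {
    memset(out->dl_tensor.data, 0, 2 * 3 * sizeof(float)); /* use the buffer */
    out->deleter(out);
  }
  return rc;
}
```

Calling ExampleAllocate(1, &err) takes the success path, while any other device type makes the allocator report a "ValueError" through ExampleSetError and return -1. Because the error goes through a caller-supplied callback, the allocator itself needs no Python state.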

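The doc comments say a library can store each function pointer as an int on the tensor type, and that optional_out_env_stream may be NULL. A rough illustration of both points follows. It assumes heavily trimmed stand-in structs instead of dlpack.h, and uses a plain C struct (FakeTensor) in place of a real PyObject. FakeTensor, FakeDeleter, FakeFromPyObject and ExampleExchange are hypothetical names. Converting a function pointer to uintptr_t and back is implementation-defined in C, although it works on mainstream platforms.

```c
#include <stdint.h>
#include <stdlib.h>

/* Heavily trimmed stand-ins for the dlpack.h types (assumption: a real build
 * would include dlpack.h; device, dtype, strides and version are omitted). */
typedef struct { void* data; int32_t ndim; int64_t* shape; } DLTensor;
typedef struct DLManagedTensorVersioned {
  void* manager_ctx;
  void (*deleter)(struct DLManagedTensorVersioned* self);
  uint64_t flags;
  DLTensor dl_tensor;
} DLManagedTensorVersioned;

/* Signature as introduced by the diff. */
typedef int (*DLPackFromPyObject)(void* py_object, DLManagedTensorVersioned** out,
                                  void** optional_out_env_stream);

/* Hypothetical stand-in for a library tensor; in a real extension this would
 * be a PyObject subtype and the exporter would hold a reference to it. */
typedef struct { float* data; int64_t shape[1]; void* stream; } FakeTensor;

static void FakeDeleter(DLManagedTensorVersioned* self) { free(self); }

/* Hypothetical exporter matching DLPackFromPyObject. */
static int FakeFromPyObject(void* py_object, DLManagedTensorVersioned** out,
                            void** optional_out_env_stream) {
  FakeTensor* src = (FakeTensor*)py_object;
  DLManagedTensorVersioned* t = calloc(1, sizeof(*t));
  if (t == NULL) return -1; /* a real exporter would also set a Python error */
  t->manager_ctx = src;
  t->deleter = FakeDeleter;
  t->dl_tensor.data = src->data;
  t->dl_tensor.ndim = 1;
  t->dl_tensor.shape = src->shape;
  /* The stream is only reported when the caller asks for it. */
  if (optional_out_env_stream != NULL) *optional_out_env_stream = src->stream;
  *out = t;
  return 0;
}

/* Importer side: recover the pointer from its integer form and call it. */
int ExampleExchange(int query_stream, void** stream_seen) {
  static float values[3] = {1.0f, 2.0f, 3.0f};
  static int fake_stream_tag; /* stands in for a device stream handle */
  FakeTensor src = {values, {3}, &fake_stream_tag};

  /* What the exporting library would publish as an int attribute. */
  uintptr_t as_int = (uintptr_t)&FakeFromPyObject;
  /* What the importing library does after reading that attribute. */
  DLPackFromPyObject from_pyobject = (DLPackFromPyObject)as_int;

  DLManagedTensorVersioned* managed = NULL;
  void* stream = NULL;
  int rc = from_pyobject(&src, &managed, query_stream ? &stream : NULL);
  if (rc != 0) return rc;
  int ok = managed->dl_tensor.data == (void*)values && managed->dl_tensor.shape[0] == 3;
  managed->deleter(managed);
  *stream_seen = stream;
  return ok ? 0 : -1;
}
```

With query_stream set, the importer receives the stream handle together with the tensor in a single call. With it cleared, NULL is passed and the exporter skips the stream lookup entirely.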