.. currentmodule:: cuda.core.experimental

Interoperability
================

``cuda.core`` is designed to be interoperable with other Python GPU libraries. The
sections below cover several common interoperability scenarios.


Current device/context
----------------------

The :meth:`Device.set_current` method ensures that the calling host thread has
an active CUDA context set to current. This CUDA context can then be seen and
accessed by other GPU libraries without any code change. For libraries built on
top of the `CUDA runtime <https://docs.nvidia.com/cuda/cuda-runtime-api/index.html>`_,
this is as if ``cudaSetDevice`` were called.

Because the current CUDA context is a per-thread setting, in a multi-threaded
program each host thread should call this method.

Conversely, if any GPU library has already set a device (or context) to current,
this method ensures that the same device/context is picked up by and shared with
``cuda.core``.
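
As a sketch, a multi-threaded program might make this per-thread initialization
explicit in each worker (assuming ``cuda.core`` is installed and a CUDA-capable
GPU is available; the ``worker`` function is illustrative, not part of the API):

.. code-block:: python

    import threading

    def worker():
        # Each host thread must set a device (and thus a CUDA context) to
        # current before doing any GPU work in that thread.
        from cuda.core.experimental import Device

        dev = Device()     # defaults to device 0 if no device is current
        dev.set_current()  # activate this device's context for this thread
        # ... allocate memory, launch kernels, etc., in this thread ...

    # Hypothetical usage: each spawned thread performs its own initialization.
    threads = [threading.Thread(target=worker) for _ in range(4)]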


``__cuda_stream__`` protocol
----------------------------

The :class:`~_stream.Stream` class is a vocabulary type representing CUDA streams
in Python. While we encourage new Python projects to start using streams (and other
CUDA types) from ``cuda.core``, we understand that several existing projects already
expose their own stream types.

To address this issue, we propose the ``__cuda_stream__`` protocol (currently
version 0): any Python object that is meant to be interpreted as a stream should
expose a ``__cuda_stream__`` attribute that returns a 2-tuple containing the
protocol version number (``0``) and the address of the underlying ``cudaStream_t``
(both as Python ``int``):

.. code-block:: python

    class MyStream:

        @property
        def __cuda_stream__(self):
            return (0, self.ptr)

        ...

Such objects can then be understood by ``cuda.core`` anywhere a stream-like object
is needed.

We encourage all existing Python projects that expose a stream class to also accept
this protocol wherever a function takes a stream.
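
On the consumer side, supporting the protocol amounts to reading this attribute and
validating the version number. Below is a minimal, library-agnostic sketch (the
``as_stream_address`` helper is hypothetical, not part of ``cuda.core``):

.. code-block:: python

    def as_stream_address(obj):
        # Extract the raw stream address from any object implementing
        # the __cuda_stream__ protocol (version 0).
        info = obj.__cuda_stream__
        if not (isinstance(info, tuple) and len(info) == 2):
            raise TypeError("__cuda_stream__ must return a 2-tuple")
        version, address = info
        if version != 0:
            raise NotImplementedError(f"unsupported __cuda_stream__ version: {version}")
        return address

    class MyStream:
        def __init__(self, ptr):
            self.ptr = ptr  # address of a cudaStream_t, as a Python int

        @property
        def __cuda_stream__(self):
            return (0, self.ptr)

    # A dummy integer stands in for a real cudaStream_t address here.
    assert as_stream_address(MyStream(0xDEADBEEF)) == 0xDEADBEEF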


Memory view utilities for CPU/GPU buffers
-----------------------------------------

The Python community has defined protocols such as the CUDA Array Interface (CAI) [1]_
and DLPack [2]_ (part of the Python array API standard [3]_) to facilitate zero-copy
data exchange between GPU projects. In particular, performance considerations led both
protocols to be designed around *stream-ordered* operations, so as to avoid unnecessary
synchronization. While the designs are robust, *implementing* such protocols can be
tricky and often requires a few iterations to ensure correctness.

``cuda.core`` offers an :func:`~utils.args_viewable_as_strided_memory` decorator for
extracting the metadata (such as pointer address, shape, strides, and dtype) from any
Python object supporting either CAI or DLPack, returning a
:class:`~utils.StridedMemoryView` object; see the
`strided_memory_view.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/examples/strided_memory_view.py>`_
example. Alternatively, a :class:`~utils.StridedMemoryView` object can be constructed
explicitly, without using the decorator. This provides a *concrete implementation* of
both protocols that is **array-library-agnostic**, so that all Python projects can rely
on it without re-implementing (the consumer side of) the protocols or tying themselves
to any particular array library.

The :attr:`~utils.StridedMemoryView.is_device_accessible` attribute can be used to check
whether or not the underlying buffer can be accessed on the GPU.
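
For instance, viewing a host (CPU) NumPy array might look like the following sketch,
assuming ``cuda.core`` and NumPy are installed; the ``stream_ptr=-1`` argument and the
attribute names follow our reading of the ``StridedMemoryView`` API (``stream_ptr=-1``
requests no stream ordering, which suits a host buffer):

.. code-block:: python

    def inspect_host_buffer():
        import numpy as np
        from cuda.core.experimental.utils import StridedMemoryView

        arr = np.arange(6, dtype=np.float64).reshape(2, 3)
        view = StridedMemoryView(arr, stream_ptr=-1)
        # A NumPy array lives in host memory, so it is not device-accessible.
        return view.shape, view.is_device_accessible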

.. rubric:: Footnotes

.. [1] https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
.. [2] https://dmlc.github.io/dlpack/latest/python_spec.html
.. [3] https://data-apis.org/array-api/latest/design_topics/data_interchange.html