Commit 53a235e

Merge pull request #298 from leofang/more_docs
Add a doc page for interoperability
2 parents d4418b3 + c06270c commit 53a235e

File tree

2 files changed (+85 -0 lines)


cuda_core/docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ and other functionalities.

    release.md
    install.md
+   interoperability.rst
    api.rst

Lines changed: 84 additions & 0 deletions
cuda_core/docs/source/interoperability.rst
@@ -0,0 +1,84 @@
.. currentmodule:: cuda.core.experimental


Interoperability
================

``cuda.core`` is designed to be interoperable with other Python GPU libraries.
Below we cover several such scenarios.

Current device/context
----------------------

The :meth:`Device.set_current` method ensures that the calling host thread has
an active CUDA context set to current. This CUDA context can be seen and accessed
by other GPU libraries without any code change. For libraries built on top of
the `CUDA runtime <https://docs.nvidia.com/cuda/cuda-runtime-api/index.html>`_,
this is as if ``cudaSetDevice`` had been called.

Since CUDA contexts are per-thread constructs, in a multi-threaded program each
host thread should call this method.

Conversely, if any GPU library has already set a device (or context) to current, this
method ensures that the same device/context is picked up by and shared with
``cuda.core``.

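The per-thread nature of the current context is the key point here. The stand-in below is a pure-Python illustration (``FakeDevice`` and ``_current`` are invented names, not part of ``cuda.core`` or the CUDA runtime) of why each host thread must make its own ``set_current``-style call:

```python
import threading

# Stand-in for CUDA's per-thread "current context" slot.
_current = threading.local()


class FakeDevice:
    """Illustrative stand-in for a device with a set_current() method."""

    def __init__(self, device_id):
        self.device_id = device_id

    def set_current(self):
        # Like cudaSetDevice, this only affects the calling thread.
        _current.device = self.device_id


def current_device():
    # Each thread sees only what it set itself.
    return getattr(_current, "device", None)


FakeDevice(0).set_current()

results = {}

def worker():
    # A new thread has no current device until it calls set_current itself.
    results["before"] = current_device()
    FakeDevice(1).set_current()
    results["after"] = current_device()

t = threading.Thread(target=worker)
t.start()
t.join()

print(results)           # {'before': None, 'after': 1}
print(current_device())  # 0 -- the worker's call did not leak into this thread
```

The same reasoning applies to real CUDA contexts: a call made on one thread neither helps nor disturbs any other thread.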

``__cuda_stream__`` protocol
----------------------------

The :class:`~_stream.Stream` class is a vocabulary type representing CUDA streams
in Python. While we encourage new Python projects to start using streams (and other
CUDA types) from ``cuda.core``, we understand that several existing projects
expose their own stream types.

To address this issue, we propose the ``__cuda_stream__`` protocol (currently version
0) as follows: any Python object that is meant to be interpreted as a stream should
have a ``__cuda_stream__`` attribute that returns a 2-tuple containing the protocol
version number (``0``) and the address of the underlying ``cudaStream_t`` (both as
Python ``int``):

.. code-block:: python

    class MyStream:

        @property
        def __cuda_stream__(self):
            return (0, self.ptr)

        ...

Then such objects can be understood by ``cuda.core`` anywhere a stream-like object
is needed.

We suggest that all existing Python projects that expose a stream class also support this
protocol wherever a function takes a stream.

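The consumer side of the protocol is equally small. The helper below is a hypothetical sketch (``extract_stream_ptr`` is not a ``cuda.core`` API) of how a library could accept any stream-like object by reading ``__cuda_stream__`` and validating the version:

```python
class MyStream:
    """A minimal provider of the __cuda_stream__ protocol (version 0)."""

    def __init__(self, ptr):
        self.ptr = ptr  # address of the underlying cudaStream_t, as int

    @property
    def __cuda_stream__(self):
        return (0, self.ptr)


def extract_stream_ptr(obj):
    """Hypothetical consumer: return the stream address carried by obj."""
    if not hasattr(obj, "__cuda_stream__"):
        raise TypeError(f"{type(obj).__name__} is not stream-like")
    version, ptr = obj.__cuda_stream__
    if version != 0:
        raise NotImplementedError(f"unsupported protocol version: {version}")
    return ptr


s = MyStream(0xDEADBEEF)
print(hex(extract_stream_ptr(s)))  # 0xdeadbeef
```

Carrying the version number in the tuple lets consumers reject (or specially handle) future revisions of the protocol without breaking existing providers.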

Memory view utilities for CPU/GPU buffers
-----------------------------------------

The Python community has defined protocols such as the CUDA Array Interface (CAI) [1]_ and DLPack
[2]_ (part of the Python array API standard [3]_) to facilitate zero-copy data exchange
between two GPU projects. In particular, performance considerations prompted protocol
designs geared toward *stream-ordered* operations so as to avoid unnecessary synchronizations.
While the designs are robust, *implementing* such protocols can be tricky and often requires
a few iterations to ensure correctness.

``cuda.core`` offers a :func:`~utils.args_viewable_as_strided_memory` decorator for
extracting the metadata (such as pointer address, shape, strides, and dtype) from any
Python object supporting either CAI or DLPack and returning a :class:`~utils.StridedMemoryView`
object; see the
`strided_memory_view.py <https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/examples/strided_memory_view.py>`_
example. Alternatively, a :class:`~utils.StridedMemoryView` object can be explicitly
constructed without using the decorator. This provides a *concrete implementation* of both
protocols that is **array-library-agnostic**, so that all Python projects can simply rely on it
without re-implementing (the consumer side of) the protocols or tying themselves to any particular
array library.

The :attr:`~utils.StridedMemoryView.is_device_accessible` attribute can be used to check
whether the underlying buffer can be accessed on the GPU.

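To make the metadata involved concrete, the snippet below builds a minimal mock producer of the CUDA Array Interface and reads out the fields that a strided-memory view captures (pointer, shape, strides, dtype string). ``FakeGPUArray`` and ``view_metadata`` are illustrative names only; in real use the buffer would come from an array library such as CuPy, and the consumer side would be handled by :class:`~utils.StridedMemoryView`:

```python
class FakeGPUArray:
    """Mock producer of the CUDA Array Interface (v3), for illustration only."""

    def __init__(self, ptr, shape, typestr):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": typestr,    # e.g. "<f4" = little-endian float32
            "data": (ptr, False),  # (device pointer, read_only flag)
            "strides": None,       # None means C-contiguous
            "version": 3,
        }


def view_metadata(obj):
    """Hypothetical consumer: collect the fields a strided view needs."""
    cai = obj.__cuda_array_interface__
    ptr, _read_only = cai["data"]
    return {
        "ptr": ptr,
        "shape": cai["shape"],
        "strides": cai["strides"],
        "typestr": cai["typestr"],
    }


arr = FakeGPUArray(ptr=0x7F00, shape=(2, 3), typestr="<f4")
print(view_metadata(arr))
```

A correct consumer must also honor the stream-ordered parts of the protocols (the optional ``stream`` key in CAI, and the ``stream`` argument to ``__dlpack__``), which is exactly the subtlety the ``cuda.core`` utilities are meant to take off users' hands.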
.. rubric:: Footnotes

.. [1] https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html
.. [2] https://dmlc.github.io/dlpack/latest/python_spec.html
.. [3] https://data-apis.org/array-api/latest/design_topics/data_interchange.html
