diff --git a/docs/how-to/unified_memory.rst b/docs/how-to/unified_memory.rst
index f64189454c..67836b0acc 100644
--- a/docs/how-to/unified_memory.rst
+++ b/docs/how-to/unified_memory.rst
@@ -104,6 +104,62 @@ system requirements` and :ref:`checking unified memory management support`.
 offers an easy transition from a CPU written C++ code to a HIP code as the same system allocation API is used.
 
+To ensure the proper functioning of unified memory features on Heterogeneous Memory Management (HMM) supported graphics cards, it is essential to configure the environment variable ``XNACK=1``. Without this configuration, the behavior will be similar to that of systems without HMM support. For more details, visit `GPU memory `_.
+
+The chart below illustrates the expected behavior of managed and unified memory functions in ROCm and CUDA environments, both with and without HMM support:
+
+.. list-table:: Comparison of expected behavior of managed and unified memory functions in ROCm
+    :widths: 40, 25, 25
+    :header-rows: 2
+
+    * - call
+      - ROCm 5 or 6 without HMM or with ``XNACK=0``
+      - ROCm 5 or 6 with HMM and with ``XNACK=1``
+    * - OS support
+      - RHEL 8.4 or SLES 15 SP2
+      - RHEL 8.7+ or SLES 15 SP4
+    * - ``malloc()``, ``new``, system allocator
+      - host (not accessible on device)
+      - host, page-fault migration
+    * - ``hipMalloc()``
+      - device, zero copy
+      - device, zero copy
+    * - ``hipMallocManaged()``, ``__managed__``
+      - host, pinned, zero copy
+      - host, page-fault migration
+    * - ``hipHostRegister()``
+      - undefined behavior
+      - host, page-fault migration
+    * - ``hipHostMalloc()``
+      - host, pinned, zero copy
+      - host, pinned, zero copy
+
+.. list-table:: Comparison of expected behavior of managed and unified memory functions in CUDA
+    :widths: 40, 25, 25
+    :header-rows: 2
+
+    * - call
+      - CUDA 11 or 12 without HMM
+      - CUDA 11 or 12 with HMM
+    * - OS support
+      - RHEL 7.9 or SLES 15 SP2
+      - kernel 6.1.24+, 6.2.11+ or 6.3+, x64, CUDA 12.2+
+    * - ``malloc()``, ``new``, system allocator
+      - host (not accessible on device)
+      - first touch, page-fault migration
+    * - ``cudaMalloc()``
+      - device (not accessible on host)
+      - device, page-fault migration
+    * - ``cudaMallocManaged()``, ``__managed__``
+      - host, page-fault migration
+      - first touch, page-fault migration
+    * - ``cudaHostRegister()``
+      - host, page-fault migration
+      - host, page-fault migration
+    * - ``cudaMallocHost()``
+      - host, pinned, zero copy
+      - host, pinned, zero copy
+
 .. _checking unified memory management support:
 
 Checking unified memory management support
diff --git a/include/hip/hip_runtime_api.h b/include/hip/hip_runtime_api.h
index 798ba63bf0..05af329fc2 100644
--- a/include/hip/hip_runtime_api.h
+++ b/include/hip/hip_runtime_api.h
@@ -3523,6 +3523,8 @@ hipError_t hipExtHostAlloc(void** ptr, size_t size, unsigned int flags);
  *
  * The API returns the allocation pointer, managed by HMM, can be used further to execute kernels
  * on device and fetch data between the host and device as needed.
+ *
+ * If HMM is not supported, the function behaves the same as @p hipMallocHost .
  *
  * @note It is recommend to do the capability check before call this API.
  *
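
The ROCm table added in this change can be read as a lookup keyed by the allocation call and by whether HMM is available with ``XNACK=1``. The following is a minimal sketch of that reading in plain C++, not a piece of the HIP runtime: the names `AllocCall` and `expectedBehavior` are hypothetical, the returned strings are copied from the table cells, and nothing here queries a device or calls a HIP API.

```cpp
#include <string>

// Hypothetical names for illustration only; they are not part of the HIP API.
// Each enumerator stands for one row of the ROCm table.
enum class AllocCall {
    SystemAllocator,   // malloc(), new, system allocator
    HipMalloc,         // hipMalloc()
    HipMallocManaged,  // hipMallocManaged(), __managed__
    HipHostRegister,   // hipHostRegister()
    HipHostMalloc      // hipHostMalloc()
};

// Returns the table cell for `call`.
// `hmmWithXnack` is true only for the right-hand column (HMM supported and
// XNACK=1); false selects the left-hand column (no HMM, or XNACK=0).
std::string expectedBehavior(AllocCall call, bool hmmWithXnack) {
    switch (call) {
        case AllocCall::SystemAllocator:
            return hmmWithXnack ? "host, page-fault migration"
                                : "host (not accessible on device)";
        case AllocCall::HipMalloc:
            return "device, zero copy";
        case AllocCall::HipMallocManaged:
            return hmmWithXnack ? "host, page-fault migration"
                                : "host, pinned, zero copy";
        case AllocCall::HipHostRegister:
            return hmmWithXnack ? "host, page-fault migration"
                                : "undefined behavior";
        case AllocCall::HipHostMalloc:
            return "host, pinned, zero copy";
    }
    return "unknown";  // not reached for the enumerators above
}
```

For example, `expectedBehavior(AllocCall::HipMallocManaged, false)` returns the same entry as the `hipHostMalloc()` row, which is the fallback the table lists for managed allocations when HMM is unavailable.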