From 05e434226da6b48138671e10b1f0dde800305445 Mon Sep 17 00:00:00 2001 From: Istvan Kiss Date: Mon, 30 Sep 2024 21:26:29 +0200 Subject: [PATCH] Update driver API porting guide --- .wordlist.txt | 12 +- docs/how-to/hip_porting_driver_api.md | 304 --------- docs/how-to/hip_porting_driver_api.rst | 479 ++++++++++++++ .../global_defines_enums_structs_files.rst | 1 + .../driver_types.rst | 12 + docs/sphinx/_toc.yml.in | 1 + include/hip/driver_types.h | 591 ++++++++++-------- 7 files changed, 845 insertions(+), 555 deletions(-) delete mode 100644 docs/how-to/hip_porting_driver_api.md create mode 100644 docs/how-to/hip_porting_driver_api.rst create mode 100644 docs/reference/hip_runtime_api/global_defines_enums_structs_files/driver_types.rst diff --git a/.wordlist.txt b/.wordlist.txt index b3161fa928..2488764c1d 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -1,3 +1,4 @@ +.hip_fatbin ALU ALUs AmgX @@ -15,9 +16,12 @@ Builtins CAS clr coroutines +Ctx cuBLASLt cuCtx +CUDA's cuDNN +cuModule dataflow deallocate decompositions @@ -35,6 +39,7 @@ enum enums embeded extern +fatbin fatbinary foundationally frontends @@ -46,13 +51,17 @@ GPGPU GWS hardcoded HC +hcBLAS +HIP-Clang HIP's hipcc +hipCtx hipexamine hipified +hipModule +hipModuleLaunchKernel hipother HIPRTC -hcBLAS icc IILE iGPU @@ -122,6 +131,7 @@ texel texels tradeoffs templated +toolkits typedefs unintuitive UMM diff --git a/docs/how-to/hip_porting_driver_api.md b/docs/how-to/hip_porting_driver_api.md deleted file mode 100644 index 57879264a2..0000000000 --- a/docs/how-to/hip_porting_driver_api.md +++ /dev/null @@ -1,304 +0,0 @@ -# Porting CUDA driver API - -## Introduction to the CUDA Driver and Runtime APIs - -CUDA provides a separate CUDA Driver and Runtime APIs. The two APIs have significant overlap in functionality: - -* Both APIs support events, streams, memory management, memory copy, and error handling. -* Both APIs deliver similar performance. -* Driver APIs calls begin with the prefix `cu` while Runtime APIs begin with the prefix `cuda`. For example, the Driver API API contains `cuEventCreate` while the Runtime API contains `cudaEventCreate`, with similar functionality. -* The Driver API defines a different but largely overlapping error code space than the Runtime API, and uses a different coding convention. For example, Driver API defines `CUDA_ERROR_INVALID_VALUE` while the Runtime API defines `cudaErrorInvalidValue` - -The Driver API offers two additional pieces of functionality not provided by the Runtime API: `cuModule` and `cuCtx` APIs. - -### `cuModule` API - -The Module section of the Driver API provides additional control over how and when accelerator code objects are loaded. -For example, the driver API allows code objects to be loaded from files or memory pointers. -Symbols for kernels or global data can be extracted from the loaded code objects. -In contrast, the Runtime API automatically loads and (if necessary) compiles all of the kernels from an executable binary when run. -In this mode, NVCC must be used to compile kernel code so the automatic loading can function correctly. - -Both Driver and Runtime APIs define a function for launching kernels (called `cuLaunchKernel` or `cudaLaunchKernel`. -The kernel arguments and the execution configuration (grid dimensions, group dimensions, dynamic shared memory, and stream) are passed as arguments to the launch function. -The Runtime additionally provides the `<<< >>>` syntax for launching kernels, which resembles a special function call and is easier to use than explicit launch API (in particular with respect to handling of kernel arguments). -However, this syntax is not standard C++ and is available only when NVCC is used to compile the host code. - -The Module features are useful in an environment which generates the code objects directly, such as a new accelerator language front-end. -Here, NVCC is not used. Instead, the environment may have a different kernel language or different compilation flow. -Other environments have many kernels and do not want them to be all loaded automatically. -The Module functions can be used to load the generated code objects and launch kernels. -As we will see below, HIP defines a Module API which provides similar explicit control over code object management. - -### `cuCtx` API - -The Driver API defines "Context" and "Devices" as separate entities. -Contexts contain a single device, and a device can theoretically have multiple contexts. -Each context contains a set of streams and events specific to the context. -Historically contexts also defined a unique address space for the GPU, though this may no longer be the case in Unified Memory platforms (since the CPU and all the devices in the same process share a single unified address space). -The Context APIs also provide a mechanism to switch between devices, which allowed a single CPU thread to send commands to different GPUs. -HIP as well as a recent versions of CUDA Runtime provide other mechanisms to accomplish this feat - for example using streams or `cudaSetDevice`. - -The CUDA Runtime API unifies the Context API with the Device API. This simplifies the APIs and has little loss of functionality since each Context can contain a single device, and the benefits of multiple contexts has been replaced with other interfaces. -HIP provides a context API to facilitate easy porting from existing Driver codes. -In HIP, the `Ctx` functions largely provide an alternate syntax for changing the active device. - -Most new applications will prefer to use `hipSetDevice` or the stream APIs, therefore HIP has marked `hipCtx` APIs as **deprecated**. Support for these APIs may not be available in future releases. For more details on deprecated APIs please refer [HIP deprecated APIs](../reference/deprecated_api_list). - -## HIP Module and `Ctx` APIs - -Rather than present two separate APIs, HIP extends the HIP API with new APIs for Modules and `Ctx` control. - -### `hipModule` API - -Like the CUDA Driver API, the Module API provides additional control over how code is loaded, including options to load code from files or from in-memory pointers. -NVCC and HIP-Clang target different architectures and use different code object formats: NVCC is `cubin` or `ptx` files, while the HIP-Clang path is the `hsaco` format. -The external compilers which generate these code objects are responsible for generating and loading the correct code object for each platform. -Notably, there is not a fat binary format that can contain code for both NVCC and HIP-Clang platforms. The following table summarizes the formats used on each platform: - -| Format | APIs | NVCC | HIP-CLANG | -| --- | --- | --- | --- | -| Code Object | `hipModuleLoad`, `hipModuleLoadData` | `.cubin` or PTX text | `.hsaco` | -| Fat Binary | `hipModuleLoadFatBin` | `.fatbin` | `.hip_fatbin` | - -`hipcc` uses HIP-Clang or NVCC to compile host codes. Both of these may embed code objects into the final executable, and these code objects will be automatically loaded when the application starts. -The `hipModule` API can be used to load additional code objects, and in this way provides an extended capability to the automatically loaded code objects. -HIP-Clang allows both of these capabilities to be used together, if desired. Of course it is possible to create a program with no kernels and thus no automatic loading. - -### `hipCtx` API - -HIP provides a `Ctx` API as a thin layer over the existing Device functions. This `Ctx` API can be used to set the current context, or to query properties of the device associated with the context. -The current context is implicitly used by other APIs such as `hipStreamCreate`. - -### hipify translation of CUDA Driver API - -The HIPIFY tools convert CUDA Driver APIs for streams, events, modules, devices, memory management, context, profiler to the equivalent HIP driver calls. For example, `cuEventCreate` will be translated to `hipEventCreate`. -HIPIFY tools also convert error codes from the Driver namespace and coding convention to the equivalent HIP error code. Thus, HIP unifies the APIs for these common functions. - -The memory copy API requires additional explanation. The CUDA driver includes the memory direction in the name of the API (`cuMemcpyH2D`) while the CUDA driver API provides a single memory copy API with a parameter that specifies the direction and additionally supports a "default" direction where the runtime determines the direction automatically. -HIP provides APIs with both styles: for example, `hipMemcpyH2D` as well as `hipMemcpy`. -The first flavor may be faster in some cases since they avoid host overhead to detect the different memory directions. - -HIP defines a single error space, and uses camel-case for all errors (i.e. `hipErrorInvalidValue`). - -#### Address Spaces - -HIP-Clang defines a process-wide address space where the CPU and all devices allocate addresses from a single unified pool. -Thus addresses may be shared between contexts, and unlike the original CUDA definition a new context does not create a new address space for the device. - -#### Using `hipModuleLaunchKernel` - -`hipModuleLaunchKernel` is `cuLaunchKernel` in HIP world. It takes the same arguments as `cuLaunchKernel`. - -#### Additional Information - -* HIP-Clang creates a primary context when the HIP API is called. So in a pure driver API code, HIP-Clang will create a primary context while HIP/NVCC will have empty context stack. -HIP-Clang will push primary context to context stack when it is empty. This can have subtle differences on applications which mix the runtime and driver APIs. - -### `hip-clang` Implementation Notes - -#### `.hip_fatbin` - -hip-clang links device code from different translation units together. For each device target, a code object is generated. Code objects for different device targets are bundled by `clang-offload-bundler` as one fatbinary, which is embeded as a global symbol `__hip_fatbin` in the `.hip_fatbin` section of the ELF file of the executable or shared object. - -#### Initialization and Termination Functions - -hip-clang generates initialization and termination functions for each translation unit for host code compilation. The initialization functions call `__hipRegisterFatBinary` to register the fatbinary embeded in the ELF file. They also call `__hipRegisterFunction` and `__hipRegisterVar` to register kernel functions and device side global variables. The termination functions call `__hipUnregisterFatBinary`. -hip-clang emits a global variable `__hip_gpubin_handle` of void** type with linkonce linkage and inital value 0 for each host translation unit. Each initialization function checks `__hip_gpubin_handle` and register the fatbinary only if `__hip_gpubin_handle` is 0 and saves the return value of `__hip_gpubin_handle` to `__hip_gpubin_handle`. This is to guarantee that the fatbinary is only registered once. Similar check is done in the termination functions. - -#### Kernel Launching - -hip-clang supports kernel launching by CUDA `<<<>>>` syntax, hipLaunchKernelGGL. The latter one is macro which expand to CUDA `<<<>>>` syntax. - -When the executable or shared library is loaded by the dynamic linker, the initialization functions are called. In the initialization functions, when `__hipRegisterFatBinary` is called, the code objects containing all kernels are loaded; when `__hipRegisterFunction` is called, the stub functions are associated with the corresponding kernels in code objects. - -hip-clang implements two sets of kernel launching APIs. - -By default, in the host code, for the `<<<>>>` statement, hip-clang first emits call of `hipConfigureCall` to set up the threads and grids, then emits call of the stub function with the given arguments. In the stub function, `hipSetupArgument` is called for each kernel argument, then `hipLaunchByPtr` is called with a function pointer to the stub function. In `hipLaunchByPtr`, the real kernel associated with the stub function is launched. - -### NVCC Implementation Notes - -#### Interoperation between HIP and CUDA Driver - -CUDA applications may want to mix CUDA driver code with HIP code (see example below). This table shows the type equivalence to enable this interaction. - -|**HIP Type** |**CU Driver Type**|**CUDA Runtime Type**| -| ---- | ---- | ---- | -| `hipModule_t` | `CUmodule` | | -| `hipFunction_t` | `CUfunction` | | -| `hipCtx_t` | `CUcontext` | | -| `hipDevice_t` | `CUdevice` | | -| `hipStream_t` | `CUstream` | `cudaStream_t` | -| `hipEvent_t` | `CUevent` | `cudaEvent_t` | -| `hipArray` | `CUarray` | `cudaArray` | - -#### Compilation Options - -The `hipModule_t` interface does not support `cuModuleLoadDataEx` function, which is used to control PTX compilation options. -HIP-Clang does not use PTX and does not support these compilation options. -In fact, HIP-Clang code objects always contain fully compiled ISA and do not require additional compilation as a part of the load step. -The corresponding HIP function `hipModuleLoadDataEx` behaves as `hipModuleLoadData` on HIP-Clang path (compilation options are not used) and as `cuModuleLoadDataEx` on NVCC path. -For example (CUDA): - -```cpp -CUmodule module; -void *imagePtr = ...; // Somehow populate data pointer with code object - -const int numOptions = 1; -CUJit_option options[numOptions]; -void * optionValues[numOptions]; - -options[0] = CU_JIT_MAX_REGISTERS; -unsigned maxRegs = 15; -optionValues[0] = (void*)(&maxRegs); - -cuModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues); - -CUfunction k; -cuModuleGetFunction(&k, module, "myKernel"); -``` - -HIP: - -```cpp -hipModule_t module; -void *imagePtr = ...; // Somehow populate data pointer with code object - -const int numOptions = 1; -hipJitOption options[numOptions]; -void * optionValues[numOptions]; - -options[0] = hipJitOptionMaxRegisters; -unsigned maxRegs = 15; -optionValues[0] = (void*)(&maxRegs); - -// hipModuleLoadData(module, imagePtr) will be called on HIP-Clang path, JIT options will not be used, and -// cupModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues) will be called on NVCC path -hipModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues); - -hipFunction_t k; -hipModuleGetFunction(&k, module, "myKernel"); -``` - -The below sample shows how to use `hipModuleGetFunction`. - -```cpp -#include -#include -#include -#include -#include - -#define LEN 64 -#define SIZE LEN<<2 - -#ifdef __HIP_PLATFORM_AMD__ -#define fileName "vcpy_isa.co" -#endif - -#ifdef __HIP_PLATFORM_NVIDIA__ -#define fileName "vcpy_isa.ptx" -#endif - -#define kernel_name "hello_world" - -int main(){ - float *A, *B; - hipDeviceptr_t Ad, Bd; - A = new float[LEN]; - B = new float[LEN]; - - for(uint32_t i=0;iargBuffer(2); - memcpy(&argBuffer[0], &Ad, sizeof(void*)); - memcpy(&argBuffer[1], &Bd, sizeof(void*)); - - size_t size = argBuffer.size()*sizeof(void*); - - void *config[] = { - HIP_LAUNCH_PARAM_BUFFER_POINTER, &argBuffer[0], - HIP_LAUNCH_PARAM_BUFFER_SIZE, &size, - HIP_LAUNCH_PARAM_END - }; - - hipModuleLaunchKernel(Function, 1, 1, 1, LEN, 1, 1, 0, 0, NULL, (void**)&config); - - hipMemcpyDtoH(B, Bd, SIZE); - for(uint32_t i=0;i tex; - -__global__ void tex2dKernel(hipLaunchParm lp, float* outputData, - int width, - int height) -{ - int x = blockIdx.x*blockDim.x + threadIdx.x; - int y = blockIdx.y*blockDim.y + threadIdx.y; - outputData[y*width + x] = tex2D(tex, x, y); -} - -``` - -```cpp -// Host code: - -texture tex; - -void myFunc () -{ - // ... - - textureReference* texref; - hipModuleGetTexRef(&texref, Module1, "tex"); - hipTexRefSetAddressMode(texref, 0, hipAddressModeWrap); - hipTexRefSetAddressMode(texref, 1, hipAddressModeWrap); - hipTexRefSetFilterMode(texref, hipFilterModePoint); - hipTexRefSetFlags(texref, 0); - hipTexRefSetFormat(texref, HIP_AD_FORMAT_FLOAT, 1); - hipTexRefSetArray(texref, array, HIP_TRSA_OVERRIDE_FORMAT); - - // ... -} -``` diff --git a/docs/how-to/hip_porting_driver_api.rst b/docs/how-to/hip_porting_driver_api.rst new file mode 100644 index 0000000000..ccc7f21e93 --- /dev/null +++ b/docs/how-to/hip_porting_driver_api.rst @@ -0,0 +1,479 @@ +.. meta:: + :description: This chapter presents how to port the CUDA driver API and showcases equivalent operations in HIP. + :keywords: AMD, ROCm, HIP, CUDA, driver API + +.. _porting_driver_api: + +******************************************************************************* +Porting CUDA driver API +******************************************************************************* + +NVIDIA provides separate CUDA driver and runtime APIs. The two APIs have significant overlap in functionality: + +* Both APIs support events, streams, memory management, memory copy, and error handling. +* Both APIs deliver similar performance. +* Driver API calls begin with the prefix ``cu``, while runtime API calls begin with the prefix ``cuda``. For example, the driver API contains ``cuEventCreate``, while the runtime API contains ``cudaEventCreate``, which has similar functionality. +* The driver API defines a different, but largely overlapping, error code space than the runtime API and uses a different coding convention. For example, the driver API defines ``CUDA_ERROR_INVALID_VALUE``, while the runtime API defines ``cudaErrorInvalidValue``. + +The driver API offers two additional functionalities not provided by the runtime API: ``cuModule`` and ``cuCtx`` APIs. + +cuModule API +============ + +The Module section of the driver API provides additional control over how and when accelerator code objects are loaded. For example, the driver API enables code objects to load from files or memory pointers. Symbols for kernels or global data are extracted from the loaded code objects. In contrast, the runtime API loads automatically and, if necessary, compiles all the kernels from an executable binary when it runs. In this mode, kernel code must be compiled using NVCC so that automatic loading can function correctly. + +The Module features are useful in an environment that generates the code objects directly, such as a new accelerator language front end. NVCC is not used here. Instead, the environment might have a different kernel language or compilation flow. Other environments have many kernels and don't want all of them to be loaded automatically. The Module functions load the generated code objects and launch kernels. Similar to the cuModule API, HIP defines a hipModule API that provides similar explicit control over code object management. + +cuCtx API +========= + +The driver API defines "Context" and "Devices" as separate entities. +Contexts contain a single device, and a device can theoretically have multiple contexts. +Each context contains a set of streams and events specific to the context. +Historically, contexts also defined a unique address space for the GPU. This might no longer be the case in unified memory platforms, because the CPU and all the devices in the same process share a single unified address space. +The Context APIs also provide a mechanism to switch between devices, which enables a single CPU thread to send commands to different GPUs. +HIP and recent versions of the CUDA Runtime provide other mechanisms to accomplish this feat, for example, using streams or ``cudaSetDevice``. + +The CUDA runtime API unifies the Context API with the Device API. This simplifies the APIs and has little loss of functionality. This is because each context can contain a single device, and the benefits of multiple contexts have been replaced with other interfaces. +HIP provides a Context API to facilitate easy porting from existing Driver code. +In HIP, the ``Ctx`` functions largely provide an alternate syntax for changing the active device. + +Most new applications preferentially use ``hipSetDevice`` or the stream APIs. Therefore, HIP has marked the ``hipCtx`` APIs as **deprecated**. Support for these APIs might not be available in future releases. For more details on deprecated APIs, see :doc:`../reference/deprecated_api_list`. + +HIP module and Ctx APIs +======================= + +Rather than present two separate APIs, HIP extends the HIP API with new APIs for modules and ``Ctx`` control. + +hipModule API +------------- + +Like the CUDA driver API, the Module API provides additional control over how code is loaded, including options to load code from files or from in-memory pointers. +NVCC and HIP-Clang target different architectures and use different code object formats. NVCC supports ``cubin`` or ``ptx`` files, while the HIP-Clang path uses the ``hsaco`` format. +The external compilers which generate these code objects are responsible for generating and loading the correct code object for each platform. +Notably, there is no fat binary format that can contain code for both NVCC and HIP-Clang platforms. The following table summarizes the formats used on each platform: + +.. list-table:: Module formats + :header-rows: 1 + + * - Format + - APIs + - NVCC + - HIP-CLANG + * - Code object + - ``hipModuleLoad``, ``hipModuleLoadData`` + - ``.cubin`` or PTX text + - ``.hsaco`` + * - Fat binary + - ``hipModuleLoadFatBin`` + - ``.fatbin`` + - ``.hip_fatbin`` + +``hipcc`` uses HIP-Clang or NVCC to compile host code. Both of these compilers can embed code objects into the final executable. These code objects are automatically loaded when the application starts. +The ``hipModule`` API can be used to load additional code objects. When used this way, it extends the capability of the automatically loaded code objects. +HIP-Clang enables both of these capabilities to be used together. Of course, it is possible to create a program with no kernels and no automatic loading. + +For module API reference, visit :ref:`module_management_reference`. + +hipCtx API +---------- + +HIP provides a ``Ctx`` API as a thin layer over the existing device functions. The ``Ctx`` API can be used to set the current context or to query properties of the device associated with the context. +The current context is implicitly used by other APIs, such as ``hipStreamCreate``. + +For context reference, visit :ref:`context_management_reference`. + +HIPIFY translation of CUDA driver API +===================================== + +The HIPIFY tools convert CUDA driver APIs for streams, events, modules, devices, memory management, context, and the profiler to the equivalent HIP calls. For example, ``cuEventCreate`` is translated to ``hipEventCreate``. +HIPIFY tools also convert error codes from the driver namespace and coding conventions to the equivalent HIP error code. HIP unifies the APIs for these common functions. + +The memory copy API requires additional explanation. The CUDA driver includes the memory direction in the name of the API (``cuMemcpyH2D``), while the CUDA driver API provides a single memory copy API with a parameter that specifies the direction. It also supports a "default" direction where the runtime determines the direction automatically. +HIP provides APIs with both styles, for example, ``hipMemcpyH2D`` as well as ``hipMemcpy``. +The first version might be faster in some cases because it avoids any host overhead to detect the different memory directions. + +HIP defines a single error space and uses camel case for all errors (i.e. ``hipErrorInvalidValue``). + +For further information, visit the :doc:`hipify:index`. + +Address spaces +-------------- + +HIP-Clang defines a process-wide address space where the CPU and all devices allocate addresses from a single unified pool. +This means addresses can be shared between contexts. Unlike the original CUDA implementation, a new context does not create a new address space for the device. + +Using hipModuleLaunchKernel +--------------------------- + +Both CUDA driver and runtime APIs define a function for launching kernels, called ``cuLaunchKernel`` or ``cudaLaunchKernel``. The equivalent API in HIP is ``hipModuleLaunchKernel``. +The kernel arguments and the execution configuration (grid dimensions, group dimensions, dynamic shared memory, and stream) are passed as arguments to the launch function. +The runtime API additionally provides the ``<<< >>>`` syntax for launching kernels, which resembles a special function call and is easier to use than the explicit launch API, especially when handling kernel arguments. +However, this syntax is not standard C++ and is available only when NVCC is used to compile the host code. + +Additional information +---------------------- + +HIP-Clang creates a primary context when the HIP API is called. So, in pure driver API code, HIP-Clang creates a primary context while HIP/NVCC has an empty context stack. HIP-Clang pushes the primary context to the context stack when it is empty. This can lead to subtle differences in applications which mix the runtime and driver APIs. + +HIP-Clang implementation notes +============================== + +.hip_fatbin +----------- + +HIP-Clang links device code from different translation units together. For each device target, it generates a code object. ``clang-offload-bundler`` bundles code objects for different device targets into one fat binary, which is embedded as the global symbol ``__hip_fatbin`` in the ``.hip_fatbin`` section of the ELF file of the executable or shared object. + +Initialization and termination functions +----------------------------------------- + +HIP-Clang generates initialization and termination functions for each translation unit for host code compilation. The initialization functions call ``__hipRegisterFatBinary`` to register the fat binary embedded in the ELF file. They also call ``__hipRegisterFunction`` and ``__hipRegisterVar`` to register kernel functions and device-side global variables. The termination functions call ``__hipUnregisterFatBinary``. +HIP-Clang emits a global variable ``__hip_gpubin_handle`` of type ``void**`` with ``linkonce`` linkage and an initial value of 0 for each host translation unit. Each initialization function checks ``__hip_gpubin_handle`` and registers the fat binary only if ``__hip_gpubin_handle`` is 0. It saves the return value of ``__hip_gpubin_handle`` to ``__hip_gpubin_handle``. This is to guarantee that the fat binary is only registered once. A similar check is performed in the termination functions. + +Kernel launching +---------------- + +HIP-Clang supports kernel launching using either the CUDA ``<<<>>>`` syntax, ``hipLaunchKernel``, or ``hipLaunchKernelGGL``. The last option is a macro which expands to the CUDA ``<<<>>>`` syntax by default. It can also be turned into a template by defining ``HIP_TEMPLATE_KERNEL_LAUNCH``. + +When the executable or shared library is loaded by the dynamic linker, the initialization functions are called. In the initialization functions, the code objects containing all kernels are loaded when ``__hipRegisterFatBinary`` is called. When ``__hipRegisterFunction`` is called, the stub functions are associated with the corresponding kernels in the code objects. + +HIP-Clang implements two sets of APIs for launching kernels. +By default, when HIP-Clang encounters the ``<<<>>>`` statement in the host code, it first calls ``hipConfigureCall`` to set up the threads and grids. It then calls the stub function with the given arguments. The stub function calls ``hipSetupArgument`` for each kernel argument, then calls ``hipLaunchByPtr`` with a function pointer to the stub function. In ``hipLaunchByPtr``, the real kernel associated with the stub function is launched. + +NVCC implementation notes +========================= + +Interoperation between HIP and CUDA driver +------------------------------------------ + +CUDA applications might want to mix CUDA driver code with HIP code (see the example below). This table shows the equivalence between CUDA and HIP types required to implement this interaction. + +.. list-table:: Equivalence table between HIP and CUDA types + :header-rows: 1 + + * - HIP type + - CU Driver type + - CUDA Runtime type + * - ``hipModule_t`` + - ``CUmodule`` + - + * - ``hipFunction_t`` + - ``CUfunction`` + - + * - ``hipCtx_t`` + - ``CUcontext`` + - + * - ``hipDevice_t`` + - ``CUdevice`` + - + * - ``hipStream_t`` + - ``CUstream`` + - ``cudaStream_t`` + * - ``hipEvent_t`` + - ``CUevent`` + - ``cudaEvent_t`` + * - ``hipArray`` + - ``CUarray`` + - ``cudaArray`` + +Compilation options +------------------- + +The ``hipModule_t`` interface does not support the ``cuModuleLoadDataEx`` function, which is used to control PTX compilation options. +HIP-Clang does not use PTX, so it does not support these compilation options. +In fact, HIP-Clang code objects always contain fully compiled code for a device-specific instruction set and do not require additional compilation as a part of the load step. +The corresponding HIP function ``hipModuleLoadDataEx`` behaves like ``hipModuleLoadData`` on the HIP-Clang path (where compilation options are not used) and like ``cuModuleLoadDataEx`` on the NVCC path. + +For example: + +.. tab-set:: + + .. tab-item:: HIP + + .. code-block:: cpp + + hipModule_t module; + void *imagePtr = ...; // Somehow populate data pointer with code object + + const int numOptions = 1; + hipJitOption options[numOptions]; + void *optionValues[numOptions]; + + options[0] = hipJitOptionMaxRegisters; + unsigned maxRegs = 15; + optionValues[0] = (void *)(&maxRegs); + + // hipModuleLoadData(module, imagePtr) will be called on HIP-Clang path, JIT + // options will not be used, and cupModuleLoadDataEx(module, imagePtr, + // numOptions, options, optionValues) will be called on NVCC path + hipModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues); + + hipFunction_t k; + hipModuleGetFunction(&k, module, "myKernel"); + + .. tab-item:: CUDA + + .. code-block:: cpp + + CUmodule module; + void *imagePtr = ...; // Somehow populate data pointer with code object + + const int numOptions = 1; + CUJit_option options[numOptions]; + void *optionValues[numOptions]; + + options[0] = CU_JIT_MAX_REGISTERS; + unsigned maxRegs = 15; + optionValues[0] = (void *)(&maxRegs); + + cuModuleLoadDataEx(module, imagePtr, numOptions, options, optionValues); + + CUfunction k; + cuModuleGetFunction(&k, module, "myKernel"); + +The sample below shows how to use ``hipModuleGetFunction``. + +.. code-block:: cpp + + #include + #include + + #include + + int main() { + + size_t elements = 64*1024; + size_t size_bytes = elements * sizeof(float); + + std::vector A(elements), B(elements); + + // On NVIDIA platforms the driver runtime needs to be initiated + #ifdef __HIP_PLATFORM_NVIDIA__ + hipInit(0); + hipDevice_t device; + hipCtx_t context; + HIPCHECK(hipDeviceGet(&device, 0)); + HIPCHECK(hipCtxCreate(&context, 0, device)); + #endif + + // Allocate device memory + hipDeviceptr_t d_A, d_B; + HIPCHECK(hipMalloc(&d_A, size_bytes)); + HIPCHECK(hipMalloc(&d_B, size_bytes)); + + // Copy data to device + HIPCHECK(hipMemcpyHtoD(d_A, A.data(), size_bytes)); + HIPCHECK(hipMemcpyHtoD(d_B, B.data(), size_bytes)); + + // Load module + hipModule_t Module; + // For AMD the module file has to contain architecture specific object codee + // For NVIDIA the module file has to contain PTX, found in e.g. "vcpy_isa.ptx" + HIPCHECK(hipModuleLoad(&Module, "vcpy_isa.co")); + // Get kernel function from the module via its name + hipFunction_t Function; + HIPCHECK(hipModuleGetFunction(&Function, Module, "hello_world")); + + // Create buffer for kernel arguments + std::vector argBuffer{&d_A, &d_B}; + size_t arg_size_bytes = argBuffer.size() * sizeof(void*); + + // Create configuration passed to the kernel as arguments + void* config[] = {HIP_LAUNCH_PARAM_BUFFER_POINTER, argBuffer.data(), + HIP_LAUNCH_PARAM_BUFFER_SIZE, &arg_size_bytes, HIP_LAUNCH_PARAM_END}; + + int threads_per_block = 128; + int blocks = (elements + threads_per_block - 1) / threads_per_block; + + // Actually launch kernel + HIPCHECK(hipModuleLaunchKernel(Function, blocks, 1, 1, threads_per_block, 1, 1, 0, 0, NULL, config)); + + HIPCHECK(hipMemcpyDtoH(A.data(), d_A, elements)); + HIPCHECK(hipMemcpyDtoH(B.data(), d_B, elements)); + + #ifdef __HIP_PLATFORM_NVIDIA__ + HIPCHECK(hipCtxDetach(context)); + #endif + + HIPCHECK(hipFree(d_A)); + HIPCHECK(hipFree(d_B)); + + return 0; + } + +HIP module and texture Driver API +================================= + +HIP supports texture driver APIs. However, texture references must be declared within the host scope. The following code demonstrates the use of texture references for the ``__HIP_PLATFORM_AMD__`` platform. + +.. code-block:: cpp + + // Code to generate code object + + #include "hip/hip_runtime.h" + extern texture tex; + + __global__ void tex2dKernel(hipLaunchParm lp, float *outputData, int width, + int height) { + int x = blockIdx.x * blockDim.x + threadIdx.x; + int y = blockIdx.y * blockDim.y + threadIdx.y; + outputData[y * width + x] = tex2D(tex, x, y); + } + +.. code-block:: cpp + + // Host code: + + texture tex; + + void myFunc () + { + // ... + + textureReference* texref; + hipModuleGetTexRef(&texref, Module1, "tex"); + hipTexRefSetAddressMode(texref, 0, hipAddressModeWrap); + hipTexRefSetAddressMode(texref, 1, hipAddressModeWrap); + hipTexRefSetFilterMode(texref, hipFilterModePoint); + hipTexRefSetFlags(texref, 0); + hipTexRefSetFormat(texref, HIP_AD_FORMAT_FLOAT, 1); + hipTexRefSetArray(texref, array, HIP_TRSA_OVERRIDE_FORMAT); + + // ... + } + +Driver entry point access +========================= + +Starting from HIP version 6.2.0, support for Driver Entry Point Access is available when using CUDA 12.0 or newer. This feature allows developers to directly interact with the CUDA driver API, providing more control over GPU operations. + +Driver Entry Point Access provides several features: + +* Retrieving the address of a runtime function +* Requesting the default stream version on a per-thread basis +* Accessing new HIP features on older toolkits with a newer driver + +For driver entry point access reference, visit :ref:`driver_api_entry_point_reference`. + +Address retrieval +----------------- + +The ``hipGetProcAddress`` function can be used to obtain the address of a runtime function. This is demonstrated in the following example: + +.. code-block:: cpp + + #include + #include + + #include + + typedef hipError_t (*hipInit_t)(unsigned int); + + int main() { + // Initialize the HIP runtime + hipError_t res = hipInit(0); + if (res != hipSuccess) { + std::cerr << "Failed to initialize HIP runtime." << std::endl; + return 1; + } + + // Get the address of the hipInit function + hipInit_t hipInitFunc; + int hipVersion = HIP_VERSION; // Use the HIP version defined in hip_runtime_api.h + uint64_t flags = 0; // No special flags + hipDriverProcAddressQueryResult symbolStatus; + + res = hipGetProcAddress("hipInit", (void**)&hipInitFunc, hipVersion, flags, &symbolStatus); + if (res != hipSuccess) { + std::cerr << "Failed to get address of hipInit()." << std::endl; + return 1; + } + + // Call the hipInit function using the obtained address + res = hipInitFunc(0); + if (res == hipSuccess) { + std::cout << "HIP runtime initialized successfully using hipGetProcAddress()." << std::endl; + } else { + std::cerr << "Failed to initialize HIP runtime using hipGetProcAddress()." << std::endl; + } + + return 0; + } + +Per-thread default stream version request +----------------------------------------- + +HIP offers functionality similar to CUDA for managing streams on a per-thread basis. By using ``hipStreamPerThread``, each thread can independently manage its default stream, simplifying operations. The following example demonstrates how this feature enhances performance by reducing contention and improving efficiency. + +.. code-block:: cpp + + #include + + #include + + int main() { + // Initialize the HIP runtime + hipError_t res = hipInit(0); + if (res != hipSuccess) { + std::cerr << "Failed to initialize HIP runtime." << std::endl; + return 1; + } + + // Get the per-thread default stream + hipStream_t stream = hipStreamPerThread; + + // Use the stream for some operation + // For example, allocate memory on the device + void* d_ptr; + size_t size = 1024; + res = hipMalloc(&d_ptr, size); + if (res != hipSuccess) { + std::cerr << "Failed to allocate memory." << std::endl; + return 1; + } + + // Perform some operation using the stream + // For example, set memory on the device + res = hipMemsetAsync(d_ptr, 0, size, stream); + if (res != hipSuccess) { + std::cerr << "Failed to set memory." << std::endl; + return 1; + } + + // Synchronize the stream + res = hipStreamSynchronize(stream); + if (res != hipSuccess) { + std::cerr << "Failed to synchronize stream." << std::endl; + return 1; + } + + std::cout << "Operation completed successfully using per-thread default stream." << std::endl; + + // Free the allocated memory + hipFree(d_ptr); + + return 0; + } + +Accessing new HIP features with a newer driver +---------------------------------------------- + +HIP is designed to be forward compatible, allowing newer features to be utilized with older toolkits, provided a compatible driver is present. Feature support can be verified through runtime API functions and version checks. This approach ensures that applications can benefit from new features and improvements in the HIP runtime without needing to be recompiled with a newer toolkit. The function ``hipGetProcAddress`` enables dynamic querying and the use of newer functions offered by the HIP runtime, even if the application was built with an older toolkit. + +An example is provided for a hypothetical ``foo()`` function. + +.. code-block:: cpp + + // Get the address of the foo function + foo_t fooFunc; + int hipVersion = 60300000; // Use an own HIP version number (e.g. 6.3.0) + uint64_t flags = 0; // No special flags + hipDriverProcAddressQueryResult symbolStatus; + + res = hipGetProcAddress("foo", (void**)&fooFunc, hipVersion, flags, &symbolStatus); + +The HIP version number is defined as an integer: + +.. code-block:: cpp + + HIP_VERSION=HIP_VERSION_MAJOR * 10000000 + HIP_VERSION_MINOR * 100000 + HIP_VERSION_PATCH diff --git a/docs/reference/hip_runtime_api/global_defines_enums_structs_files.rst b/docs/reference/hip_runtime_api/global_defines_enums_structs_files.rst index 7123ab5d77..60236e5169 100644 --- a/docs/reference/hip_runtime_api/global_defines_enums_structs_files.rst +++ b/docs/reference/hip_runtime_api/global_defines_enums_structs_files.rst @@ -10,5 +10,6 @@ Global defines, enums, structs and files The structs, define macros, enums and files in the HIP runtime API. * :ref:`global_enum_defines_reference` +* :ref:`driver_types_reference` * :doc:`hip:doxygen/html/annotated` * :doc:`hip:doxygen/html/files` diff --git a/docs/reference/hip_runtime_api/global_defines_enums_structs_files/driver_types.rst b/docs/reference/hip_runtime_api/global_defines_enums_structs_files/driver_types.rst new file mode 100644 index 0000000000..552f344e69 --- /dev/null +++ b/docs/reference/hip_runtime_api/global_defines_enums_structs_files/driver_types.rst @@ -0,0 +1,12 @@ +.. meta:: + :description: The driver types reference page. + :keywords: AMD, ROCm, HIP, CUDA, driver types + +.. _driver_types_reference: + +******************************************************************************* +Driver types +******************************************************************************* + +.. doxygengroup:: DriverTypes + :content-only: diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 9b942953e0..6a70b9e2ad 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -78,6 +78,7 @@ subtrees: subtrees: - entries: - file: reference/hip_runtime_api/global_defines_enums_structs_files/global_enum_and_defines + - file: reference/hip_runtime_api/global_defines_enums_structs_files/driver_types - file: doxygen/html/annotated - file: doxygen/html/files - file: reference/virtual_rocr diff --git a/include/hip/driver_types.h b/include/hip/driver_types.h index 3551f9d596..4c7bec15b1 100644 --- a/include/hip/driver_types.h +++ b/include/hip/driver_types.h @@ -1,5 +1,5 @@ /* -Copyright (c) 2015 - 2023 Advanced Micro Devices, Inc. All rights reserved. +Copyright (c) 2015 - 2024 Advanced Micro Devices, Inc. All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal @@ -36,298 +36,341 @@ THE SOFTWARE. #include #endif #endif // !defined(__HIPCC_RTC__) + +/** + * @defgroup DriverTypes Driver Types + * @{ + * This section describes the driver data types. + * + */ + typedef void* hipDeviceptr_t; +/** + * HIP channel format kinds + */ typedef enum hipChannelFormatKind { - hipChannelFormatKindSigned = 0, - hipChannelFormatKindUnsigned = 1, - hipChannelFormatKindFloat = 2, - hipChannelFormatKindNone = 3 + hipChannelFormatKindSigned = 0, ///< Signed channel format + hipChannelFormatKindUnsigned = 1, ///< Unsigned channel format + hipChannelFormatKindFloat = 2, ///< Float channel format + hipChannelFormatKindNone = 3 ///< No channel format }hipChannelFormatKind; +/** + * HIP channel format descriptor + */ typedef struct hipChannelFormatDesc { int x; int y; int z; int w; - enum hipChannelFormatKind f; + enum hipChannelFormatKind f; ///< Channel format kind }hipChannelFormatDesc; +/** @brief The hipTexRefSetArray function flags parameter override format value*/ #define HIP_TRSA_OVERRIDE_FORMAT 0x01 +/** @brief The hipTexRefSetFlags function flags parameter read as integer value*/ #define HIP_TRSF_READ_AS_INTEGER 0x01 +/** @brief The hipTexRefSetFlags function flags parameter normalized coordinate value*/ #define HIP_TRSF_NORMALIZED_COORDINATES 0x02 +/** @brief The hipTexRefSetFlags function flags parameter srgb value*/ #define HIP_TRSF_SRGB 0x10 typedef struct hipArray* hipArray_t; typedef const struct hipArray* hipArray_const_t; +/** + * HIP array format + */ typedef enum hipArray_Format { - HIP_AD_FORMAT_UNSIGNED_INT8 = 0x01, - HIP_AD_FORMAT_UNSIGNED_INT16 = 0x02, - HIP_AD_FORMAT_UNSIGNED_INT32 = 0x03, - HIP_AD_FORMAT_SIGNED_INT8 = 0x08, - HIP_AD_FORMAT_SIGNED_INT16 = 0x09, - HIP_AD_FORMAT_SIGNED_INT32 = 0x0a, - HIP_AD_FORMAT_HALF = 0x10, - HIP_AD_FORMAT_FLOAT = 0x20 + HIP_AD_FORMAT_UNSIGNED_INT8 = 0x01, ///< Unsigned 8-bit array format + HIP_AD_FORMAT_UNSIGNED_INT16 = 0x02, ///< Unsigned 16-bit array format + HIP_AD_FORMAT_UNSIGNED_INT32 = 0x03, ///< Unsigned 32-bit array format + HIP_AD_FORMAT_SIGNED_INT8 = 0x08, ///< Signed 8-bit array format + HIP_AD_FORMAT_SIGNED_INT16 = 0x09, ///< Signed 16-bit array format + HIP_AD_FORMAT_SIGNED_INT32 = 0x0a, ///< Signed 32-bit array format + HIP_AD_FORMAT_HALF = 0x10, ///< Half array format + HIP_AD_FORMAT_FLOAT = 0x20 ///< Float array format }hipArray_Format; +/** + * HIP array descriptor + */ typedef struct HIP_ARRAY_DESCRIPTOR { - size_t Width; - size_t Height; - enum hipArray_Format Format; - unsigned int NumChannels; + size_t Width; ///< Width of the array + size_t Height; ///< Height of the array + enum hipArray_Format Format; ///< Format of the array + unsigned int NumChannels; ///< Number of channels of the array }HIP_ARRAY_DESCRIPTOR; + +/** + * HIP 3D array descriptor + */ typedef struct HIP_ARRAY3D_DESCRIPTOR { - size_t Width; - size_t Height; - size_t Depth; - enum hipArray_Format Format; - unsigned int NumChannels; - unsigned int Flags; + size_t Width; ///< Width of the array + size_t Height; ///< Height of the array + size_t Depth; ///< Depth of the array + enum hipArray_Format Format; ///< Format of the array + unsigned int NumChannels; ///< Number of channels of the array + unsigned int Flags; ///< Flags of the array }HIP_ARRAY3D_DESCRIPTOR; #if !defined(__HIPCC_RTC__) +/** + * HIP 2D memory copy parameters + */ typedef struct hip_Memcpy2D { - size_t srcXInBytes; - size_t srcY; - hipMemoryType srcMemoryType; - const void* srcHost; - hipDeviceptr_t srcDevice; - hipArray_t srcArray; - size_t srcPitch; - size_t dstXInBytes; - size_t dstY; - hipMemoryType dstMemoryType; - void* dstHost; - hipDeviceptr_t dstDevice; - hipArray_t dstArray; - size_t dstPitch; - size_t WidthInBytes; - size_t Height; + size_t srcXInBytes; ///< Source width in bytes + size_t srcY; ///< Source height + hipMemoryType srcMemoryType; ///< Source memory type + const void* srcHost; ///< Source pointer + hipDeviceptr_t srcDevice; ///< Source device + hipArray_t srcArray; ///< Source array + size_t srcPitch; ///< Source pitch + size_t dstXInBytes; ///< Destination width in bytes + size_t dstY; ///< Destination height + hipMemoryType dstMemoryType; ///< Destination memory type + void* dstHost; ///< Destination pointer + hipDeviceptr_t dstDevice; ///< Destination device + hipArray_t dstArray; ///< Destination array + size_t dstPitch; ///< Destination pitch + size_t WidthInBytes; ///< Width in bytes of the 2D memory copy + size_t Height; ///< Height of the 2D memory copy } hip_Memcpy2D; #endif // !defined(__HIPCC_RTC__) +/** + * HIP mipmapped array + */ typedef struct hipMipmappedArray { - void* data; - struct hipChannelFormatDesc desc; - unsigned int type; - unsigned int width; - unsigned int height; - unsigned int depth; - unsigned int min_mipmap_level; - unsigned int max_mipmap_level; - unsigned int flags; - enum hipArray_Format format; - unsigned int num_channels; + void* data; ///< Data pointer of the mipmapped array + struct hipChannelFormatDesc desc; ///< Description of the mipmapped array + unsigned int type; ///< Type of the mipmapped array + unsigned int width; ///< Width of the mipmapped array + unsigned int height; ///< Height of the mipmapped array + unsigned int depth; ///< Depth of the mipmapped array + unsigned int min_mipmap_level; ///< Minimum level of the mipmapped array + unsigned int max_mipmap_level; ///< Maximum level of the mipmapped array + unsigned int flags; ///< Flags of the mipmapped array + enum hipArray_Format format; ///< Format of the mipmapped array + unsigned int num_channels; ///< Number of channels of the mipmapped array } hipMipmappedArray; +/** + * HIP mipmapped array pointer + */ typedef struct hipMipmappedArray* hipMipmappedArray_t; typedef hipMipmappedArray_t hipmipmappedArray; typedef const struct hipMipmappedArray* hipMipmappedArray_const_t; /** - * hip resource types + * HIP resource types */ typedef enum hipResourceType { - hipResourceTypeArray = 0x00, - hipResourceTypeMipmappedArray = 0x01, - hipResourceTypeLinear = 0x02, - hipResourceTypePitch2D = 0x03 + hipResourceTypeArray = 0x00, ///< Array resource + hipResourceTypeMipmappedArray = 0x01, ///< Mipmapped array resource + hipResourceTypeLinear = 0x02, ///< Linear resource + hipResourceTypePitch2D = 0x03 ///< Pitch 2D resource }hipResourceType; typedef enum HIPresourcetype_enum { - HIP_RESOURCE_TYPE_ARRAY = 0x00, /**< Array resoure */ - HIP_RESOURCE_TYPE_MIPMAPPED_ARRAY = 0x01, /**< Mipmapped array resource */ - HIP_RESOURCE_TYPE_LINEAR = 0x02, /**< Linear resource */ - HIP_RESOURCE_TYPE_PITCH2D = 0x03 /**< Pitch 2D resource */ + HIP_RESOURCE_TYPE_ARRAY = 0x00, ///< Array resource + HIP_RESOURCE_TYPE_MIPMAPPED_ARRAY = 0x01, ///< Mipmapped array resource + HIP_RESOURCE_TYPE_LINEAR = 0x02, ///< Linear resource + HIP_RESOURCE_TYPE_PITCH2D = 0x03 ///< Pitch 2D resource } HIPresourcetype, hipResourcetype; /** - * hip address modes + * HIP texture address modes */ typedef enum HIPaddress_mode_enum { - HIP_TR_ADDRESS_MODE_WRAP = 0, - HIP_TR_ADDRESS_MODE_CLAMP = 1, - HIP_TR_ADDRESS_MODE_MIRROR = 2, - HIP_TR_ADDRESS_MODE_BORDER = 3 + HIP_TR_ADDRESS_MODE_WRAP = 0, ///< Wrap address mode + HIP_TR_ADDRESS_MODE_CLAMP = 1, ///< Clamp address mode + HIP_TR_ADDRESS_MODE_MIRROR = 2, ///< Mirror address mode + HIP_TR_ADDRESS_MODE_BORDER = 3 ///< Border address mode } HIPaddress_mode; /** - * hip filter modes + * HIP filter modes */ typedef enum HIPfilter_mode_enum { - HIP_TR_FILTER_MODE_POINT = 0, - HIP_TR_FILTER_MODE_LINEAR = 1 + HIP_TR_FILTER_MODE_POINT = 0, ///< Filter mode point + HIP_TR_FILTER_MODE_LINEAR = 1 ///< Filter mode linear } HIPfilter_mode; /** - * Texture descriptor + * HIP texture descriptor */ typedef struct HIP_TEXTURE_DESC_st { - HIPaddress_mode addressMode[3]; /**< Address modes */ - HIPfilter_mode filterMode; /**< Filter mode */ - unsigned int flags; /**< Flags */ - unsigned int maxAnisotropy; /**< Maximum anisotropy ratio */ - HIPfilter_mode mipmapFilterMode; /**< Mipmap filter mode */ - float mipmapLevelBias; /**< Mipmap level bias */ - float minMipmapLevelClamp; /**< Mipmap minimum level clamp */ - float maxMipmapLevelClamp; /**< Mipmap maximum level clamp */ - float borderColor[4]; /**< Border Color */ + HIPaddress_mode addressMode[3]; ///< Address modes + HIPfilter_mode filterMode; ///< Filter mode + unsigned int flags; ///< Flags + unsigned int maxAnisotropy; ///< Maximum anisotropy ratio + HIPfilter_mode mipmapFilterMode; ///< Mipmap filter mode + float mipmapLevelBias; ///< Mipmap level bias + float minMipmapLevelClamp; ///< Mipmap minimum level clamp + float maxMipmapLevelClamp; ///< Mipmap maximum level clamp + float borderColor[4]; ///< Border Color int reserved[12]; } HIP_TEXTURE_DESC; /** - * hip texture resource view formats + * HIP texture resource view formats */ typedef enum hipResourceViewFormat { - hipResViewFormatNone = 0x00, - hipResViewFormatUnsignedChar1 = 0x01, - hipResViewFormatUnsignedChar2 = 0x02, - hipResViewFormatUnsignedChar4 = 0x03, - hipResViewFormatSignedChar1 = 0x04, - hipResViewFormatSignedChar2 = 0x05, - hipResViewFormatSignedChar4 = 0x06, - hipResViewFormatUnsignedShort1 = 0x07, - hipResViewFormatUnsignedShort2 = 0x08, - hipResViewFormatUnsignedShort4 = 0x09, - hipResViewFormatSignedShort1 = 0x0a, - hipResViewFormatSignedShort2 = 0x0b, - hipResViewFormatSignedShort4 = 0x0c, - hipResViewFormatUnsignedInt1 = 0x0d, - hipResViewFormatUnsignedInt2 = 0x0e, - hipResViewFormatUnsignedInt4 = 0x0f, - hipResViewFormatSignedInt1 = 0x10, - hipResViewFormatSignedInt2 = 0x11, - hipResViewFormatSignedInt4 = 0x12, - hipResViewFormatHalf1 = 0x13, - hipResViewFormatHalf2 = 0x14, - hipResViewFormatHalf4 = 0x15, - hipResViewFormatFloat1 = 0x16, - hipResViewFormatFloat2 = 0x17, - hipResViewFormatFloat4 = 0x18, - hipResViewFormatUnsignedBlockCompressed1 = 0x19, - hipResViewFormatUnsignedBlockCompressed2 = 0x1a, - hipResViewFormatUnsignedBlockCompressed3 = 0x1b, - hipResViewFormatUnsignedBlockCompressed4 = 0x1c, - hipResViewFormatSignedBlockCompressed4 = 0x1d, - hipResViewFormatUnsignedBlockCompressed5 = 0x1e, - hipResViewFormatSignedBlockCompressed5 = 0x1f, - hipResViewFormatUnsignedBlockCompressed6H = 0x20, - hipResViewFormatSignedBlockCompressed6H = 0x21, - hipResViewFormatUnsignedBlockCompressed7 = 0x22 + hipResViewFormatNone = 0x00, ///< No resource view format (use underlying resource format) + hipResViewFormatUnsignedChar1 = 0x01, ///< 1 channel, unsigned 8-bit integers + hipResViewFormatUnsignedChar2 = 0x02, ///< 2 channels, unsigned 8-bit integers + hipResViewFormatUnsignedChar4 = 0x03, ///< 4 channels, unsigned 8-bit integers + hipResViewFormatSignedChar1 = 0x04, ///< 1 channel, signed 8-bit integers + hipResViewFormatSignedChar2 = 0x05, ///< 2 channels, signed 8-bit integers + hipResViewFormatSignedChar4 = 0x06, ///< 4 channels, signed 8-bit integers + hipResViewFormatUnsignedShort1 = 0x07, ///< 1 channel, unsigned 16-bit integers + hipResViewFormatUnsignedShort2 = 0x08, ///< 2 channels, unsigned 16-bit integers + hipResViewFormatUnsignedShort4 = 0x09, ///< 4 channels, unsigned 16-bit integers + hipResViewFormatSignedShort1 = 0x0a, ///< 1 channel, signed 16-bit integers + hipResViewFormatSignedShort2 = 0x0b, ///< 2 channels, signed 16-bit integers + hipResViewFormatSignedShort4 = 0x0c, ///< 4 channels, signed 16-bit integers + hipResViewFormatUnsignedInt1 = 0x0d, ///< 1 channel, unsigned 32-bit integers + hipResViewFormatUnsignedInt2 = 0x0e, ///< 2 channels, unsigned 32-bit integers + hipResViewFormatUnsignedInt4 = 0x0f, ///< 4 channels, unsigned 32-bit integers + hipResViewFormatSignedInt1 = 0x10, ///< 1 channel, signed 32-bit integers + hipResViewFormatSignedInt2 = 0x11, ///< 2 channels, signed 32-bit integers + hipResViewFormatSignedInt4 = 0x12, ///< 4 channels, signed 32-bit integers + hipResViewFormatHalf1 = 0x13, ///< 1 channel, 16-bit floating point + hipResViewFormatHalf2 = 0x14, ///< 2 channels, 16-bit floating point + hipResViewFormatHalf4 = 0x15, ///< 4 channels, 16-bit floating point + hipResViewFormatFloat1 = 0x16, ///< 1 channel, 32-bit floating point + hipResViewFormatFloat2 = 0x17, ///< 2 channels, 32-bit floating point + hipResViewFormatFloat4 = 0x18, ///< 4 channels, 32-bit floating point + hipResViewFormatUnsignedBlockCompressed1 = 0x19, ///< Block-compressed 1 + hipResViewFormatUnsignedBlockCompressed2 = 0x1a, ///< Block-compressed 2 + hipResViewFormatUnsignedBlockCompressed3 = 0x1b, ///< Block-compressed 3 + hipResViewFormatUnsignedBlockCompressed4 = 0x1c, ///< Block-compressed 4 unsigned + hipResViewFormatSignedBlockCompressed4 = 0x1d, ///< Block-compressed 4 signed + hipResViewFormatUnsignedBlockCompressed5 = 0x1e, ///< Block-compressed 5 unsigned + hipResViewFormatSignedBlockCompressed5 = 0x1f, ///< Block-compressed 5 signed + hipResViewFormatUnsignedBlockCompressed6H = 0x20, ///< Block-compressed 6 unsigned half-float + hipResViewFormatSignedBlockCompressed6H = 0x21, ///< Block-compressed 6 signed half-float + hipResViewFormatUnsignedBlockCompressed7 = 0x22 ///< Block-compressed 7 }hipResourceViewFormat; +/** + * HIP texture resource view formats + */ typedef enum HIPresourceViewFormat_enum { - HIP_RES_VIEW_FORMAT_NONE = 0x00, /**< No resource view format (use underlying resource format) */ - HIP_RES_VIEW_FORMAT_UINT_1X8 = 0x01, /**< 1 channel unsigned 8-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_2X8 = 0x02, /**< 2 channel unsigned 8-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_4X8 = 0x03, /**< 4 channel unsigned 8-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_1X8 = 0x04, /**< 1 channel signed 8-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_2X8 = 0x05, /**< 2 channel signed 8-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_4X8 = 0x06, /**< 4 channel signed 8-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_1X16 = 0x07, /**< 1 channel unsigned 16-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_2X16 = 0x08, /**< 2 channel unsigned 16-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_4X16 = 0x09, /**< 4 channel unsigned 16-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_1X16 = 0x0a, /**< 1 channel signed 16-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_2X16 = 0x0b, /**< 2 channel signed 16-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_4X16 = 0x0c, /**< 4 channel signed 16-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_1X32 = 0x0d, /**< 1 channel unsigned 32-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_2X32 = 0x0e, /**< 2 channel unsigned 32-bit integers */ - HIP_RES_VIEW_FORMAT_UINT_4X32 = 0x0f, /**< 4 channel unsigned 32-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_1X32 = 0x10, /**< 1 channel signed 32-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_2X32 = 0x11, /**< 2 channel signed 32-bit integers */ - HIP_RES_VIEW_FORMAT_SINT_4X32 = 0x12, /**< 4 channel signed 32-bit integers */ - HIP_RES_VIEW_FORMAT_FLOAT_1X16 = 0x13, /**< 1 channel 16-bit floating point */ - HIP_RES_VIEW_FORMAT_FLOAT_2X16 = 0x14, /**< 2 channel 16-bit floating point */ - HIP_RES_VIEW_FORMAT_FLOAT_4X16 = 0x15, /**< 4 channel 16-bit floating point */ - HIP_RES_VIEW_FORMAT_FLOAT_1X32 = 0x16, /**< 1 channel 32-bit floating point */ - HIP_RES_VIEW_FORMAT_FLOAT_2X32 = 0x17, /**< 2 channel 32-bit floating point */ - HIP_RES_VIEW_FORMAT_FLOAT_4X32 = 0x18, /**< 4 channel 32-bit floating point */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC1 = 0x19, /**< Block compressed 1 */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC2 = 0x1a, /**< Block compressed 2 */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC3 = 0x1b, /**< Block compressed 3 */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC4 = 0x1c, /**< Block compressed 4 unsigned */ - HIP_RES_VIEW_FORMAT_SIGNED_BC4 = 0x1d, /**< Block compressed 4 signed */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC5 = 0x1e, /**< Block compressed 5 unsigned */ - HIP_RES_VIEW_FORMAT_SIGNED_BC5 = 0x1f, /**< Block compressed 5 signed */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC6H = 0x20, /**< Block compressed 6 unsigned half-float */ - HIP_RES_VIEW_FORMAT_SIGNED_BC6H = 0x21, /**< Block compressed 6 signed half-float */ - HIP_RES_VIEW_FORMAT_UNSIGNED_BC7 = 0x22 /**< Block compressed 7 */ + HIP_RES_VIEW_FORMAT_NONE = 0x00, ///< No resource view format (use underlying resource format) + HIP_RES_VIEW_FORMAT_UINT_1X8 = 0x01, ///< 1 channel, unsigned 8-bit integers + HIP_RES_VIEW_FORMAT_UINT_2X8 = 0x02, ///< 2 channels, unsigned 8-bit integers + HIP_RES_VIEW_FORMAT_UINT_4X8 = 0x03, ///< 4 channels, unsigned 8-bit integers + HIP_RES_VIEW_FORMAT_SINT_1X8 = 0x04, ///< 1 channel, signed 8-bit integers + HIP_RES_VIEW_FORMAT_SINT_2X8 = 0x05, ///< 2 channels, signed 8-bit integers + HIP_RES_VIEW_FORMAT_SINT_4X8 = 0x06, ///< 4 channels, signed 8-bit integers + HIP_RES_VIEW_FORMAT_UINT_1X16 = 0x07, ///< 1 channel, unsigned 16-bit integers + HIP_RES_VIEW_FORMAT_UINT_2X16 = 0x08, ///< 2 channels, unsigned 16-bit integers + HIP_RES_VIEW_FORMAT_UINT_4X16 = 0x09, ///< 4 channels, unsigned 16-bit integers + HIP_RES_VIEW_FORMAT_SINT_1X16 = 0x0a, ///< 1 channel, signed 16-bit integers + HIP_RES_VIEW_FORMAT_SINT_2X16 = 0x0b, ///< 2 channels, signed 16-bit integers + HIP_RES_VIEW_FORMAT_SINT_4X16 = 0x0c, ///< 4 channels, signed 16-bit integers + HIP_RES_VIEW_FORMAT_UINT_1X32 = 0x0d, ///< 1 channel, unsigned 32-bit integers + HIP_RES_VIEW_FORMAT_UINT_2X32 = 0x0e, ///< 2 channels, unsigned 32-bit integers + HIP_RES_VIEW_FORMAT_UINT_4X32 = 0x0f, ///< 4 channels, unsigned 32-bit integers + HIP_RES_VIEW_FORMAT_SINT_1X32 = 0x10, ///< 1 channel, signed 32-bit integers + HIP_RES_VIEW_FORMAT_SINT_2X32 = 0x11, ///< 2 channels, signed 32-bit integers + HIP_RES_VIEW_FORMAT_SINT_4X32 = 0x12, ///< 4 channels, signed 32-bit integers + HIP_RES_VIEW_FORMAT_FLOAT_1X16 = 0x13, ///< 1 channel, 16-bit floating point + HIP_RES_VIEW_FORMAT_FLOAT_2X16 = 0x14, ///< 2 channels, 16-bit floating point + HIP_RES_VIEW_FORMAT_FLOAT_4X16 = 0x15, ///< 4 channels, 16-bit floating point + HIP_RES_VIEW_FORMAT_FLOAT_1X32 = 0x16, ///< 1 channel, 32-bit floating point + HIP_RES_VIEW_FORMAT_FLOAT_2X32 = 0x17, ///< 2 channels, 32-bit floating point + HIP_RES_VIEW_FORMAT_FLOAT_4X32 = 0x18, ///< 4 channels, 32-bit floating point + HIP_RES_VIEW_FORMAT_UNSIGNED_BC1 = 0x19, ///< Block-compressed 1 + HIP_RES_VIEW_FORMAT_UNSIGNED_BC2 = 0x1a, ///< Block-compressed 2 + HIP_RES_VIEW_FORMAT_UNSIGNED_BC3 = 0x1b, ///< Block-compressed 3 + HIP_RES_VIEW_FORMAT_UNSIGNED_BC4 = 0x1c, ///< Block-compressed 4 unsigned + HIP_RES_VIEW_FORMAT_SIGNED_BC4 = 0x1d, ///< Block-compressed 4 signed + HIP_RES_VIEW_FORMAT_UNSIGNED_BC5 = 0x1e, ///< Block-compressed 5 unsigned + HIP_RES_VIEW_FORMAT_SIGNED_BC5 = 0x1f, ///< Block-compressed 5 signed + HIP_RES_VIEW_FORMAT_UNSIGNED_BC6H = 0x20, ///< Block-compressed 6 unsigned half-float + HIP_RES_VIEW_FORMAT_SIGNED_BC6H = 0x21, ///< Block-compressed 6 signed half-float + HIP_RES_VIEW_FORMAT_UNSIGNED_BC7 = 0x22 ///< Block-compressed 7 } HIPresourceViewFormat; /** * HIP resource descriptor */ typedef struct hipResourceDesc { - enum hipResourceType resType; + enum hipResourceType resType; ///< Resource type union { struct { - hipArray_t array; + hipArray_t array; ///< HIP array } array; struct { - hipMipmappedArray_t mipmap; + hipMipmappedArray_t mipmap; ///< HIP mipmapped array } mipmap; struct { - void* devPtr; - struct hipChannelFormatDesc desc; - size_t sizeInBytes; + void* devPtr; ///< Device pointer + struct hipChannelFormatDesc desc; ///< Channel format description + size_t sizeInBytes; ///< Size in bytes } linear; struct { - void* devPtr; - struct hipChannelFormatDesc desc; - size_t width; - size_t height; - size_t pitchInBytes; + void* devPtr; ///< Device pointer + struct hipChannelFormatDesc desc; ///< Channel format description + size_t width; ///< Width of the array in elements + size_t height; ///< Height of the array in elements + size_t pitchInBytes; ///< Pitch between two rows in bytes } pitch2D; } res; }hipResourceDesc; + +/** + * HIP resource view descriptor struct + */ typedef struct HIP_RESOURCE_DESC_st { - HIPresourcetype resType; /**< Resource type */ + HIPresourcetype resType; ///< Resource type union { struct { - hipArray_t hArray; /**< HIP array */ + hipArray_t hArray; ///< HIP array } array; struct { - hipMipmappedArray_t hMipmappedArray; /**< HIP mipmapped array */ + hipMipmappedArray_t hMipmappedArray; ///< HIP mipmapped array } mipmap; struct { - hipDeviceptr_t devPtr; /**< Device pointer */ - hipArray_Format format; /**< Array format */ - unsigned int numChannels; /**< Channels per array element */ - size_t sizeInBytes; /**< Size in bytes */ + hipDeviceptr_t devPtr; ///< Device pointer + hipArray_Format format; ///< Array format + unsigned int numChannels; ///< Channels per array element + size_t sizeInBytes; ///< Size in bytes } linear; struct { - hipDeviceptr_t devPtr; /**< Device pointer */ - hipArray_Format format; /**< Array format */ - unsigned int numChannels; /**< Channels per array element */ - size_t width; /**< Width of the array in elements */ - size_t height; /**< Height of the array in elements */ - size_t pitchInBytes; /**< Pitch between two rows in bytes */ + hipDeviceptr_t devPtr; ///< Device pointer + hipArray_Format format; ///< Array format + unsigned int numChannels; ///< Channels per array element + size_t width; ///< Width of the array in elements + size_t height; ///< Height of the array in elements + size_t pitchInBytes; ///< Pitch between two rows in bytes } pitch2D; struct { int reserved[32]; } reserved; } res; - unsigned int flags; /**< Flags (must be zero) */ + unsigned int flags; ///< Flags (must be zero) } HIP_RESOURCE_DESC; /** - * hip resource view descriptor + * HIP resource view descriptor */ struct hipResourceViewDesc { - enum hipResourceViewFormat format; - size_t width; - size_t height; - size_t depth; - unsigned int firstMipmapLevel; - unsigned int lastMipmapLevel; - unsigned int firstLayer; - unsigned int lastLayer; + enum hipResourceViewFormat format; ///< Resource view format + size_t width; ///< Width of the resource view + size_t height; ///< Height of the resource view + size_t depth; ///< Depth of the resource view + unsigned int firstMipmapLevel; ///< First defined mipmap level + unsigned int lastMipmapLevel; ///< Last defined mipmap level + unsigned int firstLayer; ///< First layer index + unsigned int lastLayer; ///< Last layer index }; /** * Resource view descriptor */ typedef struct HIP_RESOURCE_VIEW_DESC_st { - HIPresourceViewFormat format; /**< Resource view format */ - size_t width; /**< Width of the resource view */ - size_t height; /**< Height of the resource view */ - size_t depth; /**< Depth of the resource view */ - unsigned int firstMipmapLevel; /**< First defined mipmap level */ - unsigned int lastMipmapLevel; /**< Last defined mipmap level */ - unsigned int firstLayer; /**< First layer index */ - unsigned int lastLayer; /**< Last layer index */ + HIPresourceViewFormat format; ///< Resource view format + size_t width; ///< Width of the resource view + size_t height; ///< Height of the resource view + size_t depth; ///< Depth of the resource view + unsigned int firstMipmapLevel; ///< First defined mipmap level + unsigned int lastMipmapLevel; ///< Last defined mipmap level + unsigned int firstLayer; ///< First layer index + unsigned int lastLayer; ///< Last layer index unsigned int reserved[16]; } HIP_RESOURCE_VIEW_DESC; /** * Memory copy types - * */ #if !defined(__HIPCC_RTC__) typedef enum hipMemcpyKind { @@ -339,58 +382,83 @@ typedef enum hipMemcpyKind { ///