diff --git a/docs/data/understand/hipgraph/hip_graph.drawio b/docs/data/how-to/hipgraph/hip_graph.drawio similarity index 100% rename from docs/data/understand/hipgraph/hip_graph.drawio rename to docs/data/how-to/hipgraph/hip_graph.drawio diff --git a/docs/data/understand/hipgraph/hip_graph.svg b/docs/data/how-to/hipgraph/hip_graph.svg similarity index 100% rename from docs/data/understand/hipgraph/hip_graph.svg rename to docs/data/how-to/hipgraph/hip_graph.svg diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.drawio b/docs/data/how-to/hipgraph/hip_graph_speedup.drawio similarity index 100% rename from docs/data/understand/hipgraph/hip_graph_speedup.drawio rename to docs/data/how-to/hipgraph/hip_graph_speedup.drawio diff --git a/docs/data/understand/hipgraph/hip_graph_speedup.svg b/docs/data/how-to/hipgraph/hip_graph_speedup.svg similarity index 100% rename from docs/data/understand/hipgraph/hip_graph_speedup.svg rename to docs/data/how-to/hipgraph/hip_graph_speedup.svg diff --git a/docs/how-to/hipgraph.rst b/docs/how-to/hipgraph.rst index f3eacb38f2..7b982815c8 100644 --- a/docs/how-to/hipgraph.rst +++ b/docs/how-to/hipgraph.rst @@ -1,16 +1,72 @@ .. meta:: - :description: This chapter describes how to use HIP graphs. + :description: This chapter describes how to use HIP graphs and highlights their use cases. :keywords: ROCm, HIP, graph, stream .. _how_to_HIP_graph: ******************************************************************************** -Using HIP graphs +HIP graphs ******************************************************************************** -This chapter explains how to create and use HIP graphs. To get a better -understanding of HIP graphs see -:ref:`the understand-chapter about HIP graphs <understand_HIP_graph>`. +.. note:: + The HIP graph API is currently in Beta. Some features can change and might + have outstanding issues. Not all features supported by CUDA graphs are yet + supported. 
For a list of all currently supported functions see the + :doc:`HIP graph API documentation<../doxygen/html/group___graph>`. + +HIP graphs are an alternative way of executing tasks on a GPU that can provide +performance benefits over launching kernels using the standard +method via streams. A HIP graph is made up of nodes and edges. The nodes of a HIP graph represent +the operations performed, while the edges mark dependencies between those +operations. + +The nodes can be one of the following: + +- empty nodes +- nested graphs +- kernel launches +- host-side function calls +- HIP memory functions (copy, memset, ...) +- HIP events +- signalling or waiting on external semaphores + +.. note:: + The available node types are specified by ``hipGraphNodeType``. + +The following figure visualizes the concept of graphs, compared to using streams. + +.. figure:: ../data/how-to/hipgraph/hip_graph.svg + :alt: Diagram depicting the difference between using streams to execute + kernels with dependencies, resolved by explicitly calling + hipDeviceSynchronize, or using graphs, where the edges denote the + dependencies. + +The standard method of launching kernels incurs a small overhead +for each iteration of the operation involved. For kernels that perform large +operations during an iteration, this overhead is usually negligible. However, +in many workloads, such as scientific simulations and AI, a kernel might perform a +small operation over a great number of iterations, and so the overhead of repeatedly +launching kernels can have a significant impact on performance. + +HIP graphs are designed to address this issue, by predefining the HIP API calls +and their dependencies with a graph, and performing most of the initialization +beforehand. Launching a graph only requires a single call, after which the +driver takes care of executing the operations within the graph. 
+Graphs can provide additional performance benefits, by enabling optimizations +that are only possible when the dependencies between the operations are known. + +.. figure:: ../data/how-to/hipgraph/hip_graph_speedup.svg + :alt: Diagram depicting the speedup achievable with HIP graphs compared to + HIP streams when launching many short-running kernels. + + Qualitative presentation of the execution time of many short-running kernels + when launched using HIP streams versus HIP graphs. This does not include the + time needed to set up the graph. + + +******************************************************************************** +Using HIP graphs +******************************************************************************** There are two different ways of creating graphs: Capturing kernel launches from a stream, or explicitly creating graphs. The difference between the two @@ -23,6 +79,18 @@ The general flow for using HIP graphs includes the following steps. #. Use ``hipGraphLaunch`` to launch the executable graph to a stream #. After execution completes free and destroy graph resources +The first two steps are the initial setup and only need to be executed once. The first +step is the definition of the operations (nodes) and the dependencies (edges) +between them. The second step is the instantiation of the graph. This takes care +of validating and initializing the graph, to reduce the overhead when executing +the graph. The third step is the execution of the graph, which takes care of +launching all the kernels and executing the operations while respecting their +dependencies and necessary synchronizations as specified. + +Because HIP graphs require some setup and initialization overhead before their +first execution, graphs only provide a benefit for workloads that require +many iterations to complete. + In both methods the ``hipGraph_t`` template for a graph is used to define the graph. 
In order to actually launch a graph, the template needs to be instantiated using ``hipGraphInstantiate``, which results in an actually executable graph of type ``hipGraphExec_t``. @@ -41,7 +109,7 @@ memory on the device or copying memory between the host and the device. Whether you want to pre-allocate the memory or manage it within the graph depends on the use-case. If the graph is executed in a tight loop the performance is usually better when the memory is preallocated, so that it -doesn't need to be reallocated in every iteration. +does not need to be reallocated in every iteration. The same rules as for normal memory allocations apply for memory allocated and freed by nodes, meaning that the nodes that access memory allocated in a graph diff --git a/docs/understand/hipgraph.rst b/docs/understand/hipgraph.rst deleted file mode 100644 index ce7d915179..0000000000 --- a/docs/understand/hipgraph.rst +++ /dev/null @@ -1,94 +0,0 @@ -.. meta:: - :description: This chapter provides an overview over the usage of HIP graph. - :keywords: ROCm, HIP, graph, stream - -.. _understand_HIP_graph: - -******************************************************************************** -HIP graph -******************************************************************************** - -.. note:: - The HIP graph API is currently in Beta. Some features can change and might - have outstanding issues. Not all features supported by CUDA graphs are yet - supported. For a list of all currently supported functions see the - :doc:`HIP graph API documentation<../doxygen/html/group___graph>`. - -A HIP graph is made up of nodes and edges. The nodes of a HIP graph represent -the operations performed, while the edges mark dependencies between those -operations. - -The nodes can be one of the following: - -- empty nodes -- nested graphs -- kernel launches -- host-side function calls -- HIP memory functions (copy, memset, ...) -- HIP events -- signalling or waiting on external semaphores - -.. 
note:: - The available node types are specified by ``hipGraphNodeType``. - -The following figure visualizes the concept of graphs, compared to using streams. - -.. figure:: ../data/understand/hipgraph/hip_graph.svg - :alt: Diagram depicting the difference between using streams to execute - kernels with dependencies, resolved by explicitly calling - hipDeviceSynchronize, or using graphs, where the edges denote the - dependencies. - -HIP graph advantages -================================================================================ - -HIP graphs are an alternative way of executing tasks on a GPU that can provide -performance benefits over launching kernels using the standard -method via streams. The standard method of launching incurs a small overhead -for each iteration of the operation involved. For kernels that perform large -operations during an iteration this overhead is usually negligible. However -in many workloads, such as scientific simulations and AI, a kernel might perform a -small operation over a great number of iterations, and so the overhead of repeatedly -launching kernels can have a significant impact on performance. - -HIP graphs are designed to address this issue, by predefining the HIP API calls -and their dependencies with a graph, and performing most of the initialization -beforehand. Launching a graph only requires a single call, after which the -driver takes care of executing the operations within the graph. -Graphs can provide additional performance benefits, by enabling optimizations -that are only possible when knowing the dependencies between the operations. - -.. figure:: ../data/understand/hipgraph/hip_graph_speedup.svg - :alt: Diagram depicting the speed up achievable with HIP graphs compared to - HIP streams when launching many short-running kernels. - - Qualitative presentation of the execution time of many short-running kernels - when launched using HIP stream versus HIP graph. 
This does not include the - time needed to set up the graph. - -Setting up HIP and using graphs -================================================================================ - -HIP graphs can be created by explicitly defining them, or using stream capture -to create a graph from existing code. For further information on how to use -HIP graphs see :ref:`the how-to-chapter about HIP graphs <how_to_HIP_graph>`. -For the available functions see the -:doc:`HIP graph API documentation<../doxygen/html/group___graph>`. - - Using HIP graphs to execute your work requires three steps: - -#. Defining the graph template -#. Instantiating the graph to get an executable graph from the template -#. Launching the graph - -The first two steps are the initial setup and only need to be executed once. First -step is the definition of the operations (nodes) and the dependencies (edges) -between them. The second step is the instantiation of the graph. This takes care -of validating and initializing the graph, to reduce the overhead when executing -the graph. The third step is the execution of the graph, which takes care of -launching all the kernels and executing the operations while respecting their -dependencies and necessary synchronizations as specified. - -Because HIP graphs require some set up and initialization overhead before their -first execution, the graph only provides a benefit for workloads that require -many iterations to complete. \ No newline at end of file