docs/articles_en/openvino-workflow/running-inference/inference-devices-and-modes/hetero-execution.rst (+17 −51)
@@ -18,36 +18,29 @@ Execution via the heterogeneous mode can be divided into two independent steps:
1. Setting hardware affinity to operations (`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ is used internally by the Hetero device).
2. Compiling a model for the Heterogeneous device means splitting the model into parts, compiling them on the specified devices (via `ov::device::priorities <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_priorities.html>`__), and executing them in the Heterogeneous mode. The model is split into subgraphs according to the affinities: each maximal set of connected operations with the same affinity becomes a dedicated subgraph. Each subgraph is compiled on its assigned device, producing multiple `ov::CompiledModel <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_compiled_model.html#doxid-classov-1-1-compiled-model>`__ objects, which are connected via automatically allocated intermediate tensors.
If you request pipeline parallelism (via ``ov::hint::model_distribution_policy``), the model is split into multiple stages, and each stage is assigned to a different device. The output of one stage is fed as input to the next stage.
These two steps are independent; affinities can be set in one of two ways, used separately or in combination (as described below): the ``manual`` or the ``automatic`` mode.
Defining and Configuring the Hetero Device
++++++++++++++++++++++++++++++++++++++++++
Following the OpenVINO™ naming convention, the Hetero execution plugin is assigned the label of ``"HETERO"``. It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
The manual mode assumes setting affinities explicitly for all operations in the model using `ov::Node::get_rt_info <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_node.html#doxid-classov-1-1-node-1a6941c753af92828d842297b74df1c45a>`__ with the ``"affinity"`` key.
@@ -73,10 +66,7 @@ Randomly selecting operations and setting affinities may lead to decrease in mod
The Automatic Mode
++++++++++++++++++
Without pipeline parallelism
----------------------------
The automatic mode decides which operation is assigned to which device according to the support reported by the dedicated devices (``GPU``, ``CPU``, etc.); the query model step is called implicitly by the Hetero device during model compilation.
@@ -100,33 +90,9 @@ It does not take into account device peculiarities such as the inability to infe
:language: cpp
:fragment: [compile_model]
Pipeline parallelism
--------------------
Pipeline parallelism is set via ``ov::hint::model_distribution_policy``. This mode is an efficient technique for inferring large models on multiple devices. The model is split into multiple stages, and each stage is assigned to a different device (``dGPU``, ``iGPU``, ``CPU``, etc.). This mode assigns operations to devices as reasonably as possible, ensuring that the stages can be executed in sequence and that the amount of data transferred between devices is minimized.
For large models which do not fit in the memory of a single first-priority device, model pipeline parallelism places parts of the model on different devices, so that each device has enough memory to infer its operations.
In some cases you may need to manually adjust affinities that were set automatically. This usually serves to minimize the total number of subgraphs and thus optimize memory transfers. To do so, you need to "fix" the automatically assigned affinities like so:
@@ -155,7 +121,7 @@ Importantly, the automatic mode will not work if any operation in a model has it
`ov::Core::query_model <https://docs.openvino.ai/2024/api/c_cpp_api/classov_1_1_core.html#doxid-classov-1-1-core-1acdf8e64824fe4cf147c3b52ab32c1aab>`__ does not depend on affinities set by a user. Instead, it queries for an operation support based on device capabilities.
Configure fallback devices
++++++++++++++++++++++++++
If you want different devices in Hetero execution to have different device-specific configuration options, you can use the special helper property `ov::device::properties <https://docs.openvino.ai/2024/api/c_cpp_api/structov_1_1device_1_1_properties.html#doxid-group-ov-runtime-cpp-prop-api-1ga794d09f2bd8aad506508b2c53ef6a6fc>`__:
@@ -180,15 +146,15 @@ If you want different devices in Hetero execution to have different device-speci
In the example above, the ``GPU`` device is configured to enable profiling data and uses the default execution precision, while ``CPU`` has the configuration property to perform inference in ``fp32``.
Handling of Difficult Topologies
++++++++++++++++++++++++++++++++
Some topologies are not friendly to heterogeneous execution on some devices, even to the point of being unable to execute.
For example, models having activation operations that are not supported on the primary device are split by Hetero into multiple sets of subgraphs, which leads to suboptimal execution.
If transmitting data from one subgraph to another part of the model takes more time than it does under normal execution, heterogeneous execution may not be worthwhile.
In such cases, you can define the heaviest part manually and set the affinity to avoid sending data back and forth many times during one inference.
Analyzing Performance of Heterogeneous Execution
++++++++++++++++++++++++++++++++++++++++++++++++
After enabling the ``OPENVINO_HETERO_VISUALIZE`` environment variable, you can dump GraphViz ``.dot`` files with annotations of operations per device.
@@ -220,7 +186,7 @@ Here is an example of the output for Googlenet v1 running on HDDL (device no lon
Sample Usage
++++++++++++++++++++
OpenVINO™ sample programs can use Heterogeneous execution via the ``-d`` option: