API¶
OrtValue¶
ONNX Runtime loads and runs inference on a model in ONNX format, or ORT format (for memory and disk constrained environments). The main class, InferenceSession, wraps model loading and running, as well as user-specified configuration. The data consumed by the model and the outputs that the model produces can be provided in a number of different ways.
ONNX Runtime works with native Python data structures, which are mapped into ONNX data formats: Numpy arrays (tensors), dictionaries (maps), and lists of Numpy arrays (sequences). The data backing these structures resides on CPU.
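To make the flow concrete, here is a minimal sketch of loading a model and running it with plain Numpy inputs; model.onnx and the tensor names "X" and "Y" are placeholders for your own model:
import numpy as np
import onnxruntime

# Create a session; the CPU execution provider is the safe default
sess = onnxruntime.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])

# Inputs are passed as a dict keyed by input name; outputs come back as Numpy arrays
X = np.random.rand(3, 2).astype(np.float32)
res = sess.run(["Y"], {"X": X})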
Data on CPU¶
Scenario 1:
By default, ONNX Runtime always places input(s) and output(s) on CPU, which may not be optimal if the input or output is consumed and produced on a device other than CPU, because this introduces data copies between CPU and the device. In addition, ONNX Runtime supports working directly with OrtValue(s) while running inference on a model, if they are provided as part of the input feed. Below is an example showing creation of an OrtValue from a Numpy array while placing its backing memory on a CUDA device:
# X is numpy array on cpu, create an OrtValue and place it on cuda device id = 0
ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
ortvalue.device_name()  # 'cuda'

# An OrtValue can be provided directly as part of the input feed to a model
sess = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
res = sess.run(["Y"], {"X": ortvalue})
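An OrtValue also exposes a few introspection helpers, which are useful when checking data placement. A short sketch, assuming X is a float32 Numpy array as above:
ortvalue.shape()        # shape of the numpy array X
ortvalue.data_type()    # 'tensor(float)'
ortvalue.is_tensor()    # True
np.array_equal(ortvalue.numpy(), X)  # True; numpy() copies the data back to CPU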
Data on device¶
ONNX Runtime supports a custom data structure that supports all ONNX data formats and allows users to place the data backing these formats on a device, for example, on a CUDA supported device. In ONNX Runtime, this is called IOBinding.
To use the IOBinding feature, replace InferenceSession.run() with InferenceSession.run_with_iobinding().
(In the following code snippets, model.onnx is the model to execute, X is the input data to feed, and Y is the output data.)
Scenario 2:
A graph is executed on a device other than CPU, for instance CUDA. Users can use IOBinding to put the input on CUDA as follows.
# X is numpy array on cpu
Y = io_binding.copy_outputs_to_cpu()[0]
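Spelled out end to end, this scenario might look like the following sketch; the tensor names 'input' and 'output' are placeholders for the model's actual input and output names:
# X is numpy array on cpu
session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
io_binding = session.io_binding()
# ONNX Runtime copies X over to the CUDA device if 'input' is consumed by nodes placed on it
io_binding.bind_cpu_input('input', X)
io_binding.bind_output('output')
session.run_with_iobinding(io_binding)
Y = io_binding.copy_outputs_to_cpu()[0]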
Scenario 3:
The input data is on a device and users use it directly; the output data is on CPU.
# X is numpy array on cpu
X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
Y = io_binding.copy_outputs_to_cpu()[0]
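A complete sequence for this scenario might look like the sketch below; binding the output with no device specification leaves it on CPU, and 'input' and 'output' are again placeholder names:
import numpy as np

# X is numpy array on cpu
X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
io_binding = session.io_binding()
# Bind the device-resident input through its raw buffer pointer
io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32, shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
io_binding.bind_output('output')
session.run_with_iobinding(io_binding)
Y = io_binding.copy_outputs_to_cpu()[0]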
Scenario 4:
The input data and output data are both on a device; users use the input directly and also place the output on the device.
# X is numpy array on cpu
X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
session.run_with_iobinding(io_binding)
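Filled in, the scenario could look like this sketch; the output shape [3, 2] is a stand-in for the model's actual output shape, and 'input'/'output' are placeholder tensor names:
import numpy as np

# X is numpy array on cpu
X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
# Pre-allocate the output on the device with the expected shape and type
Y_ortvalue = onnxruntime.OrtValue.ortvalue_from_shape_and_type([3, 2], np.float32, 'cuda', 0)
session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
io_binding = session.io_binding()
io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32, shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
io_binding.bind_output(name='output', device_type=Y_ortvalue.device_name(), device_id=0, element_type=np.float32, shape=Y_ortvalue.shape(), buffer_ptr=Y_ortvalue.data_ptr())
session.run_with_iobinding(io_binding)
# Y_ortvalue now holds the result on the CUDA device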
Scenario 5:
Users can request ONNX Runtime to allocate an output on a device. This is particularly useful for dynamically shaped outputs. Users can use the get_outputs() API to access the OrtValue(s) corresponding to the allocated output(s), and can thus consume the memory ONNX Runtime allocated for the output as an OrtValue.
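A minimal sketch of this scenario, assuming the placeholder tensor names 'input' and 'output':
import numpy as np

# X is numpy array on cpu
X_ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
io_binding = session.io_binding()
io_binding.bind_input(name='input', device_type=X_ortvalue.device_name(), device_id=0, element_type=np.float32, shape=X_ortvalue.shape(), buffer_ptr=X_ortvalue.data_ptr())
# Ask ONNX Runtime to allocate memory for 'output' on the CUDA device
io_binding.bind_output('output', 'cuda')
session.run_with_iobinding(io_binding)
# get_outputs() returns OrtValue(s) backed by memory ONNX Runtime allocated on the device
ort_output = io_binding.get_outputs()[0]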