diff --git a/docs/api/python/api_summary.html b/docs/api/python/api_summary.html index 81645aeb66670..bb291042b3adc 100644 --- a/docs/api/python/api_summary.html +++ b/docs/api/python/api_summary.html @@ -6,7 +6,7 @@ - API Summary — ONNX Runtime 1.11.0 documentation + API — ONNX Runtime 1.11.0 documentation @@ -38,59 +38,62 @@
-
-

API Summary

-

Summary of public functions and classes exposed -in ONNX Runtime.

+
+

API

-
-

OrtValue

-

ONNX Runtime works with native Python data structures which are mapped into ONNX data formats : +

+

API Overview

+

ONNX Runtime loads and runs inference on a model in ONNX format, or in ORT format (for memory- and disk-constrained environments).

+

The main class InferenceSession wraps model loading and running, as well as user-specified configuration.

+

The data consumed by the model and the outputs that the model produces can be provided in a number of different ways.

+
+

Data on CPU

+

ONNX Runtime works with native Python data structures which are mapped into ONNX data formats: Numpy arrays (tensors), dictionaries (maps), and a list of Numpy arrays (sequences). The data backing these are on CPU.
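For example, running a model with CPU-resident Numpy data looks like this. This is a minimal sketch: model.onnx, the input name X, the output name Y, and the input shape are placeholders, as in the other snippets on this page.

import numpy as np
import onnxruntime

session = onnxruntime.InferenceSession("model.onnx")
# Numpy arrays map to ONNX tensors; the backing data stays on CPU
X = np.random.rand(3, 4).astype(np.float32)
outputs = session.run(["Y"], {"X": X})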

-

ONNX Runtime supports a custom data structure that supports all ONNX data formats that allows users -to place the data backing these on a device, for example, on a CUDA supported device. This allows for -interesting IOBinding scenarios (discussed below). In addition, ONNX Runtime supports directly -working with OrtValue (s) while inferencing a model if provided as part of the input feed.

Below is an example showing creation of an OrtValue from a Numpy array while placing its backing memory on a CUDA device:

+

Scenario 1:

# X is numpy array on cpu, create an OrtValue and place it on cuda device id = 0
 ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X, 'cuda', 0)
 ortvalue.device_name()  # 'cuda'
@@ -104,18 +107,19 @@ 

res = sess.run(["Y"], {"X": ortvalue})

-
-
-

IOBinding

By default, ONNX Runtime always places input(s) and output(s) on CPU, which -is not optimal if the input or output is consumed and produced on a device +may not be optimal if the input or output is consumed and produced on a device other than CPU because it introduces data copy between CPU and the device. -ONNX Runtime provides a feature, IO Binding, which addresses this issue by -enabling users to specify which device to place input(s) and output(s) on. -Here are scenarios to use this feature.

+See the sections below for ways to minimize data copying and maximize I/O throughput.

+
+
+

Data on device

+

ONNX Runtime provides a custom data structure that supports all ONNX data formats and allows users +to place the data backing these on a device, for example, on a CUDA-supported device. In ONNX Runtime, this is called IOBinding.

+

To use the IOBinding feature, replace InferenceSession.run() with InferenceSession.run_with_iobinding().

(In the following code snippets, model.onnx is the model to execute, X is the input data to feed, and Y is the output data.)

-

Scenario 1:

+

Scenario 2:

A graph is executed on a device other than CPU, for instance CUDA. Users can use IOBinding to put the input on CUDA as follows.

# X is numpy array on cpu
@@ -128,7 +132,7 @@ 

Y = io_binding.copy_outputs_to_cpu()[0]

-

Scenario 2:

+

Scenario 3:

The input data is on a device and users can use it directly; the output data is on CPU.
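A sketch of this scenario, assuming X_ortvalue is an OrtValue already placed on a CUDA device as in the OrtValue example above (numpy imported as np):

session = onnxruntime.InferenceSession('model.onnx')
io_binding = session.io_binding()
# bind the device-resident input buffer directly, avoiding a copy to CPU
io_binding.bind_input(name='X', device_type=X_ortvalue.device_name(), device_id=0,
                      element_type=np.float32, shape=X_ortvalue.shape(),
                      buffer_ptr=X_ortvalue.data_ptr())
io_binding.bind_output('Y')  # the output is left on CPU
session.run_with_iobinding(io_binding)
Y = io_binding.copy_outputs_to_cpu()[0]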

-

Scenario 3:

+

Scenario 4:

The input data and output data are both on a device; users use the input directly and also place the output on the device.
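A sketch of this scenario, assuming X_ortvalue and Y_ortvalue are OrtValues pre-allocated on the device, with Y_ortvalue sized to the expected output shape:

session = onnxruntime.InferenceSession('model.onnx')
io_binding = session.io_binding()
io_binding.bind_input(name='X', device_type=X_ortvalue.device_name(), device_id=0,
                      element_type=np.float32, shape=X_ortvalue.shape(),
                      buffer_ptr=X_ortvalue.data_ptr())
# bind a pre-allocated device buffer for the output as well, so nothing is copied back to CPU
io_binding.bind_output(name='Y', device_type=Y_ortvalue.device_name(), device_id=0,
                       element_type=np.float32, shape=Y_ortvalue.shape(),
                       buffer_ptr=Y_ortvalue.data_ptr())
session.run_with_iobinding(io_binding)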

-

Scenario 4:

+

Scenario 5:

Users can request ONNX Runtime to allocate an output on a device. This is particularly useful for outputs with dynamic shapes. Users can use the get_outputs() API to access the OrtValue(s) corresponding to the allocated output(s), and can thus consume the memory ONNX Runtime allocated for the output as an OrtValue.

@@ -168,7 +172,12 @@

ort_output = io_binding.get_outputs()[0]

-

Scenario 5:

+ +
+

Access data directly

+

In addition, ONNX Runtime supports working directly with OrtValue(s) while running inference on a model, if they are provided as part of the input feed.

+

You can also provide pointers to PyTorch tensor storage.

+

Scenario 6:

Users can bind OrtValue(s) directly.

#X is numpy array on cpu
@@ -181,39 +190,46 @@ 

session.run_with_iobinding(io_binding)

+

Scenario 7:

+

You can also bind inputs and outputs directly to a PyTorch tensor.

+
io_binding = session.io_binding()
+# Assumed context (defined elsewhere): `inputs` maps input names to torch.Tensor,
+# `device` / `device_id` identify the target device, and `torch_to_numpy_dtype_dict`
+# is a user-defined mapping from torch dtypes to Numpy dtypes.
+for input_onnx in session.get_inputs():
+    tensor: torch.Tensor = inputs[input_onnx.name]
+    tensor = tensor.contiguous()
+    if tensor.dtype == torch.int64:  # torch.long is an alias of torch.int64
+        # int32 mandatory as input of bindings, int64 not supported
+        tensor = tensor.type(dtype=torch.int32).to(device)
+    io_binding.bind_input(
+        name=input_onnx.name,
+        device_type=device,
+        device_id=device_id,
+        element_type=torch_to_numpy_dtype_dict[tensor.dtype],
+        shape=tuple(tensor.shape),
+        buffer_ptr=tensor.data_ptr(),
+    )
+    inputs[input_onnx.name] = tensor
+outputs = dict()
+output_shapes = ...  # dict mapping each output name to its expected shape (model specific)
+for output_name, shape in output_shapes.items():
+    # pre-allocate a device tensor for ONNX Runtime to write the output into
+    tensor = torch.empty(shape, dtype=torch.float32, device=device).contiguous()
+    outputs[output_name] = tensor
+    io_binding.bind_output(
+        name=output_name,
+        device_type=device,
+        device_id=device_id,
+        element_type=np.float32,  # hard coded output type
+        shape=tuple(shape),
+        buffer_ptr=tensor.data_ptr(),
+    )
+session.run_with_iobinding(io_binding)
+
+
-
-

Device

-

The package is compiled for a specific device, GPU or CPU. -The CPU implementation includes optimizations -such as MKL (Math Kernel Libary). The following function -indicates the chosen option:

-
-
-onnxruntime.get_device() str
-

Return the device used to compute the prediction (CPU, MKL, …)

-
- -
-
-

Examples and datasets

-

The package contains a few models stored in ONNX format -used in the documentation. These don’t need to be downloaded -as they are installed with the package.

-
-
-onnxruntime.datasets.get_example(name)[source]
-

Retrieves the absolute file name of an example.

-
-
-
-

Load and run a model

-

ONNX Runtime reads a model saved in ONNX format. -The main class InferenceSession wraps these functionalities -in a single place.

+
+

API Details

-

Main class

+

Main class

class onnxruntime.InferenceSession(path_or_bytes, sess_options=None, providers=None, provider_options=None, **kwargs)[source]
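For example (the model path is a placeholder; the providers listed are the standard CUDA and CPU execution providers):

session = onnxruntime.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)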
@@ -410,9 +426,9 @@

Main class
-

Options

+

Options

-

RunOptions

+

RunOptions

class onnxruntime.RunOptions(self: onnxruntime.capi.onnxruntime_pybind11_state.RunOptions) None
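A short sketch of configuring a single call to run() (the log tag is arbitrary; session and X are as in the snippets above):

run_options = onnxruntime.RunOptions()
run_options.logid = "sample-run"    # tag for log lines emitted by this run
run_options.log_severity_level = 0  # verbose logging for this run only
outputs = session.run(["Y"], {"X": X}, run_options)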
@@ -453,7 +469,7 @@

RunOptions
-

SessionOptions

+

SessionOptions

class onnxruntime.SessionOptions(self: onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions) None
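For example, configuring threading and graph optimization before creating a session (the model path is a placeholder):

session_options = onnxruntime.SessionOptions()
session_options.intra_op_num_threads = 2
session_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
session = onnxruntime.InferenceSession("model.onnx", sess_options=session_options)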
@@ -595,9 +611,9 @@

SessionOptions

-

Data

-
-

OrtValue

+

Data

+
+

OrtValue

class onnxruntime.OrtValue(ortvalue, numpy_obj=None)[source]
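For example, wrapping a Numpy array X (as in the earlier snippets) and reading it back:

ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(X)  # defaults to CPU
ortvalue.shape()       # shape of the underlying tensor
ortvalue.data_ptr()    # address of the backing buffer
X2 = ortvalue.numpy()  # the data back as a Numpy array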
@@ -710,7 +726,7 @@

OrtValue
-

SparseTensor

+

SparseTensor

class onnxruntime.SparseTensor(sparse_tensor)[source]
@@ -871,9 +887,9 @@

SparseTensor

-

Devices

-
-

IOBinding

+

Devices

+
+

IOBinding

class onnxruntime.IOBinding(session)[source]
@@ -963,7 +979,7 @@

IOBinding
-

OrtDevice

+

OrtDevice

class onnxruntime.OrtDevice(c_ort_device)[source]
@@ -974,11 +990,11 @@

OrtDevice

-

Internal classes

+

Internal classes

These classes cannot be instantiated by users but they are returned by methods or functions of this library.

-

ModelMetadata

+

ModelMetadata

class onnxruntime.ModelMetadata
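A ModelMetadata instance is obtained from an existing session, for example:

meta = session.get_modelmeta()
print(meta.producer_name, meta.graph_name, meta.version)
print(meta.custom_metadata_map)  # user-defined key/value pairs stored in the model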
@@ -1031,7 +1047,7 @@

ModelMetadata
-

NodeArg

+

NodeArg

class onnxruntime.NodeArg
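NodeArg instances are returned by the session, for example when inspecting its inputs:

for node_arg in session.get_inputs():
    # each NodeArg describes one graph input: name, shape, and element type
    print(node_arg.name, node_arg.shape, node_arg.type)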
@@ -1061,7 +1077,7 @@

NodeArg
-

Backend

+

Backend

In addition to the regular API which is optimized for performance and usability, ONNX Runtime also implements the ONNX backend API @@ -1160,7 +1176,7 @@
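A sketch of the backend API in use (model may be a path to an ONNX file or a loaded ModelProto; X is input data as before):

import onnxruntime.backend as backend

rep = backend.prepare(model, "CPU")  # "GPU" can be used where supported
outputs = rep.run(X)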

ONNX Runtime

Navigation

@@ -1199,7 +1215,7 @@

Quick search

©2018-2021, Microsoft. | - Powered by Sphinx 4.3.2 + Powered by Sphinx 4.4.0 & Alabaster 0.7.12 | diff --git a/docs/api/python/auto_examples/index.html b/docs/api/python/auto_examples/index.html index 88c529d5e42e1..b5d62e2765f00 100644 --- a/docs/api/python/auto_examples/index.html +++ b/docs/api/python/auto_examples/index.html @@ -21,7 +21,7 @@ - + @@ -139,7 +139,7 @@

ONNX Runtime

Navigation

@@ -147,7 +147,7 @@

Navigation

Related Topics

@@ -178,7 +178,7 @@

Quick search

©2018-2021, Microsoft. | - Powered by Sphinx 4.3.2 + Powered by Sphinx 4.4.0 & Alabaster 0.7.12 | diff --git a/docs/api/python/auto_examples/plot_backend.html b/docs/api/python/auto_examples/plot_backend.html index bd45265c05b41..d1ee5d0e5f72c 100644 --- a/docs/api/python/auto_examples/plot_backend.html +++ b/docs/api/python/auto_examples/plot_backend.html @@ -102,7 +102,7 @@

The backend API is implemented by other frameworks and makes it easier to switch between multiple runtimes with the same API.

-

Total running time of the script: ( 0 minutes 0.014 seconds)

+

Total running time of the script: ( 0 minutes 0.017 seconds)

-

Total running time of the script: ( 0 minutes 0.009 seconds)

+

Total running time of the script: ( 0 minutes 0.010 seconds)

Out:

-
0.9209977605173356
+
0.915704460659449
 
@@ -193,12 +193,12 @@

Conversion to ONNX format
Out:

-
0.9999999999999366
+
0.9999999999999281
 

Very similar. ONNX Runtime uses floats instead of doubles, which explains the small discrepancies.

-

Total running time of the script: ( 0 minutes 0.966 seconds)

+

Total running time of the script: ( 0 minutes 1.141 seconds)

Out:

-
[array([[[0.6423601 , 0.65232253, 0.6620137 , 0.708999  , 0.65169865],
-        [0.548968  , 0.59544575, 0.7161434 , 0.525905  , 0.7210646 ],
-        [0.5178277 , 0.5842683 , 0.5627599 , 0.6324704 , 0.5833795 ],
-        [0.69634616, 0.60848683, 0.6746977 , 0.50677085, 0.5549751 ]],
-
-       [[0.5097179 , 0.59407187, 0.56360227, 0.7223234 , 0.5392329 ],
-        [0.5398089 , 0.5622808 , 0.5369593 , 0.5819309 , 0.5735331 ],
-        [0.5688669 , 0.71247685, 0.63964766, 0.63349843, 0.63380575],
-        [0.64378905, 0.60552883, 0.5184905 , 0.6312441 , 0.5047166 ]],
-
-       [[0.63900065, 0.6108959 , 0.5249817 , 0.5055595 , 0.55390376],
-        [0.62443805, 0.550723  , 0.5320551 , 0.5522731 , 0.68858314],
-        [0.69650024, 0.54673976, 0.56964463, 0.58536506, 0.5743989 ],
-        [0.6382853 , 0.5826889 , 0.53635114, 0.52279866, 0.7300966 ]]],
+
[array([[[0.6613654 , 0.7164093 , 0.5314087 , 0.7067613 , 0.70412177],
+        [0.7080177 , 0.54169816, 0.6488505 , 0.5935635 , 0.7268913 ],
+        [0.57318795, 0.50276023, 0.62565476, 0.6204073 , 0.65596694],
+        [0.59989387, 0.59642035, 0.72549963, 0.70181483, 0.603905  ]],
+
+       [[0.70841444, 0.6787628 , 0.6373904 , 0.6612957 , 0.66548526],
+        [0.5030165 , 0.50026876, 0.6943428 , 0.7046919 , 0.65267944],
+        [0.5492975 , 0.70328647, 0.70220387, 0.64173555, 0.71524936],
+        [0.63749886, 0.61553544, 0.5967878 , 0.7181717 , 0.6559915 ]],
+
+       [[0.5325055 , 0.57738787, 0.58325106, 0.7022748 , 0.541571  ],
+        [0.662003  , 0.5770094 , 0.7068774 , 0.56496465, 0.62275225],
+        [0.69648486, 0.6892656 , 0.71441233, 0.69432604, 0.5038529 ],
+        [0.703531  , 0.7261627 , 0.59789884, 0.67254007, 0.5533266 ]]],
       dtype=float32)]
 
@@ -149,7 +149,7 @@

ONNX Runtime

Navigation

@@ -190,7 +190,7 @@

Quick search

©2018-2021, Microsoft. | - Powered by Sphinx 4.3.2 + Powered by Sphinx 4.4.0 & Alabaster 0.7.12 | diff --git a/docs/api/python/auto_examples/plot_metadata.html b/docs/api/python/auto_examples/plot_metadata.html index c17fbe6318ce8..c1d471705f928 100644 --- a/docs/api/python/auto_examples/plot_metadata.html +++ b/docs/api/python/auto_examples/plot_metadata.html @@ -133,7 +133,7 @@

ONNX Runtime

Navigation

@@ -174,7 +174,7 @@

Quick search

©2018-2021, Microsoft. | - Powered by Sphinx 4.3.2 + Powered by Sphinx 4.4.0 & Alabaster 0.7.12 | diff --git a/docs/api/python/auto_examples/plot_pipeline.html b/docs/api/python/auto_examples/plot_pipeline.html index c8e8705dc666b..28ac5a6a50a41 100644 --- a/docs/api/python/auto_examples/plot_pipeline.html +++ b/docs/api/python/auto_examples/plot_pipeline.html @@ -168,10 +168,10 @@

Draw a model with ONNX

Out:

-
<matplotlib.image.AxesImage object at 0x7feafe274ac0>
+
<matplotlib.image.AxesImage object at 0x7fa9ca3043d0>
 
-

Total running time of the script: ( 0 minutes 0.196 seconds)

+

Total running time of the script: ( 0 minutes 0.296 seconds)

Out:

-
onnxruntime_profile__2022-01-04_17-09-55.json
+
onnxruntime_profile__2022-02-16_20-16-28.json
 

The results are stored in a file in JSON format. @@ -111,23 +111,23 @@
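For reference, a profile like the one shown below is produced by enabling profiling on the session options; a minimal sketch, with model.onnx and X as placeholders:

so = onnxruntime.SessionOptions()
so.enable_profiling = True
session = onnxruntime.InferenceSession("model.onnx", sess_options=so)
session.run(None, {"X": X})
prof_file = session.end_profiling()  # path of the JSON trace written by ONNX Runtime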

Out:

[{'args': {},
   'cat': 'Session',
-  'dur': 56,
+  'dur': 95,
   'name': 'model_loading_array',
   'ph': 'X',
-  'pid': 3089,
-  'tid': 3089,
-  'ts': 1},
+  'pid': 2881,
+  'tid': 2881,
+  'ts': 2},
  {'args': {},
   'cat': 'Session',
-  'dur': 240,
+  'dur': 275,
   'name': 'session_initialization',
   'ph': 'X',
-  'pid': 3089,
-  'tid': 3089,
-  'ts': 71}]
+  'pid': 2881,
+  'tid': 2881,
+  'ts': 116}]
 
-

Total running time of the script: ( 0 minutes 0.007 seconds)

+

Total running time of the script: ( 0 minutes 0.006 seconds)

@@ -131,9 +131,9 @@

Conversion to ONNX format
Out:

-

Out:

-

Out:

-
[{0: 2.9595235901069827e-05, 1: 0.05698008090257645, 2: 0.9429903030395508},
- {0: 1.8492364688427188e-05, 1: 0.03936365991830826, 2: 0.9606178402900696},
- {0: 0.013000461272895336, 1: 0.8036782741546631, 2: 0.1833212822675705}]
+
[{0: 1.4928715472706244e-06, 1: 0.003353527979925275, 2: 0.9966449737548828},
+ {0: 0.0007424309733323753, 1: 0.40631869435310364, 2: 0.5929388403892517},
+ {0: 0.9263647794723511, 1: 0.07363501191139221, 2: 2.1443614173222159e-07}]
 

Let’s benchmark.

@@ -189,11 +189,11 @@

Probabilities

Out:

Execution time for clr.predict
-Average 4.38e-05 min=4.25e-05 max=6.07e-05
+Average 5.57e-05 min=4.95e-05 max=7.69e-05
 Execution time for ONNX Runtime
-Average 1.97e-05 min=1.92e-05 max=2.46e-05
+Average 3.01e-05 min=2.77e-05 max=4.96e-05
 
-1.9671264999914226e-05
+3.0110844999313713e-05
 

Let’s benchmark a scenario similar to what a webservice @@ -219,11 +219,11 @@

Probabilities

Out:

Execution time for clr.predict
-Average 0.00404 min=0.00402 max=0.00409
+Average 0.00492 min=0.00477 max=0.00556
 Execution time for sess_predict
-Average 0.000881 min=0.000874 max=0.000912
+Average 0.00114 min=0.00111 max=0.0012
 
-0.0008813192099997735
+0.0011378636049998647
 

Let’s do the same for the probabilities.

@@ -239,11 +239,11 @@

Probabilities

Out:

Execution time for predict_proba
-Average 0.00599 min=0.00597 max=0.00606
+Average 0.00715 min=0.00699 max=0.00773
 Execution time for sess_predict_proba
-Average 0.000883 min=0.000876 max=0.000916
+Average 0.00118 min=0.00114 max=0.00134
 
-0.0008830492349999729
+0.0011849476450002782
 

This second comparison is better as @@ -279,11 +279,11 @@

Benchmark with RandomForest
Out:

Execution time for predict_proba
-Average 0.717 min=0.715 max=0.72
+Average 0.848 min=0.81 max=0.89
 Execution time for sess_predict_proba
-Average 0.00108 min=0.00107 max=0.00111
+Average 0.00124 min=0.00119 max=0.0013
 
-0.0010817126199989956
+0.0012366176899996618
 

Let’s see with different numbers of trees.

@@ -317,40 +317,40 @@

Benchmark with RandomForest

Out:

5
-Average 0.0637 min=0.0636 max=0.0639
-Average 0.000869 min=0.000857 max=0.000899
+Average 0.0707 min=0.0699 max=0.0714
+Average 0.00104 min=0.000994 max=0.00107
 10
-Average 0.0982 min=0.098 max=0.0987
-Average 0.000884 min=0.000873 max=0.00091
+Average 0.109 min=0.109 max=0.11
+Average 0.00106 min=0.00102 max=0.00108
 15
-Average 0.133 min=0.133 max=0.133
-Average 0.000895 min=0.000885 max=0.000921
+Average 0.148 min=0.147 max=0.149
+Average 0.00106 min=0.00102 max=0.00112
 20
-Average 0.168 min=0.167 max=0.168
-Average 0.000915 min=0.000907 max=0.00094
+Average 0.188 min=0.186 max=0.194
+Average 0.00114 min=0.00113 max=0.00116
 25
-Average 0.202 min=0.201 max=0.203
-Average 0.000921 min=0.000913 max=0.000948
+Average 0.236 min=0.236 max=0.236
+Average 0.00114 min=0.00113 max=0.00117
 30
-Average 0.236 min=0.236 max=0.237
-Average 0.000922 min=0.000914 max=0.000948
+Average 0.277 min=0.276 max=0.277
+Average 0.00117 min=0.00116 max=0.00119
 35
-Average 0.271 min=0.271 max=0.271
-Average 0.000941 min=0.000928 max=0.000967
+Average 0.317 min=0.316 max=0.317
+Average 0.00118 min=0.00117 max=0.0012
 40
-Average 0.305 min=0.305 max=0.305
-Average 0.000948 min=0.000934 max=0.000977
+Average 0.356 min=0.356 max=0.356
+Average 0.00119 min=0.00118 max=0.00121
 45
-Average 0.34 min=0.339 max=0.34
-Average 0.000983 min=0.000972 max=0.00101
+Average 0.397 min=0.397 max=0.398
+Average 0.00121 min=0.0012 max=0.00124
 50
-Average 0.375 min=0.374 max=0.375
-Average 0.000972 min=0.000966 max=0.000992
+Average 0.437 min=0.437 max=0.438
+Average 0.00122 min=0.00121 max=0.00124
 
-<matplotlib.legend.Legend object at 0x7feaf2025c10>
+<matplotlib.legend.Legend object at 0x7fa9bddf3100>
 
-

Total running time of the script: ( 3 minutes 22.159 seconds)

+

Total running time of the script: ( 3 minutes 57.713 seconds)