
C# inference is slower than Python #892

Closed

dashesy opened this issue Apr 24, 2019 · 8 comments

dashesy commented Apr 24, 2019

Describe the bug
Inference with the same ONNX model is about twice as slow in C# as in Python.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows for C#, Ubuntu16 for Python
  • ONNX Runtime installed from (source or binary): pip/nuget
  • ONNX Runtime version: 0.3.0
  • Python version: 3.5
  • Visual Studio version (if applicable): 2015

To Reproduce
Load an ONNX file in onnxruntime in Python and in C#; inference in C# is about twice as slow.

import time
import numpy as np
import onnxruntime as rt

# random uint8 input matching the model's expected NCHW shape
im = np.random.rand(1, 3, 256, 384).astype('uint8')

sess = rt.InferenceSession("model.onnx")

t0 = time.time()
output = sess.run(['prob'], {'data': im})[0]
print(time.time() - t0)

I get around 0.09 s (90 ms), but when I load the model via the ONNX Runtime NuGet package I get about 200 ms:

using (var results = _session.Run(inputTensor))
{
    // first (and only) output value
    var value = results.First();
    return new TensorArray<float>(
        value.AsEnumerable<float>().ToArray(),
        value.AsTensor<float>().Dimensions.ToArray());
}

TensorArray is a wrapper class that accepts a float array and its dimensions.

Is this behavior expected? Is it ToArray() that has overhead in C# but not in Python?

Expected behavior
Minimal overhead

Additional context
I sent an email with a model file.

jignparm (Contributor) commented Apr 24, 2019

Can you try accessing the span object as shown in option #6 of issue #706?

Based on some trial benchmark runs, this should be ~2x faster and should give parity with the Python running time.

var results_span = (results.First().AsTensor<float>() as DenseTensor<float>).Buffer.Span;
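
For illustration, a minimal sketch of consuming the output through the span rather than copying it out with ToArray(). It assumes the output is a float tensor and that results comes from _session.Run(...) as in the snippet above; the summation is just a placeholder for real post-processing:

// Sketch: read the output in place via the DenseTensor buffer span,
// avoiding the array copy that AsEnumerable<float>().ToArray() makes.
using (var results = _session.Run(inputTensor))
{
    var tensor = results.First().AsTensor<float>() as DenseTensor<float>;
    var span = tensor.Buffer.Span;

    float sum = 0f;
    for (int i = 0; i < span.Length; i++)
        sum += span[i];   // direct access, no intermediate float[]
}

Note that indexing a Span<T> requires a compiler with ref-return support (C# 7 or later).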

jignparm self-assigned this Apr 24, 2019
dashesy (Author) commented Apr 24, 2019

I cannot index the Span in VS2015, so I cannot use it in place of the float[] array:

By-reference return type 'ref T' is not supported

I tried Span.ToArray(), but that did not change anything.

Even when I ignore the actual outputs in C# (filling in null instead), there is a 40-50 ms difference between C# and Python, even though in Python I do fetch the outputs.
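
One possible workaround for compilers without ref-return support, sketched here on the assumption that the tensor's buffer is backed by a managed array (MemoryMarshal.TryGetArray simply returns false if it is not), is to pull out the underlying array segment and index that instead of the span:

using System.Runtime.InteropServices;

// Sketch: avoid Span indexing (needs C# 7+) by extracting the array
// that backs the tensor's Memory<float> buffer, when one exists.
var tensor = results.First().AsTensor<float>() as DenseTensor<float>;
if (MemoryMarshal.TryGetArray<float>(tensor.Buffer, out ArraySegment<float> segment))
{
    // segment.Array is the backing array; data starts at segment.Offset
    float first = segment.Array[segment.Offset];
}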

faxu (Contributor) commented May 28, 2019

@dashesy Is this still an issue with the latest runtime version? If not, please close the issue. If so, let's see how we can help troubleshoot it.

dashesy (Author) commented May 28, 2019

The issue is the same with 0.4.0. Is there a newer version I can try? I shared a small project with you that demonstrates the issue.

jignparm (Contributor) commented
The 40-50ms performance gap between Python and C# requires investigation. It looks like there's a delta even without retrieving/iterating over the output results. This should be addressed as part of the quality milestones.

dashesy (Author) commented May 28, 2019

One thing I noticed is that the gap is almost constant. For an inference on the order of 150 ms, a 40-60 ms gap is more noticeable, but it is always there.
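
To pin down a constant per-call overhead like this, it can help to time only the Run() call averaged over many iterations, mirroring the Python loop. A minimal C# sketch, assuming an already-created _session whose input 'data' is a uint8 tensor of shape (1, 3, 256, 384) as in the original repro:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Time only session.Run(), averaged over many iterations, so that a
// fixed per-call overhead shows up directly in the per-iteration number.
var input = new DenseTensor<byte>(new[] { 1, 3, 256, 384 });
var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor("data", input) };

const int iters = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iters; i++)
{
    using (var results = _session.Run(inputs)) { }  // outputs disposed, not read
}
sw.Stop();
Console.WriteLine($"Per-iteration: {sw.Elapsed.TotalMilliseconds / iters:F2} ms");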

jignparm (Contributor) commented Jul 2, 2019

@dashesy, reviving this old issue as part of quality cleanup. The C# and Python running times are close, at least on the current master branch.

On the celeb model, for 100 iterations, C#=42.48 milliseconds on average, and Python=45.13 milliseconds.

If anything, C# is a bit faster for the same model. It's possible that recent changes to the C API (which C# uses directly) have closed the gap. I'm not aware of any C# API changes that would have had an impact.

Please ensure you are running in Release mode; Debug builds will be slower.
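
For example, with the .NET CLI (the -c flag selects the build configuration):

dotnet run -c Release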

Closing this issue for now.

Python Results

import time
import numpy as np
import onnxruntime as rt

im = np.random.rand(1, 3, 257, 257).astype('uint8')
sess = rt.InferenceSession(r"D:\models\ehsan\celeb\celeb.onnx")

totaltime = 0
numiters = 100
t0 = time.time()
for i in range(numiters):
    output = sess.run(['prob'], {'data': im})[0]
    t1 = time.time()
    totaltime += t1 - t0
    t0 = t1

print("Average time to score 1 tensor in milliseconds =", totaltime / numiters * 1000)

D:\testpythonspeed>python test.py
2019-07-02 02:45:24.2376383 [W:onnxruntime:Default, graph.cc:2208 onnxruntime::Graph::CleanUnusedInitializers] 1 exists in this graph's initializers but it is not used by any node
2019-07-02 02:45:24.2399926 [W:onnxruntime:Default, graph.cc:2208 onnxruntime::Graph::CleanUnusedInitializers] 2 exists in this graph's initializers but it is not used by any node
2019-07-02 02:45:24.6118091 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 1 is not associated with a node.
2019-07-02 02:45:24.6141743 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 2 is not associated with a node.
Average time to score 1 tensor in milliseconds =  45.129685401916504

C# Results

D:\onnxruntime\csharp\tools\Microsoft.ML.OnnxRuntime.PerfTool\bin\Release\netcoreapp2.0>dotnet Microsoft.ML.OnnxRuntime.PerfTool.dll -m d:\models\ehsan\celeb\celeb.onnx -i bench.in  -c 100
Running model D:\models\ehsan\celeb\celeb.onnx in OnnxRuntime:
input:bench.in
iteration count:100
parallel execution:False
optimization level:1
2019-07-02 02:37:37.8805107 [W:onnxruntime:CSharpOnnxRuntime, graph.cc:2227 onnxruntime::Graph::CleanUnusedInitializers] 1 exists in this graph's initializers but it is not used by any node
2019-07-02 02:37:37.8821444 [W:onnxruntime:CSharpOnnxRuntime, graph.cc:2227 onnxruntime::Graph::CleanUnusedInitializers] 2 exists in this graph's initializers but it is not used by any node
2019-07-02 02:37:38.2538733 [W:onnxruntime:InferenceSession, session_state_initializer.cc:417 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 1 is not associated with a node.
2019-07-02 02:37:38.2554571 [W:onnxruntime:InferenceSession, session_state_initializer.cc:417 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 2 is not associated with a node.
Model Load Time = 930.2308
Input Load Time = 6.961
Total Run time for 100 iterations = 4248.2424
Per iteration time = 42.482424
Done

jignparm closed this as completed Jul 2, 2019
kkulkarniMS commented
Hi,
I have a similar issue; the details are in #12937.

@jignparm, please share your insights on my issue.
