
C# inference is slower than Python #892

Closed

dashesy opened this issue Apr 24, 2019 · 8 comments

dashesy commented Apr 24, 2019

Describe the bug
Inference with the same ONNX model is about twice as slow in C# as in Python.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows for C#, Ubuntu16 for Python
  • ONNX Runtime installed from (source or binary): pip/nuget
  • ONNX Runtime version: 0.3.0
  • Python version: 3.5
  • Visual Studio version (if applicable): 2015

To Reproduce
Load an ONNX file in onnxruntime in Python and in C#; inference in C# is about twice as slow.

import time
import numpy as np
import onnxruntime as rt

# random uint8 input matching the model's expected NCHW shape
im = np.random.rand(1, 3, 256, 384).astype('uint8')

sess = rt.InferenceSession("model.onnx")

t0 = time.time()
output = sess.run(['prob'], {'data': im})[0]
print(time.time() - t0)

I get around 0.09 s (90 ms), but when I load the model via the ONNX Runtime NuGet package I get about 200 ms:

using (var results = _session.Run(inputTensor))
{
    // first (and only) output value
    var value = results.First();
    return new TensorArray<float>(
        value.AsEnumerable<float>().ToArray(),
        value.AsTensor<float>().Dimensions.ToArray());
}

TensorArray is a wrapper class that accepts a float array and its dimensions.

Is this behavior expected? Is it ToArray() that has overhead in C# but not in Python?

Expected behavior
Minimal overhead

Additional context
I sent an email with a model file.

jignparm (Contributor) commented Apr 24, 2019

Can you try accessing the span object as shown in option #6 of issue #706?

Based on some trial benchmark runs, this should be ~2x faster and should give parity with the Python running time.

var results_span = (results.First().AsTensor<float>() as DenseTensor<float>).Buffer.Span;
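
For illustration, a minimal sketch of consuming the output through the span rather than copying it out with ToArray(). It assumes the output is a float tensor and that results comes from _session.Run(...) as in the snippet above; the summation is just a placeholder for real post-processing:

// Sketch: read the output in place via the DenseTensor buffer span,
// avoiding the array copy that AsEnumerable<float>().ToArray() makes.
using (var results = _session.Run(inputTensor))
{
    var tensor = results.First().AsTensor<float>() as DenseTensor<float>;
    var span = tensor.Buffer.Span;

    float sum = 0f;
    for (int i = 0; i < span.Length; i++)
        sum += span[i];   // direct access, no intermediate float[]
}

Note that indexing a Span<T> requires a compiler with ref-return support (C# 7 or later).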

jignparm self-assigned this Apr 24, 2019
dashesy (Author) commented Apr 24, 2019

I cannot index the Span in VS2015, so I cannot use it in place of the float[] array:

By-reference return type 'ref T' is not supported

I tried Span.ToArray(), but that did not change anything.

Even when I ignore the actual outputs in C# (filling in null instead), there is a 40-50 ms difference between C# and Python, even though in Python I do fetch the outputs.
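
One possible workaround for compilers without ref-return support, sketched here on the assumption that the tensor's buffer is backed by a managed array (MemoryMarshal.TryGetArray simply returns false if it is not), is to pull out the underlying array segment and index that instead of the span:

using System.Runtime.InteropServices;

// Sketch: avoid Span indexing (needs C# 7+) by extracting the array
// that backs the tensor's Memory<float> buffer, when one exists.
var tensor = results.First().AsTensor<float>() as DenseTensor<float>;
if (MemoryMarshal.TryGetArray<float>(tensor.Buffer, out ArraySegment<float> segment))
{
    // segment.Array is the backing array; data starts at segment.Offset
    float first = segment.Array[segment.Offset];
}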

faxu (Contributor) commented May 28, 2019

@dashesy Is this still an issue with the latest runtime version? If not, please close the issue. If so, let's see how we can help troubleshoot it.

dashesy (Author) commented May 28, 2019

The issue is the same with 0.4.0. Is there a newer version I can try? I shared a small project with you that demonstrates the issue.

jignparm (Contributor) commented
The 40-50ms performance gap between Python and C# requires investigation. It looks like there's a delta even without retrieving/iterating over the output results. This should be addressed as part of the quality milestones.

dashesy (Author) commented May 28, 2019

One thing I noticed is that the gap is almost constant. For an inference on the order of 150 ms, a 40-60 ms gap is more noticeable, but it is always there.
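
To pin down a constant per-call overhead like this, it can help to time only the Run() call averaged over many iterations, mirroring the Python loop. A minimal C# sketch, assuming an already-created _session whose input 'data' is a uint8 tensor of shape (1, 3, 256, 384) as in the original repro:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Time only session.Run(), averaged over many iterations, so that a
// fixed per-call overhead shows up directly in the per-iteration number.
var input = new DenseTensor<byte>(new[] { 1, 3, 256, 384 });
var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor("data", input) };

const int iters = 100;
var sw = Stopwatch.StartNew();
for (int i = 0; i < iters; i++)
{
    using (var results = _session.Run(inputs)) { }  // outputs disposed, not read
}
sw.Stop();
Console.WriteLine($"Per-iteration: {sw.Elapsed.TotalMilliseconds / iters:F2} ms");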

jignparm (Contributor) commented Jul 2, 2019

@dashesy, reviving this old issue as part of quality cleanup. The C# and Python running times are close, at least on the current master branch.

On the celeb model, for 100 iterations, C#=42.48 milliseconds on average, and Python=45.13 milliseconds.

If anything, C# is a bit faster for the same model. It's possible that recent changes to the C API (which C# uses directly) have closed the gap. I'm not aware of any C# API changes that would have had an impact.

Please ensure you are running in Release mode; Debug builds will be slower.
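
For example, with the .NET CLI (the -c flag selects the build configuration):

dotnet run -c Release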

Closing this issue for now.

Python Results

import time
import numpy as np
import onnxruntime as rt

im = np.random.rand(1, 3, 257, 257).astype('uint8')
sess = rt.InferenceSession(r"D:\models\ehsan\celeb\celeb.onnx")

totaltime = 0
numiters = 100
t0 = time.time()
for i in range(numiters):
    output = sess.run(['prob'], {'data': im})[0]
    t1 = time.time()
    totaltime += t1 - t0
    t0 = t1

print("Average time to score 1 tensor in milliseconds =", totaltime / numiters * 1000)

D:\testpythonspeed>python test.py
2019-07-02 02:45:24.2376383 [W:onnxruntime:Default, graph.cc:2208 onnxruntime::Graph::CleanUnusedInitializers] 1 exists in this graph's initializers but it is not used by any node
2019-07-02 02:45:24.2399926 [W:onnxruntime:Default, graph.cc:2208 onnxruntime::Graph::CleanUnusedInitializers] 2 exists in this graph's initializers but it is not used by any node
2019-07-02 02:45:24.6118091 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 1 is not associated with a node.
2019-07-02 02:45:24.6141743 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 2 is not associated with a node.
Average time to score 1 tensor in milliseconds =  45.129685401916504

C# Results

D:\onnxruntime\csharp\tools\Microsoft.ML.OnnxRuntime.PerfTool\bin\Release\netcoreapp2.0>dotnet Microsoft.ML.OnnxRuntime.PerfTool.dll -m d:\models\ehsan\celeb\celeb.onnx -i bench.in  -c 100
Running model D:\models\ehsan\celeb\celeb.onnx in OnnxRuntime:
input:bench.in
iteration count:100
parallel execution:False
optimization level:1
2019-07-02 02:37:37.8805107 [W:onnxruntime:CSharpOnnxRuntime, graph.cc:2227 onnxruntime::Graph::CleanUnusedInitializers] 1 exists in this graph's initializers but it is not used by any node
2019-07-02 02:37:37.8821444 [W:onnxruntime:CSharpOnnxRuntime, graph.cc:2227 onnxruntime::Graph::CleanUnusedInitializers] 2 exists in this graph's initializers but it is not used by any node
2019-07-02 02:37:38.2538733 [W:onnxruntime:InferenceSession, session_state_initializer.cc:417 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 1 is not associated with a node.
2019-07-02 02:37:38.2554571 [W:onnxruntime:InferenceSession, session_state_initializer.cc:417 onnxruntime::SaveInputOutputNamesToNodeMapping] Graph input with name 2 is not associated with a node.
Model Load Time = 930.2308
Input Load Time = 6.961
Total Run time for 100 iterations = 4248.2424
Per iteration time = 42.482424
Done

jignparm closed this as completed Jul 2, 2019
kkulkarniMS commented
Hi,
I have a similar issue; the details are in #12937.

@jignparm, please share your insights on my issue.
