Inference performance decreases with the number of CPUs used. #5193

Closed
maxime-louis opened this issue Sep 16, 2020 · 9 comments
Labels
stale issues that have not been addressed in a while; categorized by a bot

Comments

@maxime-louis

Hey,

I am running inference on a large 3D U-Net. I export the net from PyTorch to ONNX using

    torch.onnx.export(model, x, "model.onnx", export_params=True, opset_version=11, do_constant_folding=True, input_names=['input'], output_names=['output'], dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

and then I run it with:

    ort_session = onnxruntime.InferenceSession("flair_focal_2d.onnx")

    def to_numpy(tensor):
        return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

    ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
    ort_outs = ort_session.run(None, ort_inputs)

I installed onnx and onnxruntime using pip, and I have OpenMP on my Ubuntu 18 machine. The machine has 12 cores / 24 threads, but the inference time per image increases with the number of threads (I measure this using taskset with a variable number of CPUs).
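(For reference, a minimal sketch of how such a per-core-count measurement could be reproduced in one script, using os.sched_setaffinity as an in-process stand-in for taskset; the model file, input shape, and core counts here are placeholders rather than my actual setup.)

    # Hypothetical timing sketch -- shapes, model file and core counts are placeholders.
    import os
    import time

    import numpy as np
    import onnxruntime

    x = np.random.rand(64, 1, 128, 128).astype(np.float32)  # placeholder batch

    for n_cpus in (1, 2, 4, 8, 12):
        # Pin this process to the first n_cpus cores (what taskset does externally).
        os.sched_setaffinity(0, set(range(n_cpus)))
        session = onnxruntime.InferenceSession("model.onnx")
        inputs = {session.get_inputs()[0].name: x}
        session.run(None, inputs)  # warm-up run
        start = time.perf_counter()
        session.run(None, inputs)
        print(f"{n_cpus} cpus: {time.perf_counter() - start:.2f} s")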

I was not expecting this behaviour, but I don't fully understand how this library works. Is this expected?

In production, I have 200 predictions to make each time, and I am tempted to run them in a multi-threaded way (24 parallel calls to 'run' on separate threads). Do you recommend this approach over using a single inference session on all CPUs (and potentially tuning a batch size)? How can I make this procedure thread-safe and disable OpenMP multi-threading?

Thanks in advance!

@snnn
Member

snnn commented Sep 16, 2020

It depends on how much it increased. With a thread pool, the more threads you have, the higher the scheduling cost.

@maxime-louis
Author

I am doing batched inferences, batch_size=64.

Using plain torch on 12 CPUs takes 6s, and this time increases when using fewer CPUs (which is expected).

With onnxruntime, the computation takes 5s on a single CPU but 12s on 12 CPUs. If it's overhead, I find it strange that torch does not suffer from it.

@snnn
Member

snnn commented Sep 16, 2020

Oh I see, then we probably parallelized some code that shouldn't be parallelized.

@maxime-louis
Author

Do you have an update on this?

@snnn
Member

snnn commented Sep 22, 2020

How many threads do you have in total? It sounds like you created many more threads than you have CPU cores. You can make 24 parallel calls to 'run' on separate threads, but you need to disable onnxruntime's own multi-threading first.
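(A minimal sketch of that setup with the Python API, assuming SessionOptions.intra_op_num_threads / inter_op_num_threads are used to limit onnxruntime's internal thread pools and that concurrent run() calls on a single session are safe, as documented; model path, shapes, and worker count are placeholders. On OpenMP-enabled wheels, OMP_NUM_THREADS may also need to be set to 1 before onnxruntime is imported.)

    # Hypothetical sketch: one single-threaded session shared by many Python threads.
    import numpy as np
    import onnxruntime
    from concurrent.futures import ThreadPoolExecutor

    so = onnxruntime.SessionOptions()
    so.intra_op_num_threads = 1   # no intra-op parallelism inside a run() call
    so.inter_op_num_threads = 1   # no inter-op parallelism either
    session = onnxruntime.InferenceSession("model.onnx", sess_options=so)
    input_name = session.get_inputs()[0].name

    def predict(batch):
        # Concurrent run() calls share the single session (assumed thread-safe here).
        return session.run(None, {input_name: batch})[0]

    batches = [np.random.rand(1, 1, 128, 128).astype(np.float32) for _ in range(24)]
    with ThreadPoolExecutor(max_workers=24) as pool:
        results = list(pool.map(predict, batches))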

@maxime-louis
Author

maxime-louis commented Sep 22, 2020

I have 24 hardware threads on my machine. I am not using any multiprocessing outside of onnx; I am doing a batched inference on a single model using onnxruntime. The odd behaviour is that using taskset to restrict the number of threads available to the python process decreases the inference time! The fewer threads, the faster the inference. That is without touching the onnxruntime configuration at any point.

I did a second experiment where I do use a multithreading library to run inferences in parallel: each separate worker uses onnxruntime on a small batch of images. Doing this yields the best performance for the inference on my 64 images. (In this configuration, I set omp_num_threads to total_number_of_threads / number_of_parallel_workers to prevent oversubscribing the CPUs.)

It seems to me that batching the inference within a single onnxruntime run should be faster than this manual parallelization; that's what I'm used to with pytorch.
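(One way the second experiment could look as a script, assuming a process pool with one onnxruntime session per worker; the worker count, chunk split, input shape, and the OMP_NUM_THREADS arithmetic are illustrative assumptions, not my exact configuration.)

    # Hypothetical sketch of the "parallel workers on small batches" setup.
    import os

    N_WORKERS = 8
    # Split the machine's 24 hardware threads between workers, as described above.
    os.environ["OMP_NUM_THREADS"] = str(24 // N_WORKERS)

    import numpy as np
    from multiprocessing import Pool

    _session = None

    def _init_worker():
        # Import inside the worker, after OMP_NUM_THREADS has been set.
        global _session
        import onnxruntime
        _session = onnxruntime.InferenceSession("flair_focal_2d.onnx")

    def _run_chunk(chunk):
        name = _session.get_inputs()[0].name
        return _session.run(None, {name: chunk})[0]

    if __name__ == "__main__":
        images = np.random.rand(64, 1, 128, 128).astype(np.float32)  # placeholder shapes
        chunks = np.array_split(images, N_WORKERS)
        with Pool(N_WORKERS, initializer=_init_worker) as pool:
            outputs = np.concatenate(pool.map(_run_chunk, chunks))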

@snnn
Member

snnn commented Sep 22, 2020

Could you please share the information needed to reproduce your experiment (for example, the model and a full python script) so that I can debug what happens inside?

@stale

stale bot commented Dec 19, 2020

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale issues that have not been addressed in a while; categorized by a bot label Dec 19, 2020
@faxu faxu closed this as completed Feb 10, 2021
@SaleemAkhtarAngstrom

Hi, I'm also noticing that sess.run() uses multiple threads in the Python API. Is there any way to disable this? I'd similarly like to call sess.run() in parallel (i.e. call it from different threads, as I'm trying to run parallel simulations for Monte Carlo analysis).
