Inference performance decreases with the number of CPUs used. #5193
Comments
Depends on how much it increased. For a thread pool, the more threads you have, the higher the scheduling cost is.
I am doing batched inference with batch_size=64. Using plain torch on 12 CPUs takes 6 s, and this time increases when using fewer CPUs (which is expected). With onnxruntime, the computation takes 5 s on a single CPU but 12 s on 12 CPUs. If it's overhead, I find it strange that torch does not suffer from it.
Oh I see, then we probably parallelized some code that shouldn't be parallelized.
Do you have an update on this?
How many threads do you have in total? It sounds like you created many more threads than there are CPU cores. You can make 24 parallel calls to 'run' on separate threads, but you need to disable the multithreading inside onnxruntime first.
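For illustration, a minimal sketch of limiting onnxruntime's internal thread pools through the Python API so that your own parallel calls to run() don't oversubscribe the cores. The model path, input name, and shape below are placeholders, and if the installed wheel was built with OpenMP, the OMP_NUM_THREADS environment variable may control the intra-op threads instead of these options:

```python
import numpy as np
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 1   # threads used within a single operator
sess_options.inter_op_num_threads = 1   # threads used across independent operators
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

# "model.onnx" and the input name/shape are placeholders for this sketch.
session = ort.InferenceSession("model.onnx", sess_options)
dummy = np.random.rand(1, 1, 64, 64, 64).astype(np.float32)
outputs = session.run(None, {"input": dummy})
```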
I have 24 threads on my machine. I am not using any multiprocessing outside of onnx; I am doing batched inference on a single model using onnxruntime. The odd behaviour is that using taskset to restrict the number of threads used by the Python process decreases the inference time! The fewer threads, the faster the inference. That is without touching the onnxruntime configuration at any point.

I did a second experiment where I do use a multithreading library to run inferences in parallel: each worker uses onnxruntime on a small batch of images. Doing this yields the best performance for the inference on my 64 images. (In this configuration, I set OMP_NUM_THREADS to total_number_of_threads / number_of_parallel_threads to prevent oversubscribing the CPUs.) It seems to me that batching the inference into a single onnxruntime call should be faster than this manual parallelization. That's what I'm used to with PyTorch.
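A rough sketch of what that second experiment might look like, assuming a hypothetical model file, input name, and input shape. onnxruntime's run() releases the GIL, so a thread pool is enough to get real parallelism; whether OMP_NUM_THREADS is honoured depends on whether the installed wheel was built with OpenMP:

```python
import os
os.environ["OMP_NUM_THREADS"] = "2"   # e.g. 24 hardware threads / 12 workers; set before importing onnxruntime

from concurrent.futures import ThreadPoolExecutor
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("unet.onnx")                     # hypothetical model file
batch = np.random.rand(64, 1, 64, 64, 64).astype(np.float32)    # 64 dummy 3D volumes
n_workers = 12

def run_chunk(chunk):
    # session.run() is thread-safe, so all workers can share one session
    return session.run(None, {"input": chunk})[0]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(run_chunk, np.array_split(batch, n_workers)))

outputs = np.concatenate(results)
```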
Could you please share information that would allow me to reproduce your experiment? For example, the model and a full Python script, so that I can debug what happens inside.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Hi, I'm also noticing that sess.run() uses multiple threads in the Python API. Is there any way to disable this? I'd similarly like to call sess.run() in parallel (i.e. call it from different threads, as I'm trying to run parallel simulations for Monte Carlo analysis).
Hey,
I am running inference on a large 3D U-Net. I export the net from PyTorch to ONNX and then run it with onnxruntime (a representative sketch of both steps is included after this message).
I installed onnx and onnxruntime using pip, and I have OpenMP on my Ubuntu 18 machine. The machine has 12 cores / 24 threads, but the inference time per image increases with the number of threads (I measure that by running under taskset with a variable number of CPUs).
I was not expecting this behaviour, but I don't fully understand how this library works. Is this expected?
In production, I have 200 predictions to make every time, and I am tempted to run them in a multi-threaded way (24 parallel calls to 'run' on separate threads). Do you recommend this approach over using a single inference session on all CPUs (and potentially tuning a batch size)? How can I make this procedure thread-safe and disable OpenMP multi-threading?
Thanks in advance!
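For context, a representative sketch of the two steps described in the message above, using a stand-in layer instead of the actual 3D U-Net and placeholder file name, input name, and shape (none of this is the original code):

```python
import numpy as np
import torch
import onnxruntime as ort

# Stand-in for the real 3D U-Net; the input shape below is also a placeholder.
model = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1).eval()
dummy = torch.randn(1, 1, 64, 64, 64)

# Export the PyTorch model to ONNX.
torch.onnx.export(model, dummy, "unet.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)

# Run inference with onnxruntime.
session = ort.InferenceSession("unet.onnx")
image = np.random.rand(1, 1, 64, 64, 64).astype(np.float32)
prediction = session.run(None, {"input": image})[0]
```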