Multiprocessing overhead 100x higher with multiple cores but not if restricted to 1 core #98493
This is very easily reproduced. Note that this is on Linux, where the "fork" start method is the default:

```python
import multiprocessing
import timeit

count = 100

def f():
    pool = multiprocessing.Pool(1)
    with pool:
        print(timeit.timeit(lambda: pool.apply(int, [42]), number=count) / count)

if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")  # "fork" is the default
    f()
```
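Since the script above switches the start method explicitly, the per-call overhead of "fork" and "spawn" can also be measured side by side. The sketch below is only an illustration along the same lines (it assumes Linux, where the "fork" method is available; `bench` is a helper name introduced here, not from the original report):

```python
import multiprocessing
import timeit

def bench(method, count=100):
    # Time pool.apply() round-trips under the given start method.
    # Using a context avoids mutating the global start method.
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(1) as pool:
        pool.apply(int, [42])  # warm up the worker first
        return timeit.timeit(lambda: pool.apply(int, [42]), number=count) / count

if __name__ == "__main__":
    for method in ("fork", "spawn"):
        print(method, bench(method, count=20))
```

The warm-up call matters: without it, the first `apply()` also pays the worker start-up cost, which is much larger under "spawn".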
Good to hear that I am not the only one seeing this. How many cores does your machine have in total? I couldn't reproduce it on my notebook with 16 cores.
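For reference, the total core count and the set of cores a process is actually allowed to use (the latter is what `taskset` restricts) can be checked from Python; `os.sched_getaffinity` is Linux-only:

```python
import os

# Total logical CPUs in the machine.
print("logical CPUs:", os.cpu_count())
# Cores this process may run on (restricted by e.g. `taskset`);
# os.sched_getaffinity is only available on Linux.
print("usable CPUs:", len(os.sched_getaffinity(0)))
```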
The goal is to avoid the multiprocessing overhead for this method, which is called often but always returns the same result. The multiprocessing overhead is typically not large, but in some situations it is: python/cpython#98493. On systems that are affected by this, it can save several minutes of execution time if there are 10000 runs.
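A minimal sketch of that idea: since the method always returns the same result, the subprocess round-trip only needs to happen once and the result can be memoized. `expensive_constant` below is a hypothetical stand-in for the real method, not code from the original change:

```python
import functools
import multiprocessing

@functools.cache  # Python 3.9+; use lru_cache(maxsize=None) on older versions
def expensive_constant():
    # Hypothetical stand-in: the result never changes, so pay the
    # multiprocessing overhead only on the first call.
    with multiprocessing.Pool(1) as pool:
        return pool.apply(int, ["42"])

print(expensive_constant())  # first call: subprocess round-trip
print(expensive_constant())  # cached: no multiprocessing overhead
```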
@ambv as FYI
I have tested this again, but cannot reproduce it anymore, on the same system as in my initial post. Nothing on the system has changed except that the kernel is now 5.15.0-91-generic and Python is 3.10.12. I have also tested with the official Docker containers for 3.10.6, 3.10.8, 3.11, and 3.12; none of them seem to show a difference. Can other people confirm this?
I cannot reproduce this either. The system I had reproduced it on was likely just running vanilla Debian or Ubuntu at the time (late October 2022). Anyway, without an ability to reproduce this, I am closing. A mystery; possibly an issue with a specific common distro kernel at that time?
Bug report
The simple Python script shown earlier basically measures the overhead of using the `multiprocessing` module and typically prints a value less than 0.001 on my machines.

On one of our machines (which has 256 CPU cores but the same software as the others) it is two orders of magnitude slower and prints a value around 0.05, i.e., one multiprocessing call takes 50 ms. (This is consistently repeatable and independent of how many iterations I let it run.)
The weird thing is that if I restrict the Python interpreter to a single core by starting it with `taskset -c 0` (any other single core works as well), the script is suddenly 100x faster and thus as fast as on the other machines. As soon as I allow two or more cores to be used, the performance is bad again. It does not matter how many cores are allowed, as long as it is more than one. The machine is otherwise idle.

My guess is that something in `multiprocessing` is slow if the machine has many cores, but there is some alternative fast path that is used only if a single core is allowed.
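The same restriction as `taskset -c 0` can also be applied from inside the script, which makes it easy to toggle in experiments. `os.sched_setaffinity` is Linux-only, and pinning to core 0 here is just an example:

```python
import os

# Pin this process (pid 0 means the calling process) to CPU core 0,
# equivalent to launching the interpreter with `taskset -c 0`.
os.sched_setaffinity(0, {0})
print(os.sched_getaffinity(0))  # -> {0}
```

Note that worker processes started after this call inherit the restricted affinity.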
I tried to debug and profile this and found out the following:

- The time for each `pool.apply()` call is roughly the same (with the usual variations due to noise), i.e., it is not the case that some of the calls block for a long time and all others are fast.
- The main thread spends its time waiting in the `multiprocessing` implementation for the `threading.Event` that signals that the result has arrived. It does not spend significant CPU time.

I used yappi as a profiler because it supports multiple threads (it significantly slows everything down, but I hope this does not influence the results). Its thread stats are this:
Thread 3 was the one that ran at 100% CPU in this case, but sometimes it is Thread 2 instead. The function stats reported by yappi for this thread are (sorted by tsub, measured CPU time):
At this point I am lost and do not know what else I could do to find out more. But I would be glad to provide additional information or assist otherwise if requested! I hope the above provides enough of a starting point.
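As a lighter-weight cross-check than a full profiler, the stdlib's `time.thread_time()` can attribute CPU time to individual threads, which is enough to confirm which thread is the one burning CPU. This is only a generic sketch, not part of the original report:

```python
import threading
import time

def burn():
    # time.thread_time() counts CPU time of the calling thread only,
    # so each thread can report its own consumption.
    t0 = time.thread_time()
    total = sum(range(10**6))  # some CPU-bound work
    print("CPU seconds in worker thread:", time.thread_time() - t0)

t = threading.Thread(target=burn)
t.start()
t.join()
print("CPU seconds in main thread:", time.thread_time())
```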
Your environment