Corner-case performance degradation #121
Hi @ppolewicz, if I understand correctly, this is somewhat expected behavior due to biased reference counting: non-owning threads can access a shared object's refcount field only through atomic operations. In my tests I was also able to degrade queue performance, which is relatively easy:

```python
import time
import threading
from queue import SimpleQueue, Queue


def producer(queue):
    buf = bytearray(1500)
    for i in range(10**6):
        queue.put((i, buf))
    queue.put(None)  # Add a sentinel value to indicate the end of data


def consumer(queue):
    while True:
        item = queue.get(timeout=10**0)
        if item is None:
            break
        # Process the item here (e.g., print or do some computation)


# Create a SimpleQueue object
# queue = Queue()
queue = SimpleQueue()

# Create the producer and consumer threads
producer_thread = threading.Thread(target=producer, args=(queue,))
consumer_thread = threading.Thread(target=consumer, args=(queue,))

# Start the timer
start_time = time.time()

# Start the threads
consumer_thread.start()
producer_thread.start()

# Wait for the threads to finish
producer_thread.join()
consumer_thread.join()

# Calculate the elapsed time
end_time = time.time()
elapsed_time = end_time - start_time

# Print the performance metrics
print(f"Data exchange performance: {elapsed_time} seconds.")
```

In nogil CPython, two (or more) threads fight concurrently for access to the shared resource (the queue), leading to enormous overhead, while the GIL prevents that; on my machine there is a two-orders-of-magnitude difference in execution speed between nogil and regular CPython. All in all, nogil requires a slightly different approach to data management. In the case of queues, a solution with greedy data buffering on the client side gives good results in my project.
The GIL required workarounds, and biased reference counting may sometimes require workarounds too. It'll be harder to run into those, I hope. If my understanding is correct, …
In your code, try switching the lines:

```python
threads = [
    threading.Thread(target=incr, args=(incrementor, BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]
```

to:

```python
threads = [
    threading.Thread(target=incr, args=(Incrementor(), BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]
```

and you will see the maximum possible performance on nogil. Of course, the result will be wrong, but my point here is that in the original code, 3 child threads are simultaneously trying to access the same instance of the `Incrementor` class.

About immortalization: yes, immortalization is part of nogil 3.9 (`_Py_REF_IMMORTAL_MASK`), but it does not kick in in the place you mentioned (in my counterexample, the 3 threads are freely creating and assigning ints to their own per-thread instances).
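For readers following along: `incr` and `Incrementor` are referenced throughout this thread but never quoted in full. A minimal sketch consistent with the snippets above (the attribute name, constant values, and output are my guesses, not the original file):

```python
import threading

THREAD_COUNT = 3
BY_HOW_MUCH = 10**6

class Incrementor:
    def __init__(self):
        self.value = 0  # attribute name is a guess

def incr(incrementor, by_how_much):
    for _ in range(by_how_much):
        incrementor.value += 1  # contended read-modify-write on one shared object

incrementor = Incrementor()  # a single instance shared by all threads
threads = [
    threading.Thread(target=incr, args=(incrementor, BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(incrementor.value)  # less than 3_000_000 when increments race
```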
I think @pjurgielewicz's response has covered most of this, but just in case:
The general problem of memory contention is not specific to Python or nogil -- you can see similar behavior when modifying the same variable from many threads in C or C++ -- but it's potentially more of an issue in nogil because there is possible contention on more things, such as objects' reference count fields, not just the application data itself.

The provided example is probably not worth optimizing for, but I expect there will be more real-world examples where memory contention is an issue. We have a few possible strategies for mitigating it, including immortalization and deferred reference counting, but how they're applied will depend on the actual context.
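A common application-level mitigation -- distinct from the interpreter-level strategies named above, and a sketch of my own rather than code from this issue -- is to shard the counter per thread and combine the results once at the end, which keeps the total correct (unlike the deliberately wrong counterexample earlier in the thread):

```python
import threading

THREAD_COUNT = 3
BY_HOW_MUCH = 10**6

def incr_local(results, index, by_how_much):
    local = 0  # thread-private accumulator: no cross-thread contention
    for _ in range(by_how_much):
        local += 1
    results[index] = local  # one write per thread into a distinct slot

results = [0] * THREAD_COUNT
threads = [
    threading.Thread(target=incr_local, args=(results, i, BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # correct total: 3_000_000
```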
This might be a known limitation, and I'm pretty sure it is something one shouldn't be doing in nogil mode. There is an easy workaround for now: just set `PYTHONGIL=1` in such a case. Still, I thought it might be interesting to show the results of the tests I ran.

1 million increments per thread, 3 threads:
5 million increments per thread, 3 threads:
1 million increments per thread, 10 threads:
environment: Ubuntu VM running on 5 cores of an i5-9600KF CPU @ 3.70GHz
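The workaround is applied by setting the environment variable before the interpreter starts. A hedged usage example (assuming a nogil-based image; the image name comes from my setup, the rest is standard Docker):

```sh
# re-run the benchmark with the GIL forced back on (nogil builds honor PYTHONGIL)
docker run -it --rm -e PYTHONGIL=1 incrementor
```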
It looks like `nogil/python` with `PYTHONGIL=1` provides more consistent `+=` performance than the official CPython release. Not sure if it's intended. There was no nogil 3.12 docker image, so I didn't test there.

Code

incrementor.py
Dockerfile
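The collapsed Dockerfile is not rendered here, but each of its steps can be read back from the build log below; this is a reconstruction from that log, not the original file:

```dockerfile
FROM python:3.9-slim
WORKDIR /usr/src/app
COPY incrementor.py .
CMD ["python3", "incrementor.py"]
```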
some raw test results
```sh
$ time docker run -it --rm --name incrementor-running incrementor
Sending build context to Docker daemon  4.608kB
Step 1/4 : FROM python:3.9-slim
 ---> 313d2f483acd
Step 2/4 : WORKDIR /usr/src/app
 ---> Using cache
 ---> 21fe8413ac4b
Step 3/4 : COPY incrementor.py .
 ---> Using cache
 ---> e12f9c4f202c
Step 4/4 : CMD ["python3", "incrementor.py"]
 ---> Using cache
 ---> df42d07b5298
Successfully built df42d07b5298
Successfully tagged incrementor:latest
nogil False 1608925

real    0m1,014s
user    0m0,016s
sys     0m0,016s
```