iter(range) shared by threads exceeds max range value #131199


Closed
ptmcg opened this issue Mar 13, 2025 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@ptmcg

ptmcg commented Mar 13, 2025

Bug report

Bug description:

I was experimenting with atomic updates to containers and iterators across threads, and wrote this code. The GIL-enabled version does not have an issue, but the free-threaded version overruns the range iterator.
(Tested using CPython 3.14a05)

import sys

detect_gil = getattr(
    sys, '_is_gil_enabled', lambda: "no GIL detection available"
)
print(f"sys._is_gil_enabled() = {detect_gil()}")

import itertools
from threading import Thread, Barrier
from time import perf_counter


target_total = 800_000
num_workers = 8

# shared data - will access to these be thread-safe?
worker_id = itertools.count(1)
counter = iter(range(target_total))

# free-threaded code is _much_ faster with pre-allocated lists (over list.append or deque.append)
ints = [-1] * target_total
result_buffer = [None] * target_total

barrier = Barrier(num_workers)

def worker():
    this_thread = next(worker_id)

    # wait for all threads to have started before proceeding
    barrier.wait()

    # initialize buffer_index in case all the other threads exhaust counter
    # before this thread gets a chance to run
    buffer_index = 0

    # all threads share a common iterator
    for buffer_index in counter:

        ### THIS IS THE BAD PART - I SHOULDN'T HAVE TO DO THIS CHECK, SINCE counter IS AN
        ### ITERATOR OVER range(target_total)
        if buffer_index >= target_total:
            # this shouldn't happen if counter is an iterator on a range
            break

        value = buffer_index + 1

        # increment the shared counter and add the result to the shared lists WITHOUT locking
        ints[buffer_index] = value
        # ints.append(value)

        result_buffer[buffer_index] = (this_thread, value)
        # result_buffer.append((this_thread, value))

    if buffer_index >= target_total:
        ### THIS SHOULD NEVER HAPPEN, BUT IN THE OUTPUT YOU'LL SEE THAT IT DOES
        ### IN THE FREE-THREADED CASE
        print(f"iterator exceeded range max: {buffer_index=}, {len(ints)=} (shouldn't happen)")

threads = [Thread(target=worker) for _ in range(num_workers)]

for t in threads[:-1]:
    t.start()
input("Press Enter to start the threads")

start = perf_counter()

# starting the last thread releases the barrier
threads[-1].start()

for t in threads:
    t.join()
end = perf_counter()

print(">>>", end-start)

# see if ints are sorted - they don't have to be; if they are, it just shows that
# the threads were able to safely increment the shared counter and append to the
# shared list
ints_are_sorted = all(
    a == b for a, b in zip(ints, range(1, target_total + 1))
)

# verify that no numbers are missing or duplicated (check expected sum of values 1-target_total)
assert sum(ints) == target_total * (target_total + 1) // 2

# Is the shared list sorted? If not, the counter increment
# and list append operations together are not thread-safe
# (as expected).
# (though passing the assert does imply thread-safety
# of each of the operations individually).
print("sorted?", ints_are_sorted)

# see how evenly the work was distributed across the worker threads
from collections import Counter
tally = Counter(i[0] for i in result_buffer)
for thread_id, count in tally.most_common():
    print(f"{thread_id:2d} {count:16,d}")
print(f"{sum(tally.values()):,}")

With GIL enabled:

sys._is_gil_enabled() = True
Press Enter to start the threads
>>> 0.5602053000038723
sorted? True
 1          147,616
 4          147,199
 7          108,055
 5          100,552
 6           99,720
 8           94,804
 2           72,038
 3           30,016
800,000

With free-threaded 3.14:

(pyparsing_3.13) PS C:\Users\ptmcg\dev\pyparsing\gh\pyparsing> py -3.14t "C:\Users\ptmcg\dev\pyparsing\gh\pyparsing\tests\nogil_multithreading_bug_report.py"
sys._is_gil_enabled() = False
Press Enter to start the threads
iterator exceeded range max: buffer_index=800000, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800001, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800002, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800003, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800004, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800005, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800006, len(ints)=800000 (shouldn't happen)
iterator exceeded range max: buffer_index=800007, len(ints)=800000 (shouldn't happen)
>>> 3.4842117999942275
sorted? True
 5          138,725
 8          100,496
 3           95,411
 4           94,917
 6           94,825
 2           92,904
 1           92,339
 7           90,383
800,000

CPython versions tested on:

3.14

Operating systems tested on:

Windows

@ptmcg ptmcg added the type-bug An unexpected behavior, bug, or error label Mar 13, 2025
@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels Mar 13, 2025
@colesbury
Contributor

Is this the same as the following issue?

From https://docs.python.org/3/howto/free-threading-python.html#iterators:

Sharing the same iterator object between multiple threads is generally not safe

I don't think we have plans to change that in 3.14.
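Following the HOWTO's advice, one workaround is to serialize `next()` calls on the shared iterator with a lock. This is a minimal sketch of that pattern, not a stdlib or CPython API; `LockedIterator` is a hypothetical helper name:

```python
import threading

class LockedIterator:
    """Serialize next() calls on a shared iterator with a lock."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration raised by the inner iterator propagates out,
        # and the lock is released on the way
        with self._lock:
            return next(self._it)

# usage: many threads draining one shared range
counter = LockedIterator(range(1000))
seen = []

def worker():
    for i in counter:
        seen.append(i)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every value consumed exactly once, none past the range max
assert sorted(seen) == list(range(1000))
```

The lock adds contention, so for work distribution a per-thread slice of the range (or a `queue.Queue`) may scale better; the point here is only correctness of the shared iterator.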

@ptmcg
Author

ptmcg commented Mar 13, 2025

It doesn't seem to be a thread safety issue per se - the issue I'm surfacing is that the range max value is not respected.

Calling next() on iterators returned from itertools.count() so far seems to be atomic in practice - threads don't trip over it, even in free-threaded code. The only real thread-safety issue I've found is when I intentionally introduce an unsafe design that requires keeping two state values consistent with each other.

Thanks for the link to the HOWTO, I'll study that further.
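The kind of check behind that observation might look like the sketch below. Note this is an empirical probe, not a proof: passing on one build or run does not guarantee that concurrent `next()` on an `itertools.count` is atomic in general.

```python
import itertools
import threading

counter = itertools.count(1)   # one counter shared by all threads
per_thread = 10_000
num_threads = 4
grabbed = []

def worker():
    # pull values from the shared counter, then publish them in one call
    local = [next(counter) for _ in range(per_thread)]
    grabbed.extend(local)

threads = [threading.Thread(target=worker) for _ in range(num_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# if the concurrent next() calls interleaved safely, every value
# appears exactly once with no gaps and no duplicates
expected = list(range(1, per_thread * num_threads + 1))
assert sorted(grabbed) == expected
```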

@ptmcg
Author

ptmcg commented Mar 13, 2025

I've edited my code snippet to make it a little clearer where I'm having an issue.

@Yhg1s
Member

Yhg1s commented Mar 14, 2025

It doesn't seem to be a thread safety issue per se - the issue I'm surfacing is that the range max value is not respected.

Yes, that's a thread safety issue. You're sharing an object between threads, but that object isn't implemented in a way that allows it to be safely shared between threads. The concurrent calls to its __next__() are not thread-safe.
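The overshoot is consistent with a lost update between the bounds check and the index increment inside `__next__()`. A deterministic sketch of that interleaving follows; `NaiveRangeIter` is illustrative only and is not CPython's actual rangeiter implementation:

```python
# Deterministic illustration of how an unsynchronized
# read-check-then-write next() can overshoot: two simulated "threads"
# both pass the bounds check before either one advances the index.

class NaiveRangeIter:
    def __init__(self, stop):
        self.index = 0
        self.stop = stop

    def read_and_check(self):
        # first half of next(): read the index and bounds-test it
        if self.index >= self.stop:
            raise StopIteration
        return self.index

    def advance(self):
        # second half: write index + 1 (not atomic with the check above)
        self.index += 1

it = NaiveRangeIter(stop=5)
it.index = 4                 # one value remaining

a = it.read_and_check()      # thread A reads 4, passes the bounds check
b = it.read_and_check()      # thread B also reads 4 before A advances
it.advance()                 # A writes 5
it.advance()                 # B writes 6

assert a == b == 4           # the last value was handed out twice
assert it.index > it.stop    # internal index overshot the range max
```

Without a lock (or an atomic compare-and-swap on the index), nothing prevents this window, which is why concurrent `__next__()` calls on a shared iterator are unsafe.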

@colesbury
Contributor

Thanks @ptmcg for the bug report and sample code. Let's track this in the existing issue #129068.
