Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequence iterator thread-safety #120496

Closed
rostan-t opened this issue Jun 14, 2024 · 10 comments
Closed

Sequence iterator thread-safety #120496

rostan-t opened this issue Jun 14, 2024 · 10 comments
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@rostan-t
Copy link

rostan-t commented Jun 14, 2024

Bug report

Bug description:

Sequence iterators are not thread-safe under the free-threaded build. They'll sometimes be accessed with the same index.

Here is a minimal repro:

import concurrent.futures

N = 10000

for _ in range(100):
    it = iter(range(N))

    with concurrent.futures.ThreadPoolExecutor() as executor:
        data = set(executor.map(lambda _: next(it), range(N)))

    assert len(data) == N, f"Expected {N} distinct elements, got {len(data)}"

I also tested it with list and dict iterators as well as a custom class implementing __getitem__.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, macOS

Linked PRs

@rostan-t rostan-t added the type-bug An unexpected behavior, bug, or error label Jun 14, 2024
@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Jun 14, 2024

I suspect we might want something faster than just locking the iterator in this case. Locking might slow down iteration quite a bit. Will wait for Sam to hear his ideas.

@Fidget-Spinner
Copy link
Member

Btw, the race is rangeiter_next (and all the other iterator protocols) racing with itself. Here's a sample:

WARNING: ThreadSanitizer: data race (pid=25010)
  Read of size 8 at 0x7fffb501b320 by thread T2:
    #0 rangeiter_next /home/ken/Documents/GitHub/cpython/Objects/rangeobject.c:819:12 (python+0x35bd45) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #1 builtin_next /home/ken/Documents/GitHub/cpython/Python/bltinmodule.c:1547:11 (python+0x494bb8) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #2 cfunction_vectorcall_FASTCALL /home/ken/Documents/GitHub/cpython/Objects/methodobject.c:425:24 (python+0x318a6c) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #3 _PyObject_VectorcallTstate /home/ken/Documents/GitHub/cpython/./Include/internal/pycore_call.h:168:11 (python+0x2589c1) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #4 PyObject_Vectorcall /home/ken/Documents/GitHub/cpython/Objects/call.c:327:12 (python+0x25a590) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #5 _PyEval_EvalFrameDefault /home/ken/Documents/GitHub/cpython/Python/generated_cases.c.h:813:23 (python+0x4a1252) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #6 _PyEval_EvalFrame /home/ken/Documents/GitHub/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x49b343) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #7 _PyEval_Vector /home/ken/Documents/GitHub/cpython/Python/ceval.c:1819:12 (python+0x49b343)
    #8 _PyFunction_Vectorcall /home/ken/Documents/GitHub/cpython/Objects/call.c (python+0x25aabe) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #9 _PyObject_VectorcallTstate /home/ken/Documents/GitHub/cpython/./Include/internal/pycore_call.h:168:11 (python+0x260c31) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #10 method_vectorcall /home/ken/Documents/GitHub/cpython/Objects/classobject.c:70:20 (python+0x25ef55) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #11 _PyVectorcall_Call /home/ken/Documents/GitHub/cpython/Objects/call.c:273:16 (python+0x25a45f) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #12 _PyObject_Call /home/ken/Documents/GitHub/cpython/Objects/call.c:348:16 (python+0x25a6f7) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #13 PyObject_Call /home/ken/Documents/GitHub/cpython/Objects/call.c:373:12 (python+0x25a8e7) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #14 thread_run /home/ken/Documents/GitHub/cpython/./Modules/_threadmodule.c:337:21 (python+0x6a2168) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)
    #15 pythread_wrapper /home/ken/Documents/GitHub/cpython/Python/thread_pthread.h:243:5 (python+0x5dd89b) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)

  Previous write of size 8 at 0x7fffb501b320 by thread T3:
    #0 rangeiter_next /home/ken/Documents/GitHub/cpython/Objects/rangeobject.c:822:15 (python+0x35bd82) (BuildId: 59329152ec1bb081b2f85313eff1036edc524111)

@rostan-t
Copy link
Author

I suspect we might want something faster than just locking the iterator in this case.

In the section Container Thread-Safety, PEP 703 mentions fast paths for listiter_next and dictiter_iternextkey/value/item.

@Fidget-Spinner
Copy link
Member

Thanks for the link! I don't think we can do the same thing for the rangeiter_next as it might easily cause it to exceed its range. I have another idea using seqlocks.

@colesbury
Copy link
Contributor

I don't think we want to fix this -- the performance trade-off is not worth it. We should instead document that iterators aren't thread-safe -- that they won't be guaranteed to return unique elements if accessed concurrently from multiple threads.

@corona10
Copy link
Member

I don't think we want to fix this -- the performance trade-off is not worth it. We should instead document that iterators aren't thread-safe -- that they won't be guaranteed to return unique elements if accessed concurrently from multiple threads.

+1

@corona10
Copy link
Member

I think that we can close the issue after some documentation.

@eendebakpt , AFAIK, you are working on several other cases, but if your works are related to thread-safe of iterators, I think that Sam's opinion is more reasonable for me.

@eendebakpt
Copy link
Contributor

I don't think we want to fix this -- the performance trade-off is not worth it. We should instead document that iterators aren't thread-safe -- that they won't be guaranteed to return unique elements if accessed concurrently from multiple threads.

I am worried that not making it thread safe will lead to some very hard to track down bugs. Besides the risk of returning unexpected elements (maybe it is enough to document this, although I am personally not convinced yet), there is also the risks of overflows inside the C code (that will happen for the reversed iteration).

Regardless of the final decision, I will benchmark the enumerate and reversed iterators with and without the critical sections to see what performance trade-off we are talking about.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 18, 2024
(cherry picked from commit 7e189ae)

Co-authored-by: Donghee Na <donghee.na@python.org>
corona10 added a commit that referenced this issue Jun 19, 2024
…120706)

gh-120496: Add a note about iterator thread-safe (gh-120685)
(cherry picked from commit 7e189ae)

Co-authored-by: Donghee Na <donghee.na@python.org>
@eendebakpt
Copy link
Contributor

@colesbury If I understand correctly we want the container iterators to be thread-safe in the sense that iterating using multiple threads will not crash the interpreter, but it is ok to be not thread-safe in the sense that the iterator can return results that would under single-threading be considered a bug. For example when using multiple threads the iteration over enumerate(range(10)) could return (5, 6).

That was very surprising for me as a user! I did read PEP 703 again, and I believe the behavior is correctly described in the section on container thread-safety. Also the documentation update by @corona10 will help. Nevertheless, I am wondering if we can somehow make users more aware of this. For example by providing explicit examples in documentation like https://github.com/python/cpython/blob/main/Doc/howto/free-threading-extensions.rst?

@colesbury
Copy link
Contributor

If I understand correctly we want the container iterators to be thread-safe in the sense that iterating using multiple threads will not crash the interpreter, but it is ok to be not thread-safe in the sense that the iterator can return results that would under single-threading be considered a bug

Yes

For example by providing explicit examples in documentation like...

Yes, I agree. I've started working on a free-threading howto for Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

6 participants