Data races in free-threaded python on Py_buffer use #130977


Closed
tom-pytel opened this issue Mar 8, 2025 · 13 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@tom-pytel
Contributor

tom-pytel commented Mar 8, 2025

Bug report

Bug description:

memoryview is used here for demonstration purposes; the race should show up in anything that uses the buffer interface.

Reproducer:

import threading


def copy_back_and_forth(b, a, d, count):
    # b is a threading.Barrier; a and d are memoryviews over the same bytearray,
    # so concurrent threads read and write the same underlying bytes.
    b.wait()
    for _ in range(count):
        a[0] = d[1]
        a[1] = d[0]


def check(funcs, *args):
    barrier = threading.Barrier(len(funcs))
    thrds = []

    for func in funcs:
        thrd = threading.Thread(target=func, args=(barrier, *args))

        thrds.append(thrd)
        thrd.start()

    for thrd in thrds:
        thrd.join()


if __name__ == '__main__':
    b = bytearray([0, 1])
    while True:
        check([copy_back_and_forth] * 10, memoryview(b), memoryview(b), 100)

Output:

WARNING: ThreadSanitizer: data race (pid=25332)
  Write of size 1 at 0x7f618e090d91 by thread T20:
    #0 pack_single Objects/memoryobject.c:1926 (python+0x24ed75)
    #1 memory_ass_sub Objects/memoryobject.c:2688 (python+0x25015a)
    #2 PyObject_SetItem Objects/abstract.c:232 (python+0x150cd6)
    #3 _PyEval_EvalFrameDefault Python/generated_cases.c.h:11143 (python+0x466235)
    #4 _PyEval_EvalFrame Include/internal/pycore_ceval.h:116 (python+0x46db5c)
    #5 _PyEval_Vector Python/ceval.c:1820 (python+0x46db5c)
    #6 _PyFunction_Vectorcall Objects/call.c:413 (python+0x187eb4)
    #7 _PyObject_VectorcallTstate Include/internal/pycore_call.h:167 (python+0x18fb2c)
    #8 method_vectorcall Objects/classobject.c:72 (python+0x18fb2c)
    #9 _PyVectorcall_Call Objects/call.c:273 (python+0x18b9a7)
    #10 _PyObject_Call Objects/call.c:348 (python+0x18be9f)
    #11 PyObject_Call Objects/call.c:373 (python+0x18bf24)
    #12 thread_run Modules/_threadmodule.c:354 (python+0x668b93)
    #13 pythread_wrapper Python/thread_pthread.h:242 (python+0x57d1ea)

  Previous read of size 1 at 0x7f618e090d91 by thread T11:
    #0 unpack_single Objects/memoryobject.c:1803 (python+0x24b841)
    #1 memory_item Objects/memoryobject.c:2471 (python+0x24c79a)
    #2 memory_subscript Objects/memoryobject.c:2607 (python+0x254d56)
    #3 PyObject_GetItem Objects/abstract.c:158 (python+0x1504c1)
    #4 _PyEval_EvalFrameDefault Python/generated_cases.c.h:62 (python+0x41b75c)
    #5 _PyEval_EvalFrame Include/internal/pycore_ceval.h:116 (python+0x46db5c)
    #6 _PyEval_Vector Python/ceval.c:1820 (python+0x46db5c)
    #7 _PyFunction_Vectorcall Objects/call.c:413 (python+0x187eb4)
    #8 _PyObject_VectorcallTstate Include/internal/pycore_call.h:167 (python+0x18fb2c)
    #9 method_vectorcall Objects/classobject.c:72 (python+0x18fb2c)
    #10 _PyVectorcall_Call Objects/call.c:273 (python+0x18b9a7)
    #11 _PyObject_Call Objects/call.c:348 (python+0x18be9f)
    #12 PyObject_Call Objects/call.c:373 (python+0x18bf24)
    #13 thread_run Modules/_threadmodule.c:354 (python+0x668b93)
    #14 pythread_wrapper Python/thread_pthread.h:242 (python+0x57d1ea)

Is this an issue?

CPython versions tested on:

3.14

Operating systems tested on:

Linux

@tom-pytel tom-pytel added the type-bug An unexpected behavior, bug, or error label Mar 8, 2025
@tom-pytel
Contributor Author

ping @colesbury

@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels Mar 8, 2025
@da-woods
Contributor

da-woods commented Mar 8, 2025

The buffer protocol is really there to give external C-like code fast access to C array data. At least from C code it was always possible to work on a buffer with the GIL released, so thread safety was never something Python enforced here.

I can't imagine that accessing it through memoryview and Python can be made usefully thread-safe either, because Python doesn't know what assumptions other users of the buffer are making.

There has been some talk of adding Rust-style borrow-checked buffers (which would allow many readers and zero writers, or zero readers and one writer); those would be safe, but that would be a new feature.

@tom-pytel
Contributor Author

tom-pytel commented Mar 9, 2025

I can't imagine that accessing it through memoryview and Python can really be made usefully thread-safe

Not thread-safe, but free of data races, which I understand to be a goal of free-threaded Python. Would we have to make all RW-buffer accesses atomic?

@da-woods
Contributor

da-woods commented Mar 9, 2025

Fair enough: it sounds like the main value in this would be keeping the thread sanitizer quiet more than anything else. But there may be genuine value in that.

@colesbury
Contributor

I haven't looked at this in depth, but at first glance I agree with @da-woods's comment above.

@tom-pytel
Contributor Author

tom-pytel commented Mar 10, 2025

Alright, but if going this route, some questions:

I am not aware of an atomic way to do memcpy other than breaking it up into multiple individual element-wise atomic copies, and I haven't found anything like this in the codebase; am I missing something? EDIT: "atomic" in the sense of not triggering TSAN, not that the entire block copy is atomic.

If all buffer accesses get this treatment (among other things), it will be common functionality. Should it be accessible Python-wide, e.g. FT_ATOMIC_MEMCPY_RELAXED / FT_ATOMIC_DEST_MEMCPY_RELAXED / FT_ATOMIC_SOURCE_MEMCPY_RELAXED / FT_ATOMIC_MEMSET_RELAXED, etc. (plus operations with a known element size)? Should that be its own PR?

@sergey-miryanov
Contributor

FT_ATOMIC_MEMCPY_RELAXED / FT_ATOMIC_DEST_MEMCPY_RELAXED / FT_ATOMIC_SOURCE_MEMCPY_RELAXED / FT_ATOMIC_MEMSET_RELAXED

You can't do an atomic memcpy. Even if you lock the buffer pointer, you can't stop other threads from writing to memory through that pointer (sorry if this is obvious).

@tom-pytel
Contributor Author

FT_ATOMIC_MEMCPY_RELAXED / FT_ATOMIC_DEST_MEMCPY_RELAXED / FT_ATOMIC_SOURCE_MEMCPY_RELAXED / FT_ATOMIC_MEMSET_RELAXED

You can't do atomic memcpy. Even if you lock the buffer pointer, you can't stop other thread from writing to memory using this pointer (sorry if this obvious).

Yeah, sorry, I wasn't being clear; see the EDIT above: "atomic" in the sense of not triggering TSAN, not that the entire block copy is atomic. So: copying memory using relaxed atomic load/store ops.

@sergey-miryanov
Contributor

I'm afraid it would be very slow in that case.

@tom-pytel
Contributor Author

It has already been explained to me that not all data races are to be eliminated, so this discussion is a bit moot, but to clear up the remaining points:

  • It wouldn't really be slow; in fact, the timings I did show the same speed as a vanilla memcpy. Keep in mind this is with memory order "relaxed", so there are no memory barriers or cache flushes and the compiler optimizes freely.

  • It would literally exist only to quiet TSAN, the purpose being that genuine bad data races would then stand out. That can also be done with a suppression file, but code is more permanent.

In any case, like I said, for now not all data races are to be squashed, so no "atomic" memcpy.

@sergey-miryanov
Contributor

  • It wouldn't really be slow; in fact, the timings I did show the same speed as a vanilla memcpy.

Could you please share a link to the timings? I'm not sure I saw it.

@tom-pytel
Contributor Author

Could you please share a link to the timings? I'm not sure I saw it.

This is in the context of another PR and the timings are rough, but here: #130771 (comment)

Also note what the "atomic" memcpy compiles to further down: a cache-line-aligned block of 8-byte-wide load/store instructions.

@sergey-miryanov
Contributor

Thanks!
