methodcaller is not thread-safe (or re-entrant) #127065

Open
colesbury opened this issue Nov 20, 2024 · 3 comments
Labels
3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), topic-free-threading, type-bug (An unexpected behavior, bug, or error)

Comments

colesbury (Contributor) commented Nov 20, 2024

Bug report

EDIT: edited to clarify that the issue is in the C implementation of operator.methodcaller.

Originally reported by @ngoldbaum in crate-py/rpds#101

Reproducer
```python
from concurrent.futures import ThreadPoolExecutor
from operator import methodcaller


class HashTrieMap:
    # Minimal class exposing the same view-method names as dict.
    def keys(self):
        return None

    def values(self):
        return None

    def items(self):
        return None


num_workers = 1000
iterations = 10

# One methodcaller per method name, shared by every worker thread.
views = [methodcaller(p) for p in ["keys", "values", "items"]]


def work(view):
    m, d = HashTrieMap(), {}
    # Apply the same methodcaller to a HashTrieMap and to a dict.
    view(m)
    view(d)


for _ in range(iterations):
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        for view in views:
            futures = [executor.submit(work, view) for _ in range(num_workers)]
            # future.result() re-raises any exception raised inside work().
            results = [future.result() for future in futures]
```

Once every 5-10 runs, the program prints:

TypeError: descriptor 'keys' for 'dict' objects doesn't apply to a 'HashTrieMap' object

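The problem is that the C implementation of `operator.methodcaller` is not thread-safe: every call writes its receiver into `mc->vectorcall_args[0]`, but `vectorcall_args` is stored on the methodcaller object and shared across all calls: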

From `cpython/Modules/_operator.c`, lines 1646 to 1666 at commit 0af4ec3:

```c
static PyObject *
methodcaller_vectorcall(
        methodcallerobject *mc, PyObject *const *args, size_t nargsf, PyObject* kwnames)
{
    if (!_PyArg_CheckPositional("methodcaller", PyVectorcall_NARGS(nargsf), 1, 1)
        || !_PyArg_NoKwnames("methodcaller", kwnames)) {
        return NULL;
    }
    if (mc->vectorcall_args == NULL) {
        if (_methodcaller_initialize_vectorcall(mc) < 0) {
            return NULL;
        }
    }
    assert(mc->vectorcall_args != 0);
    mc->vectorcall_args[0] = args[0];
    return PyObject_VectorcallMethod(
            mc->name, mc->vectorcall_args,
            (PyTuple_GET_SIZE(mc->xargs)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
            mc->vectorcall_kwnames);
}
```

I think this is unsafe in general, not just in the free-threaded build. The vectorcall args array has to stay unchanged for the duration of the call, but methodcaller can be called re-entrantly (for example, from code run by the method it is calling) or from another thread while that call is still in progress.
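A hypothetical pure-Python model makes the re-entrancy case concrete. It is a sketch, not CPython code: `SharedArgsCaller`, `PerCallArgsCaller`, `demo`, and the `describe` callees are invented for illustration, a Python list stands in for the C `vectorcall_args` array, and each callee receives that array directly, the way a vectorcall callee receives `args`. The outer callee re-enters the same caller before it reads slot 0.

```python
class SharedArgsCaller:
    """Hypothetical model of the fast path: one argument array reused by every call."""

    def __init__(self, name, *xargs):
        self.name = name
        self.xargs = xargs
        self.args = [None, *xargs]  # allocated once, like mc->vectorcall_args

    def __call__(self, obj):
        self.args[0] = obj  # mirrors mc->vectorcall_args[0] = args[0]
        method = getattr(type(obj), self.name)  # resolved from this call's receiver
        return method(self.args)  # the callee gets the shared array itself


class PerCallArgsCaller(SharedArgsCaller):
    """Same lookup, but every call builds a private argument array."""

    def __call__(self, obj):
        args = [obj, *self.xargs]  # not visible to any other call
        method = getattr(type(obj), self.name)
        return method(args)


def demo(caller_cls):
    caller = caller_cls("describe")

    class Inner:
        @staticmethod
        def describe(args):
            return type(args[0]).__name__

    class Outer:
        @staticmethod
        def describe(args):
            caller(Inner())  # re-enter the same caller before reading args[0]
            return type(args[0]).__name__  # whatever receiver slot 0 holds now

    return caller(Outer())


print(demo(SharedArgsCaller))   # Inner: the outer call's receiver was overwritten
print(demo(PerCallArgsCaller))  # Outer
```

With the shared array, `Outer.describe` ends up looking at an `Inner` receiver, the same kind of mismatch as the `dict` descriptor being handed a `HashTrieMap`. Allocating a fresh array per call is one way to remove the shared state; it is an assumption of this sketch, not necessarily the fix CPython adopted.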


colesbury added the type-bug, 3.13, topic-free-threading, and 3.14 labels on Nov 20, 2024
colesbury (Contributor, Author)

Hmm, actually it looks like a thread safety issue in methodcaller.

ngoldbaum (Contributor)

Yeah, I noticed earlier that it didn't trigger when I called the descriptors directly instead of going through methodcaller.

colesbury changed the title from "Free threading race condition involving descriptor lookup" to "methodcaller is not thread-safe (or re-entrant)" on Nov 20, 2024
colesbury added a commit to colesbury/cpython that referenced this issue on Nov 21, 2024:
The `methodcaller` C vectorcall implementation uses an arguments array
that is shared across calls. The first argument is modified on every
invocation. This isn't thread-safe in the free threading build. I think
it's also not safe in general, but for now just disable it in the free
threading build.
colesbury (Contributor, Author) commented Nov 21, 2024

I think the optimized vectorcall implementation from #107201 is not reentrant or thread-safe (even with the GIL) because the shared vectorcall_args array is modified during each invocation:

```c
mc->vectorcall_args[0] = args[0];
```

Most (or all) of our current vectorcall implementations read args[0] early, but that isn't guaranteed: the vectorcall protocol is a public API, so any callee, including one in a third-party extension, may read its arguments later in the call.
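The same kind of model can illustrate both the cross-thread case with the GIL enabled and why callees that read `args[0]` early hide the bug. Again this is a hypothetical pure-Python sketch (`SharedArgsCaller`, `Early`, `Late`, `Other`, and `race` are invented names). Two `threading.Event` objects force one specific interleaving: the first call blocks inside its callee, which releases the GIL, while the main thread makes a second call through the same caller.

```python
import threading


class SharedArgsCaller:
    """Hypothetical model: one argument array reused by every call and thread."""

    def __init__(self, name):
        self.name = name
        self.args = [None]  # shared, like mc->vectorcall_args

    def __call__(self, obj):
        self.args[0] = obj  # mirrors mc->vectorcall_args[0] = args[0]
        method = getattr(type(obj), self.name)  # resolved from this call's receiver
        return method(self.args)  # the callee gets the shared array itself


# Events force one interleaving so the demo is deterministic.
first_call_started = threading.Event()
second_call_done = threading.Event()


class Early:
    @staticmethod
    def describe(args):
        receiver = args[0]  # copy the receiver out immediately
        first_call_started.set()
        second_call_done.wait()  # the other call runs while we block here
        return type(receiver).__name__


class Late:
    @staticmethod
    def describe(args):
        first_call_started.set()
        second_call_done.wait()  # the other call runs while we block here
        return type(args[0]).__name__  # read the receiver only now


class Other:
    @staticmethod
    def describe(args):
        return type(args[0]).__name__


def race(first_cls):
    first_call_started.clear()
    second_call_done.clear()
    caller = SharedArgsCaller("describe")
    result = {}

    def first():
        result["first"] = caller(first_cls())

    t = threading.Thread(target=first)
    t.start()
    first_call_started.wait()  # the first call is now inside its callee
    caller(Other())            # second call from the main thread overwrites args[0]
    second_call_done.set()
    t.join()
    return result["first"]


print(race(Early))  # Early
print(race(Late))   # Other
```

`race(Early)` returns `'Early'` because the receiver was copied out before the second call ran; `race(Late)` returns `'Other'` because by the time the callee reads slot 0, the other thread has overwritten it.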

For now, I'm planning to just disable the optimization in the free threading build (#127109), but I don't see an easy way of making it thread-safe, and I think we should consider reverting #107201 in 3.13 and main.

cc @eendebakpt @corona10 @vstinner
