Skip to content

gh-127065: Make methodcaller thread-safe and re-entrant #127245

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Nov 25, 2024

The function operator.methodcaller is not thread-safe since the additional of the vectorcall method in ##107201. In the free-threading build the issue is easy to trigger, for the normal build harder.

We could either remove the vectorcall implementation, or adapt the vectorcall implementation to be thread safe (for both the normal and free-threading build). In this PR we make the methodcaller safe by

  • Replacing the lazy initialiation with initialization in the constructor
  • Using a stack allocated space for the vectorcall arguments, and falling back to tp_call for more than 8 arguments

Benchmark results:

call: Mean +- std dev: [no_vectorcall] 353 ns +- 8 ns -> [pr] 149 ns +- 5 ns: 2.37x faster
creation: Mean +- std dev: [no_vectorcall] 317 ns +- 8 ns -> [pr] 439 ns +- 31 ns: 1.39x slower
creation+call: Mean +- std dev: [no_vectorcall] 652 ns +- 17 ns -> [pr] 613 ns +- 14 ns: 1.06x faster
call kwarg: Mean +- std dev: [no_vectorcall] 599 ns +- 88 ns -> [pr] 199 ns +- 6 ns: 3.01x faster
creation kwarg: Mean +- std dev: [no_vectorcall] 451 ns +- 4 ns -> [pr] 653 ns +- 59 ns: 1.45x slower
creation+call kwarg: Mean +- std dev: [no_vectorcall] 1.05 us +- 0.02 us -> [pr] 858 ns +- 9 ns: 1.22x faster

Geometric mean: 1.29x faster

(old machine, no PGO, comparing this PR with the same PR but the vectorcall disabled)

Benchmark script
import pyperf

setup = """
from operator import methodcaller as mc
arr = []
call = mc('sort')
call_kwarg = mc('sort', reverse=True)
"""

runner = pyperf.Runner()
runner.timeit(name="call", stmt="call(arr)", setup=setup)
runner.timeit(name="creation", stmt="call = mc('sort')", setup=setup)
runner.timeit(name="creation+call", stmt="call = mc('sort'); call(arr)", setup=setup)
runner.timeit(name="call kwarg", stmt="call_kwarg(arr)", setup=setup)
runner.timeit(name="creation kwarg", stmt="call = mc('sort', reverse=True)", setup=setup)
runner.timeit(name="creation+call kwarg", stmt="call = mc('sort', reverse=True); call(arr)", setup=setup)

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments:

  • I don't think _methodcaller_initialize_vectorcall is thread safe without the GIL. Lazily initializing arguments complicates the code and thread-safety -- I don't think it's worth it.
  • Generally, I think it's better to support some fixed max number of arguments (like 8), and stack allocate the temporary array. (i.e., PyObject **tmp_args[MAX_ARGS];). If the methodcaller needs more arguments than that, just don't set the vectorcall field and let it fall back to the tp_call.

(PyTuple_GET_SIZE(mc->xargs)) | PY_VECTORCALL_ARGUMENTS_OFFSET,
mc->vectorcall_kwnames);

PyMem_Free(tmp_args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyMem_Free is after the return statement

@eendebakpt eendebakpt changed the title Draft: gh-127065: Make methodcaller thread-safe and re-entrant gh-127065: Make methodcaller thread-safe and re-entrant Dec 1, 2024
Comment on lines 1600 to 1601
PyObject **vectorcall_args; /* Borrowed references */
PyObject *vectorcall_kwnames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ownership here seems a bit complicated and I think it can be simplified. As I understand it, vectorcall_kwnames only exists to ensure that some entries in vectorcall_args stay alive.

Instead, I'd suggest:

  • Make PyObject *vectorcall_args a tuple (that holds strong references to its contents as usual)
  • Get rid of vectorcall_kwnames
  • Use _PyTuple_ITEMS for fast access to the contents of the tuple (for memcpy())
  • Visit vectorcall_args in methodcaller_traverse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vectorcall_kwnames is needed as an argument for PyObject_VectorcallMethod in methodcaller_vectorcall (https://github.com/python/cpython/blob/main/Modules/_operator.c#L1666), so we cannot get rid of it.

The ownership is not too hard I think: the objects in vectorcall_args have references borrowed from either mc->args or (the keys from) mc->kwds. I added a comment to clarify this.

Making the vectorcall_args a tuple is still an option though. It requires a bit more memory and a tiny bit of computation in the initialization. It would be the C equivalent of vectorcall_args = args + tuple(kwds). I'll work it out in a branch to see how it compares

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in that case don't worry about it unless you prefer it as a tuple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff between the two approaches is this:

eendebakpt/cpython@methodcaller_ft...eendebakpt:cpython:methodcaller_ft_v2

What is nice about making vectorcall_args a tuple is that if there are no keyword arguments, we can reuse mc->args. It does require more operations in the construction though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either approach is fine! My guess is that the vast majority of uses of methodcaller() do not use keyword arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running benchmarks shows what is to be expected from the implementations: using a tuple for vectorcall_args is a bit slower in initializing, except when there are no keyword arguments (since then we reuse the arg tuple). Differences are small though.

Since using a tuple leads to cleaner code and the majority of uses is without keywords I slightly prefer the tuple approach. I will open a new PR for it.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good to me. A few minor formatting suggestions below.

Comment on lines 1609 to 1610
methodcaller_vectorcall(
methodcallerobject *mc, PyObject *const *args, size_t nargsf, PyObject* kwnames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more common formatting looks like:

methodcaller_vectorcall(methodcallerobject *mc, PyObject *const *args,
                        size_t nargsf, PyObject *kwnames)

assert(1 + number_of_arguments <= _METHODCALLER_MAX_ARGS);
memcpy(tmp_args + 1, mc->vectorcall_args, sizeof(PyObject *) * number_of_arguments);

PyObject *result = PyObject_VectorcallMethod(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, but I'd write this without the temporary result variable like:

return PyObject_VectorcallMethod(...);


return result;
}

static int _methodcaller_initialize_vectorcall(methodcallerobject* mc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static int _methodcaller_initialize_vectorcall(methodcallerobject* mc)
static int
_methodcaller_initialize_vectorcall(methodcallerobject *mc)

eendebakpt and others added 4 commits December 6, 2024 21:53
Co-authored-by: Sam Gross <colesbury@gmail.com>
Co-authored-by: Sam Gross <colesbury@gmail.com>
@eendebakpt
Copy link
Contributor Author

Closing in favor of #127746

@eendebakpt eendebakpt closed this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants