gh-97016: Convert PyBytes_AS_STRING() to function #97017
Convert the following static inline functions to regular functions:

* PyByteArray_AS_STRING()
* PyByteArray_GET_SIZE()
* PyBytes_AS_STRING()
* PyBytes_GET_SIZE()

Remove the _PyByteArray_empty_string variable. It was excluded from the limited C API.

In bytesobject.c and bytearrayobject.c, add static inline functions and use them for best performance:

* _PyByteArray_AS_STRING()
* _PyByteArray_GET_SIZE()
* _PyBytes_AS_STRING()
* _PyBytes_GET_SIZE()
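The split described above (a public regular function backed by a private static inline helper) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `ToyBytes`, `toy_bytes_from_string()` and the `toy_*`/`ToyBytes_*` names are stand-ins invented for the example and do not reflect CPython's real object layout or the code in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bytes object; NOT CPython's real layout. */
typedef struct {
    size_t size;
    char data[16];
} ToyBytes;

/* Private fast path: static inline, so code in this translation unit
   can have the field access inlined at the call site. */
static inline char *toy_bytes_as_string_fast(ToyBytes *op) { return op->data; }
static inline size_t toy_bytes_get_size_fast(ToyBytes *op) { return op->size; }

/* Public entry points: regular functions with external linkage.
   Callers in other shared libraries pay a real function call, but no
   longer depend on the struct layout at compile time. */
char *ToyBytes_AS_STRING(ToyBytes *op) { return toy_bytes_as_string_fast(op); }
size_t ToyBytes_GET_SIZE(ToyBytes *op) { return toy_bytes_get_size_fast(op); }

/* Helper for the example: copy at most 15 characters of s. */
ToyBytes toy_bytes_from_string(const char *s)
{
    assert(s != NULL);
    ToyBytes b;
    size_t n = strlen(s);
    if (n > sizeof(b.data) - 1) {
        n = sizeof(b.data) - 1;
    }
    memcpy(b.data, s, n);
    b.data[n] = '\0';
    b.size = n;
    return b;
}
```

In this sketch, code inside the defining file would call the `*_fast` helpers directly, while external callers would only ever see the two regular functions.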
@erlend-aasland: Would you mind reviewing this PR? With this change, _Py_CAST() is still used by
Micro-benchmark on PyBytes_AS_STRING() and PyBytes_GET_SIZE() function calls to measure the worst case: when the hot code is just these two function calls.
On average, converting the PyBytes_AS_STRING() and PyBytes_GET_SIZE() static inline functions to regular functions makes them 1.25x slower: it adds +2.7 ns per function call (+5.4 ns for the two function calls). Well, currently

This worst case is when PyBytes_AS_STRING() and PyBytes_GET_SIZE() cannot be inlined by LTO: I added the benchmark to _testcapi, which is built as a shared library, so the calls are regular function calls.

Benchmark run on Python built with LTO (without PGO), with CPU pinning.

In short, my micro-benchmark loops on this code:

    PyObject *bytes = PyBytes_FromString("abc");
    char *s = PyBytes_AS_STRING(bytes);
    Py_ssize_t len = PyBytes_GET_SIZE(bytes);
    Py_DECREF(bytes);
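For readers who want to reproduce the shape of such a worst-case measurement without building CPython, here is a minimal stand-alone sketch. It is not the benchmark used in this PR: `toy_len()` and `bench_ns_per_call()` are hypothetical names, the call goes through a volatile function pointer to mimic a call that cannot be inlined, and it uses the standard `clock()` (CPU time, coarse resolution), so the absolute numbers are only indicative.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef size_t (*toy_fn)(const char *);

/* Hypothetical cheap function standing in for an accessor. */
size_t toy_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Return the average CPU time per call in nanoseconds. */
double bench_ns_per_call(toy_fn fn, const char *arg, long loops)
{
    assert(loops > 0);
    /* Volatile pointer: the compiler must reload it on every iteration
       and perform an indirect call, so the callee cannot be inlined
       into the loop. */
    volatile toy_fn call = fn;
    /* Volatile sink keeps the loop from being optimized away. */
    volatile size_t sink = 0;

    clock_t t0 = clock();
    for (long i = 0; i < loops; i++) {
        sink += call(arg);
    }
    clock_t t1 = clock();
    (void)sink;

    double elapsed_ns = (double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC;
    return elapsed_ns / (double)loops;
}
```

A typical use would be `bench_ns_per_call(toy_len, "abc", 10000000)`, repeated several times on a pinned CPU, comparing the result against a variant where the callee is visible to the compiler and can be inlined.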
I compared this PR to its parent commit. Commands:
Linux booted with
I don't expect these functions to be called in hot code. Usually, you get the pointer once and then work on the pointer in a loop, so I expect the overhead not to be significant in practice.
An alternative is to get rid of the static inline implementation and create aliases to existing functions:

    #define PyByteArray_AS_STRING PyByteArray_AsString
    #define PyByteArray_GET_SIZE PyByteArray_GetSize
    #define PyBytes_AS_STRING PyBytes_AsString
    #define PyBytes_GET_SIZE PyBytes_GetSize

These functions check the type of their argument, whereas the current static inline functions don't. Example:

    char *
    PyBytes_AsString(PyObject *op)
    {
        if (!PyBytes_Check(op)) {
            PyErr_Format(PyExc_TypeError,
                         "expected bytes, %.200s found", Py_TYPE(op)->tp_name);
            return NULL;
        }
        return ((PyBytesObject *)op)->ob_sval;
    }
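The behavioural difference between the two flavours can be sketched with a toy model. The names below (`ToyObject`, `ToyKind`, `toy_as_string_checked()`, `toy_as_string_unchecked()`) are hypothetical and only mimic the pattern; the unchecked variant's debug-only assertion is an assumption made for the illustration, not a statement about what this PR does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tagged object; NOT CPython's real PyObject. */
typedef enum { TOY_BYTES, TOY_INT } ToyKind;

typedef struct {
    ToyKind kind;
    size_t size;
    char data[16];
} ToyObject;

/* Unchecked accessor: the caller guarantees the type. The assertion
   compiles away when NDEBUG is defined, leaving a plain field access. */
static inline char *toy_as_string_unchecked(ToyObject *op)
{
    assert(op->kind == TOY_BYTES);
    return op->data;
}

/* Checked accessor: always tests the type at run time and reports a
   mismatch by returning NULL (the real function also sets an
   exception, which this toy omits). */
char *toy_as_string_checked(ToyObject *op)
{
    if (op->kind != TOY_BYTES) {
        return NULL;
    }
    return op->data;
}
```

Aliasing the macro names to the checked functions would therefore change what callers observe on a type mismatch: a NULL return they must handle instead of undefined behaviour in release builds.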
This change is somewhat controversial, so I prefer to mark it as a draft for now :-)
The benefits of this change are not obvious since the Python ABI is still unstable, whereas this change introduces a performance overhead. Moreover, there are already these functions: