-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-139888: Add PyTupleWriter C API #139891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() helper functions. * Modify PySequence_Tuple() to use PyTupleWriter API. * Soft deprecate _PyTuple_Resize().
Please don't add any APIs for tracking. Tracking, or untracking, is the job of the VM. We might not even have tracking in the future. FT already tracks objects differently. Deprecate I think the most useful new API we could add is |
Are you talking about |
For now, I prefer to only soft deprecate it. It's documented and used by too many C extensions.
Well, I'm open to soft deprecate it. But deprecating it would affect too many C extensions IMO.
We might add PyTupleWriter is mostly useful when you don't know the tuple size in advance. For example, when you consume an iterator. |
Where is the API specified? It seems rather inefficient, needing to heap allocate the writer. There should be no need for a method to create a tuple writer, it can be a small object that can stack allocated and be zero initialized.
It also needs a function to consume the reference of the item, like
Maybe add bulk adds as well?
|
It is unfortunate that so many extensions use it, but it is still broken. The sooner we deprecate it, the better, as we can give people more warning. We do need a good story for how to replace it. |
If you are consuming an iterator, |
I see the value in this as a nice, safe replacement for the |
The API is: PyTupleWriter* PyTupleWriter_Create(Py_ssize_t size);
int PyTupleWriter_Add(PyTupleWriter *writer, PyObject *item);
PyObject* PyTupleWriter_Finish(PyTupleWriter *writer);
void PyTupleWriter_Discard(PyTupleWriter *writer);
I designed the API to be compatible with the stable ABI later. So the writer is allocated on the heap to hide the structure members from the public C API. The implementation uses a free list which makes the allocation basically free in terms of performance.
I can add
That sounds like a good idea, it would be similar to PyTuple_FromArray(). |
As long as setting all the fields to zero initializes it, then only the size need be fixed.
That's not true. Free lists can have poor locality of reference, and the code can be quite branchy. Plus there's the overhead of the function call. |
I updated the PR to add |
Change also the exception to SystemError for this error.
UPDATE: There was a bug in my benchmark. I fixed it and reran the benchmark. Now it's faster instead of slower for tuple-1000 😁 Benchmark comparing
tuple-1 is the worst case scenario, measure the overhead of the abstraction: it's only 3.9 nanoseconds slower. Benchmark: Patch: diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 4e73be20e1b..27c3c02c7fc 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2562,6 +2562,76 @@ toggle_reftrace_printer(PyObject *ob, PyObject *arg)
Py_RETURN_NONE;
}
+static PyObject *
+bench_tuple(PyObject *ob, PyObject *args)
+{
+ Py_ssize_t size, loops;
+ if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+ return NULL;
+ }
+
+ PyTime_t t1, t2;
+ PyTime_PerfCounterRaw(&t1);
+ for (Py_ssize_t i=0; i < loops; i++) {
+ PyObject *tuple = PyTuple_New(size);
+ if (tuple == NULL) {
+ return NULL;
+ }
+
+ for (int i=0; i < size; i++) {
+ PyObject *item = PyLong_FromLong(i);
+ if (item == NULL) {
+ return NULL;
+ }
+ if (PyTuple_SetItem(tuple, i, item) < 0) {
+ Py_DECREF(tuple);
+ return NULL;
+ }
+ }
+
+ Py_DECREF(tuple);
+ }
+ PyTime_PerfCounterRaw(&t2);
+ return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+static PyObject *
+bench_writer(PyObject *ob, PyObject *args)
+{
+ Py_ssize_t size, loops;
+ if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+ return NULL;
+ }
+
+ PyTime_t t1, t2;
+ PyTime_PerfCounterRaw(&t1);
+ for (Py_ssize_t i=0; i < loops; i++) {
+ PyTupleWriter *writer = PyTupleWriter_Create(size);
+ if (writer == NULL) {
+ return NULL;
+ }
+
+ for (int i=0; i < size; i++) {
+ PyObject *item = PyLong_FromLong(i);
+ if (item == NULL) {
+ return NULL;
+ }
+ if (PyTupleWriter_AddSteal(writer, item) < 0) {
+ PyTupleWriter_Discard(writer);
+ return NULL;
+ }
+ }
+
+ PyObject *tuple = PyTupleWriter_Finish(writer);
+ if (tuple == NULL) {
+ return NULL;
+ }
+ Py_DECREF(tuple);
+ }
+ PyTime_PerfCounterRaw(&t2);
+ return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
static PyMethodDef TestMethods[] = {
{"set_errno", set_errno, METH_VARARGS},
{"test_config", test_config, METH_NOARGS},
@@ -2656,6 +2726,8 @@ static PyMethodDef TestMethods[] = {
{"test_atexit", test_atexit, METH_NOARGS},
{"code_offset_to_line", _PyCFunction_CAST(code_offset_to_line), METH_FASTCALL},
{"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
+ {"bench_tuple", bench_tuple, METH_VARARGS},
+ {"bench_writer", bench_writer, METH_VARARGS},
{NULL, NULL} /* sentinel */
};
import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 5, 10, 100, 1000):
func = functools.partial(_testcapi.bench_tuple, size)
runner.bench_time_func(f'tuple-{size}', func)
import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 5, 10, 100, 1000):
func = functools.partial(_testcapi.bench_writer, size)
runner.bench_time_func(f'tuple-{size}', func) |
Opposition posted on the issue. |
The API is being actively discussed. And I'm still making changes in the API. So I prefer to mark this PR as a draft for now. |
To make the API nicer to user, I propose to accept It allows replacing code like: PyObject *item = PyLong_FromSsize_t(value);
if (!item)
goto error;
PyTuple_SET_ITEM(tuple, 0, item); with: PyObject *item = PyLong_FromSsize_t(value);
if (PyTupleWriter_AddSteal(tuple, item) < 0) {
goto error;
} instead of having to check for error twice: PyObject *item = PyLong_FromSsize_t(value);
if (!item) {
goto error;
}
if (PyTupleWriter_AddSteal(tuple, item) < 0) {
goto error;
} Checking a function return value is a common pattern when creating a tuple. |
Add private _PyTupleWriter_GetItems() helper function.
I changed the allocation strategy which makes the benchmark faster for tuple-10 and reduces the overhead for tuple-1 and tuple-5:
|
Micro-benchmark on import pyperf
runner = pyperf.Runner()
for size in (1, 5, 10, 50, 100, 1_000, 10_000):
runner.timeit(f'tuple-{size:,}',
setup=f'from _testlimitedcapi import sequence_tuple; seq = range({size})',
stmt='sequence_tuple(seq)')
|
PyTupleWriter made the function slower.
Mark asked to redo the benchmark to compare
Note: |
PyTupleWriter
API #139888📚 Documentation preview 📚: https://cpython-previews--139891.org.readthedocs.build/