
Conversation

@vstinner (Member) commented Oct 10, 2025

  • Add _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() helper functions.
  • Modify PySequence_Tuple() to use PyTupleWriter API.
  • Soft deprecate _PyTuple_Resize().

📚 Documentation preview 📚: https://cpython-previews--139891.org.readthedocs.build/

@markshannon (Member)

Please don't add any APIs for tracking. Tracking, or untracking, is the job of the VM. We might not even have tracking in the future. FT already tracks objects differently.

Deprecate _PyTuple_Resize() as hard as you like 🙂; it is nonsense and should be removed as soon as possible.
Please deprecate PyTuple_New as well.

I think the most useful new API we could add is PyTuple_MakePair(). Making a tuple from two objects is very common.
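
For comparison, a minimal sketch of how a pair is built with the existing C API today; first and second stand for any two existing PyObject pointers, and PyTuple_MakePair() is only the name proposed in this comment, not an existing function:

    /* Existing API: PyTuple_Pack() creates new references to its arguments
       and returns a new 2-tuple, or NULL on failure. */
    PyObject *pair = PyTuple_Pack(2, first, second);
    if (pair == NULL) {
        return NULL;
    }

    /* Proposed (hypothetical, name only from this comment):
       PyObject *pair = PyTuple_MakePair(first, second); */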

@vstinner (Member, Author)

> Please don't add any APIs for tracking. Tracking, or untracking, is the job of the VM. We might not even have tracking in the future. FT already tracks objects differently.

Are you talking about _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack()? These functions are not usable outside tupleobject.c; they are declared as static.

@vstinner (Member, Author)

> Deprecate _PyTuple_Resize() as hard as you like 🙂; it is nonsense and should be removed as soon as possible.

For now, I prefer to only soft deprecate it. It's documented and used by too many C extensions.

> Please deprecate PyTuple_New as well.

Well, I'm open to soft deprecating it. But a hard deprecation would affect too many C extensions IMO.

> I think the most useful new API we could add is PyTuple_MakePair(). Making a tuple from two objects is very common.

We might add a PyTuple_Pack2() function. But that should be a separate issue.

PyTupleWriter is mostly useful when you don't know the tuple size in advance. For example, when you consume an iterator.

@markshannon (Member)

Where is the API specified? It seems rather inefficient, needing to heap allocate the writer.
It should be as efficient as possible, or we won't be able to persuade people to switch away from using PyTuple_New.

There should be no need for a method to create a tuple writer; it can be a small object that can be stack allocated and zero initialized:

    PyTupleWriter writer = { 0 };

It also needs a function to consume the reference of the item, like PyTuple_SET_ITEM but safer:

    PyTupleWriter_AddConsumeRef(&writer, item);

Maybe add bulk adds as well?

    PyTupleWriter_AddArray(PyTupleWriter *writer, PyObject **array, intptr_t count);

@markshannon (Member)

> Well, I'm open to soft deprecating it. But a hard deprecation would affect too many C extensions IMO.

It is unfortunate that so many extensions use it, but it is still broken. The sooner we deprecate it, the better, as we can give people more warning. We do need a good story for how to replace it.

@markshannon (Member) commented Oct 10, 2025

> PyTupleWriter is mostly useful when you don't know the tuple size in advance. For example, when you consume an iterator.

If you are consuming an iterator, PySequence_Tuple is much simpler than PyTupleWriter.
TBH, if you're interacting with Python objects at that level, your best option is probably Python, not C.

@markshannon (Member)

I see the value in this as a nice, safe replacement for the PyTuple_New() + PyTuple_SET_ITEM() combo.
So the API needs to be efficient and easy to port to.
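
For reference, a minimal sketch of the combo in question, using only existing CPython API; make_squares is a hypothetical helper. The hazard is that PyTuple_New() returns a tuple with NULL slots and PyTuple_SET_ITEM() does no error checking, so every slot must be filled before the tuple is shared:

    #include <Python.h>

    static PyObject *
    make_squares(Py_ssize_t n)
    {
        PyObject *tuple = PyTuple_New(n);   /* slots start out as NULL */
        if (tuple == NULL) {
            return NULL;
        }
        for (Py_ssize_t i = 0; i < n; i++) {
            PyObject *item = PyLong_FromSsize_t(i * i);
            if (item == NULL) {
                Py_DECREF(tuple);           /* safe: remaining slots are NULL */
                return NULL;
            }
            PyTuple_SET_ITEM(tuple, i, item);  /* steals the reference, no checks */
        }
        return tuple;
    }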

@vstinner (Member, Author) commented Oct 10, 2025

> Where is the API specified? It seems rather inefficient, needing to heap allocate the writer.

The API is:

    PyTupleWriter* PyTupleWriter_Create(Py_ssize_t size);
    int PyTupleWriter_Add(PyTupleWriter *writer, PyObject *item);
    PyObject* PyTupleWriter_Finish(PyTupleWriter *writer);
    void PyTupleWriter_Discard(PyTupleWriter *writer);

PyTupleWriter_Add() creates a new reference; it doesn't take ownership of item.
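
A hedged usage sketch of the four functions above, building a tuple of unknown length from an iterator (the use case mentioned earlier). tuple_from_iterable is a hypothetical helper; it assumes that a size hint of 0 is accepted and that PyTupleWriter_Finish() releases the writer whether it succeeds or fails:

    #include <Python.h>

    static PyObject *
    tuple_from_iterable(PyObject *iterable)
    {
        PyObject *it = PyObject_GetIter(iterable);
        if (it == NULL) {
            return NULL;
        }
        PyTupleWriter *writer = PyTupleWriter_Create(0);  /* unknown final size */
        if (writer == NULL) {
            Py_DECREF(it);
            return NULL;
        }
        PyObject *item;
        while ((item = PyIter_Next(it)) != NULL) {
            /* PyTupleWriter_Add() creates a new reference, so we still own item. */
            int rc = PyTupleWriter_Add(writer, item);
            Py_DECREF(item);
            if (rc < 0) {
                goto error;
            }
        }
        if (PyErr_Occurred()) {   /* PyIter_Next() stopped because of an error */
            goto error;
        }
        Py_DECREF(it);
        return PyTupleWriter_Finish(writer);

    error:
        Py_DECREF(it);
        PyTupleWriter_Discard(writer);
        return NULL;
    }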

> It seems rather inefficient, needing to heap allocate the writer.

I designed the API to be compatible with the stable ABI later, so the writer is allocated on the heap to hide the structure members from the public C API.

The implementation uses a free list which makes the allocation basically free in terms of performance.

> It also needs a function to consume the reference of the item, like PyTuple_SET_ITEM but safer.

I can add an int PyTupleWriter_AddSteal(PyTupleWriter *writer, PyObject *item) variant which takes ownership of the item. The C API Working Group recently expressed its preference for the term Steal for such APIs.

> Maybe add bulk adds as well?

That sounds like a good idea; it would be similar to PyTuple_FromArray().
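
A hedged sketch of the bulk variant, using the signature proposed earlier in this thread (none of this exists in CPython yet) and assuming it creates new references to the items, like PyTupleWriter_Add(); build_constants is a hypothetical helper:

    static PyObject *
    build_constants(void)
    {
        PyObject *items[] = {Py_None, Py_True, Py_False};
        PyTupleWriter *writer = PyTupleWriter_Create(3);
        if (writer == NULL) {
            return NULL;
        }
        /* Hypothetical bulk add: assumed to create new references to the items. */
        if (PyTupleWriter_AddArray(writer, items, 3) < 0) {
            PyTupleWriter_Discard(writer);
            return NULL;
        }
        return PyTupleWriter_Finish(writer);  /* new 3-tuple, or NULL on error */
    }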

@markshannon (Member)

> So the writer is allocated on the heap to hide the structure members from the public C API.

As long as setting all the fields to zero initializes it, only the size needs to be fixed.

> The implementation uses a free list which makes the allocation basically free in terms of performance.

That's not true. Free lists can have poor locality of reference, and the code can be quite branchy. Plus there's the overhead of the function call.

@vstinner (Member, Author)

I updated the PR to add PyTupleWriter_AddSteal() and PyTupleWriter_AddArray() functions, and hard deprecate _PyTuple_Resize().

@vstinner (Member, Author) commented Oct 10, 2025

UPDATE: There was a bug in my benchmark. I fixed it and reran the benchmark. Now it's faster instead of slower for tuple-1000 😁

Benchmark comparing tuple to writer:

  • tuple: PyTuple_New() and PyTuple_SetItem()
  • writer: PyTupleWriter_Create(), PyTupleWriter_AddSteal() and PyTupleWriter_Finish().

| Benchmark | tuple | writer |
|---|---|---|
| tuple-1 | 37.4 ns | 41.3 ns: 1.10x slower |
| tuple-5 | 65.7 ns | 68.8 ns: 1.05x slower |
| tuple-10 | 99.9 ns | 102 ns: 1.02x slower |
| tuple-100 | 800 ns | 762 ns: 1.05x faster |
| tuple-1000 | 7.68 us | 7.28 us: 1.05x faster |
| Geometric mean | (ref) | 1.01x slower |

tuple-1 is the worst case scenario and measures the overhead of the abstraction: the writer is only 3.9 nanoseconds slower.


Benchmark:

Patch:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 4e73be20e1b..27c3c02c7fc 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2562,6 +2562,76 @@ toggle_reftrace_printer(PyObject *ob, PyObject *arg)
     Py_RETURN_NONE;
 }
 
+static PyObject *
+bench_tuple(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t i=0; i < loops; i++) {
+        PyObject *tuple = PyTuple_New(size);
+        if (tuple == NULL) {
+            return NULL;
+        }
+
+        for (int i=0; i < size; i++) {
+            PyObject *item = PyLong_FromLong(i);
+            if (item == NULL) {
+                return NULL;
+            }
+            if (PyTuple_SetItem(tuple, i, item) < 0) {
+                Py_DECREF(tuple);
+                return NULL;
+            }
+        }
+
+        Py_DECREF(tuple);
+    }
+    PyTime_PerfCounterRaw(&t2);
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+static PyObject *
+bench_writer(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t i=0; i < loops; i++) {
+        PyTupleWriter *writer = PyTupleWriter_Create(size);
+        if (writer == NULL) {
+            return NULL;
+        }
+
+        for (int i=0; i < size; i++) {
+            PyObject *item = PyLong_FromLong(i);
+            if (item == NULL) {
+                return NULL;
+            }
+            if (PyTupleWriter_AddSteal(writer, item) < 0) {
+                PyTupleWriter_Discard(writer);
+                return NULL;
+            }
+        }
+
+        PyObject *tuple = PyTupleWriter_Finish(writer);
+        if (tuple == NULL) {
+            return NULL;
+        }
+        Py_DECREF(tuple);
+    }
+    PyTime_PerfCounterRaw(&t2);
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -2656,6 +2726,8 @@ static PyMethodDef TestMethods[] = {
     {"test_atexit", test_atexit, METH_NOARGS},
     {"code_offset_to_line", _PyCFunction_CAST(code_offset_to_line), METH_FASTCALL},
     {"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
+    {"bench_tuple", bench_tuple, METH_VARARGS},
+    {"bench_writer", bench_writer, METH_VARARGS},
     {NULL, NULL} /* sentinel */
 };
 

bench_tuple.py:

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 5, 10, 100, 1000):
    func = functools.partial(_testcapi.bench_tuple, size)
    runner.bench_time_func(f'tuple-{size}', func)

bench_writer.py:

import pyperf
import _testcapi
import functools
runner = pyperf.Runner()
for size in (1, 5, 10, 100, 1000):
    func = functools.partial(_testcapi.bench_writer, size)
    runner.bench_time_func(f'tuple-{size}', func)

@zooba (Member) commented Oct 10, 2025

Opposition posted on the issue.

@sergey-miryanov (Contributor)

JFYI, tuple size distribution (from pyperformance):

[plot: tuple size distribution. X axis: size of the tuple; Y axis: percentage of all tuples with this size]

source_dataset.csv

plot_dataset.csv

@vstinner marked this pull request as a draft on October 11, 2025 at 21:07.
@vstinner (Member, Author)

The API is being actively discussed and I'm still making changes to it, so I prefer to mark this PR as a draft for now.

@vstinner (Member, Author)

To make the API nicer to use, I propose accepting NULL in PyTupleWriter_Add() and PyTupleWriter_AddSteal(), returning -1 (error) in that case.

It allows replacing code like:

    PyObject *item = PyLong_FromSsize_t(value);
    if (!item)
        goto error;
    PyTuple_SET_ITEM(tuple, 0, item);

with:

    PyObject *item = PyLong_FromSsize_t(value);
    if (PyTupleWriter_AddSteal(writer, item) < 0) {
        goto error;
    }

instead of having to check for error twice:

    PyObject *item = PyLong_FromSsize_t(value);
    if (!item) {
        goto error;
    }
    if (PyTupleWriter_AddSteal(tuple, item) < 0) {
        goto error;
    }

Checking a function return value is a common pattern when creating a tuple.

Add private _PyTupleWriter_GetItems() helper function.
@vstinner (Member, Author)

I changed the allocation strategy, which makes the benchmark faster for tuple-10 and reduces the overhead for tuple-1 and tuple-5:

| Benchmark | tuple | writer |
|---|---|---|
| tuple-1 | 37.3 ns | 40.0 ns: 1.07x slower |
| tuple-5 | 65.2 ns | 67.1 ns: 1.03x slower |
| tuple-10 | 99.7 ns | 98.8 ns: 1.01x faster |
| tuple-100 | 807 ns | 761 ns: 1.06x faster |
| tuple-1000 | 7.68 us | 7.29 us: 1.05x faster |
| Geometric mean | (ref) | 1.00x faster |

@vstinner (Member, Author)

Micro-benchmark on PySequence_Tuple():

import pyperf
runner = pyperf.Runner()
for size in (1, 5, 10, 50, 100, 1_000, 10_000):
    runner.timeit(f'tuple-{size:,}',
        setup=f'from _testlimitedcapi import sequence_tuple; seq = range({size})',
        stmt='sequence_tuple(seq)')

| Benchmark | ref | writer |
|---|---|---|
| tuple-1 | 129 ns | 101 ns: 1.27x faster |
| tuple-5 | 132 ns | 134 ns: 1.01x slower |
| tuple-10 | 218 ns | 179 ns: 1.22x faster |
| tuple-50 | 753 ns | 829 ns: 1.10x slower |
| tuple-1,000 | 11.3 us | 10.7 us: 1.05x faster |
| tuple-10,000 | 260 us | 256 us: 1.02x faster |
| Geometric mean | (ref) | 1.06x faster |

PyTupleWriter made the function slower.
@vstinner (Member, Author)

Mark asked to redo the benchmark to compare PyTuple_SET_ITEM() to PyTupleWriter_AddSteal(). Here it is:

| Benchmark | tuple | writer |
|---|---|---|
| tuple-1 | 33.0 ns | 42.4 ns: 1.28x slower |
| tuple-5 | 51.4 ns | 70.3 ns: 1.37x slower |
| tuple-10 | 74.1 ns | 105 ns: 1.42x slower |
| tuple-100 | 567 ns | 802 ns: 1.41x slower |
| tuple-1000 | 5.43 us | 7.71 us: 1.42x slower |
| Geometric mean | (ref) | 1.38x slower |

PyTupleWriter_AddSteal() is 1.28x to 1.42x slower than PyTuple_SET_ITEM().

Note: PyTuple_SET_ITEM() is not available in the limited C API.
