

vstinner
Member

vstinner commented Oct 8, 2025

vstinner marked this pull request as draft October 11, 2025 21:57
@vstinner
Member Author

I'm converting this PR to a draft for now, since the API seems to be misused by third-party projects. I proposed PyDict_FromItems(), which is a different abstraction, in #139963.

vstinner force-pushed the dict_presized branch 3 times, most recently from eb555c6 to 8bb9715 (October 12, 2025 12:40)
@vstinner
Member Author

I rewrote the PR to add a unicode_keys parameter: PyObject* PyDict_NewPresized(Py_ssize_t size, int unicode_keys).
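A plausible motivation for the flag: since CPython 3.11, a dict whose keys are all exact str objects uses a more compact entry layout that does not store a hash per entry (str objects cache their own hash), and inserting a non-str key later makes CPython rebuild the table with the general layout. A presized table therefore has to pick one layout up front. A rough way to see the size difference from pure Python, assuming CPython 3.11 or newer, is to compare sys.getsizeof() for dicts with the same number of str and int keys; dict_sizes() is a hypothetical helper, not part of this PR:

```python
import sys


def dict_sizes(n: int) -> tuple[int, int]:
    """Return (size with str keys, size with int keys) for n entries.

    Hypothetical helper; assumes CPython, where sys.getsizeof() on a dict
    also counts its key table.
    """
    # Both dicts are filled one item at a time, so they go through the same
    # resizes and end up with the same capacity; only the entry layout differs.
    str_keyed = {str(i): i for i in range(n)}
    int_keyed = {i: i for i in range(n)}
    return sys.getsizeof(str_keyed), sys.getsizeof(int_keyed)


if __name__ == "__main__":
    str_size, int_size = dict_sizes(1_000)
    # On CPython 3.11+, the str-keyed dict is expected to be the smaller one.
    print(f"str keys: {str_size} bytes, int keys: {int_size} bytes")
```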

@methane
Member

methane commented Oct 13, 2025

There are two news entries.

@vstinner
Member Author

vstinner commented Oct 13, 2025

Benchmark on PyDict_New() vs PyDict_NewPresized() with Unicode keys:

| Benchmark      | new     | presized              |
|----------------|---------|-----------------------|
| dict-10        | 2.69 us | 2.62 us: 1.03x faster |
| dict-100       | 29.6 us | 27.5 us: 1.08x faster |
| dict-1,000     | 301 us  | 283 us: 1.06x faster  |
| dict-10,000    | 3.50 ms | 3.18 ms: 1.10x faster |
| Geometric mean | (ref)   | 1.05x faster          |

Benchmark hidden because not significant (1): dict-1
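Each "N.NNx faster" factor is the PyDict_New() time divided by the PyDict_NewPresized() time. Recomputing the factors from the rounded means in the table gives the same values; speedups() is a hypothetical helper and the timings are copied from the rows above:

```python
# Mean timings from the Unicode-key table, in microseconds: (new, presized).
TIMINGS_US = {
    "dict-10": (2.69, 2.62),
    "dict-100": (29.6, 27.5),
    "dict-1,000": (301.0, 283.0),
    "dict-10,000": (3_500.0, 3_180.0),  # 3.50 ms and 3.18 ms
}


def speedups(timings: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Reference time divided by new time: > 1 means presized is faster."""
    return {name: new / presized for name, (new, presized) in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS_US).items():
        print(f"{name}: {factor:.2f}x faster")
```

This prints 1.03x, 1.08x, 1.06x and 1.10x, matching the table.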

Code:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 4e73be20e1b..a1eaed01178 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2562,6 +2562,77 @@ toggle_reftrace_printer(PyObject *ob, PyObject *arg)
     Py_RETURN_NONE;
 }
 
+
+static PyObject *
+bench_dict_new(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_New();
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
+static PyObject *
+bench_dict_presized(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_NewPresized(size, 1);
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -2656,6 +2727,8 @@ static PyMethodDef TestMethods[] = {
     {"test_atexit", test_atexit, METH_NOARGS},
     {"code_offset_to_line", _PyCFunction_CAST(code_offset_to_line), METH_FASTCALL},
     {"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
+    {"bench_dict_new", bench_dict_new, METH_VARARGS},
+    {"bench_dict_presized", bench_dict_presized, METH_VARARGS},
     {NULL, NULL} /* sentinel */
 };
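Both benchmark functions above insert size keys into the dict one at a time. Starting from PyDict_New(), CPython allocates a small table and then reallocates it several times as it fills up; that repeated resizing is the work a presized dict is meant to avoid. A rough way to observe it from pure Python, assuming CPython (where sys.getsizeof() on a dict also counts its key table), is to count how often the reported size changes during insertion; count_resizes() is a hypothetical helper, not part of this PR:

```python
import sys


def count_resizes(n: int) -> int:
    """Count how often sys.getsizeof() changes while inserting n str keys."""
    d = {}
    resizes = 0
    last_size = sys.getsizeof(d)
    for i in range(n):
        d[str(i)] = i
        size = sys.getsizeof(d)
        # Assumes CPython: the reported size only changes when the key
        # table is allocated or reallocated with a larger capacity.
        if size != last_size:
            resizes += 1
            last_size = size
    return resizes


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(f"{n:>6,} keys: {count_resizes(n)} resizes")
```

Because each resize roughly doubles the table, the count grows logarithmically with the number of keys rather than linearly.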
 

bench_new.py:

import pyperf
import functools
import _testcapi
runner = pyperf.Runner()
for size in (1, 10, 100, 1_000, 10_000):
    func = functools.partial(_testcapi.bench_dict_new, size)
    runner.bench_time_func(f'dict-{size:,}', func)

bench_presized.py:

import pyperf
import functools
import _testcapi
runner = pyperf.Runner()
for size in (1, 10, 100, 1_000, 10_000):
    func = functools.partial(_testcapi.bench_dict_presized, size)
    runner.bench_time_func(f'dict-{size:,}', func)
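pyperf's Runner.bench_time_func() calls the time function as func(loops, *args) and expects it to return the total elapsed time in seconds for all loops. With functools.partial() binding size first, each call becomes bench_dict_new(size, loops), which matches the "nn" format parsed by the C functions. A pure-Python function following the same contract could look like this; bench_dict_literal() is a hypothetical name used only for illustration:

```python
import functools
import time


def bench_dict_literal(size: int, loops: int) -> float:
    """Hypothetical pure-Python counterpart of bench_dict_new().

    Same calling convention as the C functions: (size, loops) in, total
    elapsed seconds for all loops out, as bench_time_func() expects.
    """
    t0 = time.perf_counter()
    for _ in range(loops):
        d = {}
        for i in range(size):
            # Like the C version, create the str key and value in the loop.
            d[str(i)] = i
        assert len(d) == size
    return time.perf_counter() - t0


if __name__ == "__main__":
    # pyperf would call func(loops); partial() supplies size first.
    func = functools.partial(bench_dict_literal, 1_000)
    loops = 100
    print(f"{func(loops) / loops * 1e6:.1f} us per loop")
```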

@vstinner
Member Author

I opened capi-workgroup/decisions#80 to ask the C API Working Group to decide on this API.

@vstinner
Member Author

Benchmark on PyDict_New() vs PyDict_NewPresized() with integer keys:

| Benchmark      | new     | presized              |
|----------------|---------|-----------------------|
| dict-1         | 294 ns  | 301 ns: 1.02x slower  |
| dict-10        | 2.61 us | 2.51 us: 1.04x faster |
| dict-100       | 26.1 us | 24.8 us: 1.05x faster |
| dict-1,000     | 260 us  | 250 us: 1.04x faster  |
| dict-10,000    | 3.07 ms | 2.78 ms: 1.10x faster |
| Geometric mean | (ref)   | 1.04x faster          |
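The "Geometric mean" row is the geometric mean of the per-benchmark ratios (PyDict_New() time divided by PyDict_NewPresized() time, so the dict-1 slowdown contributes a ratio below 1). Recomputing it from the integer-key rows reproduces the reported 1.04x; geometric_mean_speedup() is a hypothetical helper and the timings are copied from the table:

```python
import statistics

# Mean timings from the integer-key table, in nanoseconds: (new, presized).
TIMINGS_NS = [
    (294, 301),              # dict-1: presized is slower here
    (2_610, 2_510),          # dict-10
    (26_100, 24_800),        # dict-100
    (260_000, 250_000),      # dict-1,000
    (3_070_000, 2_780_000),  # dict-10,000
]


def geometric_mean_speedup(timings: list[tuple[float, float]]) -> float:
    """Geometric mean of new/presized ratios (> 1 means presized is faster)."""
    return statistics.geometric_mean(new / presized for new, presized in timings)


if __name__ == "__main__":
    print(f"{geometric_mean_speedup(TIMINGS_NS):.2f}x faster")  # 1.04x faster
```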
