@@ -1317,14 +1317,17 @@ This section covers specific optimizations independent of the

Faster CPython
==============

- CPython 3.11 is on average `25% faster <https://github.com/faster-cpython/ideas#published-results>`_
- than CPython 3.10 when measured with the
+ CPython 3.11 is an average of
+ `25% faster <https://github.com/faster-cpython/ideas#published-results>`_
+ than CPython 3.10 as measured with the
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
- and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
- could be up to 10-60% faster.
+ when compiled with GCC on Ubuntu Linux.
+ Depending on your workload, the overall speedup could be 10-60%.

- This project focuses on two major areas in Python: faster startup and faster
- runtime. Other optimizations not under this project are listed in `Optimizations`_.
+ This project focuses on two major areas in Python:
+ :ref:`whatsnew311-faster-startup` and :ref:`whatsnew311-faster-runtime`.
+ Optimizations not covered by this project are listed separately under
+ :ref:`whatsnew311-optimizations`.


.. _whatsnew311-faster-startup:
@@ -1337,8 +1340,8 @@ Faster Startup

Frozen imports / Static code objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
- speed up module loading.
+ Python caches :term:`bytecode` in the :ref:`__pycache__ <tut-pycache>`
+ directory to speed up module loading.

Previously in 3.10, Python module execution looked like this:

@@ -1347,8 +1350,9 @@ Previously in 3.10, Python module execution looked like this:

   Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate

In Python 3.11, the core modules essential for Python startup are "frozen".
- This means that their code objects (and bytecode) are statically allocated
- by the interpreter. This reduces the steps in module execution process to this:
+ This means that their :ref:`code objects <codeobjects>` (and bytecode)
+ are statically allocated by the interpreter.
+ This reduces the steps in the module execution process to:

.. code-block:: text

@@ -1357,7 +1361,7 @@ by the interpreter. This reduces the steps in module execution process to this:

Interpreter startup is now 10-15% faster in Python 3.11. This has a big
impact for short-running programs using Python.

- (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
+ (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.)
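
A quick way to see whether a given startup module is frozen on a particular
build (an illustrative sketch, not part of this changeset; the exact set of
frozen modules depends on the build and on the ``-X frozen_modules`` option):

.. code-block:: python

   import abc
   import codecs
   import os

   # Frozen modules report "frozen" as their origin instead of a file path.
   for module in (abc, codecs, os):
       print(module.__name__, module.__spec__.origin)

Import and startup cost itself can be profiled with ``python -X importtime``.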


.. _whatsnew311-faster-runtime:
@@ -1370,17 +1374,19 @@ Faster Runtime

Cheaper, lazy Python frames
^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Python frames are created whenever Python calls a Python function. This frame
- holds execution information. The following are new frame optimizations:
+ Python frames, holding execution information,
+ are created whenever Python calls a Python function.
+ The following are new frame optimizations:

- Streamlined the frame creation process.
- Avoided memory allocation by generously re-using frame space on the C stack.
- Streamlined the internal frame struct to contain only essential information.
  Frames previously held extra debugging and memory management information.

- Old-style frame objects are now created only when requested by debuggers or
- by Python introspection functions such as ``sys._getframe`` or
- ``inspect.currentframe``. For most user code, no frame objects are
+ Old-style :ref:`frame objects <frame-objects>`
+ are now created only when requested by debuggers
+ or by Python introspection functions such as :func:`sys._getframe` and
+ :func:`inspect.currentframe`. For most user code, no frame objects are
created at all. As a result, nearly all Python function calls have sped
up significantly. We measured a 3-7% speedup in pyperformance.
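
For example, introspection code like the following still materializes a full
frame object on demand (a minimal sketch, not taken from this changeset):

.. code-block:: python

   import inspect

   def where_am_i():
       # Requesting the current frame is what forces CPython to create a
       # real frame object; ordinary calls no longer need one.
       frame = inspect.currentframe()
       return frame.f_code.co_name, frame.f_lineno

   print(where_am_i())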
@@ -1401,10 +1407,11 @@ In 3.11, when CPython detects Python code calling another Python function,

it sets up a new frame, and "jumps" to the new code inside the new frame. This
avoids calling the C interpreting function altogether.

- Most Python function calls now consume no C stack space. This speeds up
- most of such calls. In simple recursive functions like fibonacci or
- factorial, a 1.7x speedup was observed. This also means recursive functions
- can recurse significantly deeper (if the user increases the recursion limit).
+ Most Python function calls now consume no C stack space, speeding them up.
+ In simple recursive functions like fibonacci or
+ factorial, we observed a 1.7x speedup. This also means recursive functions
+ can recurse significantly deeper
+ (if the user increases the recursion limit with :func:`sys.setrecursionlimit`).
We measured a 1-3% improvement in pyperformance.

(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
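
As a rough illustration of the deeper recursion now possible (a hypothetical
snippet with an arbitrarily chosen depth; how deep you can safely go still
depends on the platform and the configured limit):

.. code-block:: python

   import sys

   def countdown(n):
       return 0 if n == 0 else countdown(n - 1)

   # Pure-Python calls no longer consume C stack space, so raising the
   # recursion limit allows much deeper recursion than in 3.10.
   sys.setrecursionlimit(50_000)
   print(countdown(25_000))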
@@ -1415,7 +1422,7 @@ We measured a 1-3% improvement in pyperformance.

PEP 659: Specializing Adaptive Interpreter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :pep:`659` is one of the key parts of the faster CPython project. The general
+ :pep:`659` is one of the key parts of the Faster CPython project. The general
idea is that while Python is a dynamic language, most code has regions where
objects and types rarely change. This concept is known as *type stability*.

@@ -1424,17 +1431,18 @@ in the executing code. Python will then replace the current operation with a

more specialized one. This specialized operation uses fast paths available only
to those use cases/types, which generally outperform their generic
counterparts. This also brings in another concept called *inline caching*, where
- Python caches the results of expensive operations directly in the bytecode.
+ Python caches the results of expensive operations directly in the
+ :term:`bytecode`.

The specializer will also combine certain common instruction pairs into one
- superinstruction. This reduces the overhead during execution.
+ superinstruction, reducing the overhead during execution.

Python will only specialize
when it sees code that is "hot" (executed multiple times). This prevents Python
- from wasting time for run-once code. Python can also de-specialize when code is
+ from wasting time on run-once code. Python can also de-specialize when code is
too dynamic or when the use changes. Specialization is attempted periodically,
- and specialization attempts are not too expensive. This allows specialization
- to adapt to new circumstances.
+ and specialization attempts are not too expensive,
+ allowing specialization to adapt to new circumstances.
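
The effect of specialization can be observed with :mod:`dis`, whose
``adaptive`` argument is new in 3.11 (a minimal sketch; the specialized
instruction names are implementation details and may change):

.. code-block:: python

   import dis

   def add(a, b):
       return a + b

   # Run the function enough times for the adaptive interpreter to treat
   # the code as "hot" and specialize its instructions.
   for _ in range(1000):
       add(1, 2)

   # With adaptive=True, specialized forms (e.g. a binary add specialized
   # for int operands) may appear in place of the generic BINARY_OP.
   dis.dis(add, adaptive=True)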

(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
See :pep:`659` for more information. Implementation by Mark Shannon and Brandt
@@ -1447,32 +1455,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
| Operation | Form | Specialization | Operation speedup | Contributor(s) |
| | | | (up to) | |
+===============+====================+=======================================================+===================+===================+
- | Binary | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
- | operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, |
- | | | fast paths for their underlying types. | | Brandt Bucher, |
+ | Binary | ``x + x`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
+ | operations | | such as :class:`int`, :class:`float` and :class:`str` | | Dong-hee Na, |
+ | | ``x - x`` | take custom fast paths for their underlying types. | | Brandt Bucher, |
| | | | | Dennis Sweeney |
+ | | ``x * x`` | | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
- | Subscript | ``a[i]`` | Subscripting container types such as ``list``, | 10-25% | Irit Katriel, |
- | | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon |
- | | | data structures. | | |
+ | Subscript | ``a[i]`` | Subscripting container types such as :class:`list`, | 10-25% | Irit Katriel, |
+ | | | :class:`tuple` and :class:`dict` directly index | | Mark Shannon |
+ | | | the underlying data structures. | | |
| | | | | |
- | | | Subscripting custom ``__getitem__`` | | |
+ | | | Subscripting custom :meth:`~object.__getitem__` | | |
| | | is also inlined similar to :ref:`inline-calls`. | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Store | ``a[i] = z`` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney |
| subscript | | | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
| Calls | ``f(arg)`` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, |
- | | ``C(arg)`` | as ``len`` and ``str`` directly call their underlying | | Ken Jin |
- | | | C version. This avoids going through the internal | | |
- | | | calling convention. | | |
- | | | | | |
+ | | | as :func:`len` and :class:`str` directly call their | | Ken Jin |
+ | | ``C(arg)`` | underlying C version. This avoids going through the | | |
+ | | | internal calling convention. | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
- | Load | ``print`` | The object's index in the globals/builtins namespace | [1]_ | Mark Shannon |
- | global | ``len`` | is cached. Loading globals and builtins require | | |
- | variable | | zero namespace lookups. | | |
+ | Load | ``print`` | The object's index in the globals/builtins namespace | [#load-global]_ | Mark Shannon |
+ | global | | is cached. Loading globals and builtins require | | |
+ | variable | ``len`` | zero namespace lookups. | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
- | Load | ``o.attr`` | Similar to loading global variables. The attribute's | [2]_ | Mark Shannon |
+ | Load | ``o.attr`` | Similar to loading global variables. The attribute's | [#load-attr]_ | Mark Shannon |
| attribute | | index inside the class/object's namespace is cached. | | |
| | | In most cases, attribute loading will require zero | | |
| | | namespace lookups. | | |
@@ -1484,14 +1492,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon |
| attribute | | | in pyperformance | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
- | Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher |
- | Sequence | | and ``tuple``. Avoids internal calling convention. | | |
+ | Unpack | ``*seq`` | Specialized for common containers such as | 8% | Brandt Bucher |
+ | Sequence | | :class:`list` and :class:`tuple`. | | |
+ | | | Avoids internal calling convention. | | |
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+

- .. [1] A similar optimization already existed since Python 3.8. 3.11
-    specializes for more forms and reduces some overhead.
+ .. [#load-global] A similar optimization already existed since Python 3.8.
+    3.11 specializes for more forms and reduces some overhead.

- .. [2] A similar optimization already existed since Python 3.10.
+ .. [#load-attr] A similar optimization already existed since Python 3.10.
   3.11 specializes for more forms. Furthermore, all attribute loads should
   be sped up by :issue:`45947`.
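
The inline caches behind the load-global and load-attribute rows above can be
inspected with the ``show_caches`` argument to :func:`dis.dis`, new in 3.11
(an illustrative sketch only; the cache layout is an implementation detail):

.. code-block:: python

   import dis

   class Point:
       def __init__(self, x, y):
           self.x = x
           self.y = y

   def norm_squared(p):
       # Attribute loads such as p.x keep their lookup results in inline
       # CACHE entries embedded directly in the bytecode.
       return p.x * p.x + p.y * p.y

   dis.dis(norm_squared, show_caches=True)
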
@@ -1501,49 +1510,72 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)

Misc
----

- * Objects now require less memory due to lazily created object namespaces. Their
-   namespace dictionaries now also share keys more freely.
+ * Objects now require less memory due to lazily created object namespaces.
+   Their namespace dictionaries now also share keys more freely.
  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)

+ * "Zero-cost" exceptions are implemented, eliminating the cost
+   of :keyword:`try` statements when no exception is raised
+   (see the sketch after this list).
+   (Contributed by Mark Shannon in :issue:`40222`.)
+
* A more concise representation of exceptions in the interpreter reduced the
  time required for catching an exception by about 10%.
  (Contributed by Irit Katriel in :issue:`45711`.)

+ * :mod:`re`'s regular expression matching engine has been partially refactored,
+   and now uses computed gotos (or "threaded code") on supported platforms. As a
+   result, Python 3.11 executes the `pyperformance regular expression benchmarks
+   <https://pyperformance.readthedocs.io/benchmarks.html#regex-dna>`_ up to 10%
+   faster than Python 3.10.
+   (Contributed by Brandt Bucher in :gh:`91404`.)
+
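
A quick way to observe the "zero-cost" behaviour noted in the list above
(a hypothetical measurement sketch; absolute timings vary by machine):

.. code-block:: python

   import timeit

   def plain(x):
       return x + 1

   def guarded(x):
       try:
           return x + 1
       except ValueError:
           return 0

   # With zero-cost exceptions, entering the try block adds essentially no
   # interpreter work when nothing is raised, so the timings should be close.
   print(timeit.timeit("plain(1)", globals=globals()))
   print(timeit.timeit("guarded(1)", globals=globals()))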


.. _whatsnew311-faster-cpython-faq:

FAQ
---

- | Q: How should I write my code to utilize these speedups?
- |
- | A: You don't have to change your code. Write Pythonic code that follows common
-      best practices. The Faster CPython project optimizes for common code
-      patterns we observe.
- |
- |
- | Q: Will CPython 3.11 use more memory?
- |
- | A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
-      This is offset by memory optimizations for frame objects and object
-      dictionaries as mentioned above.
- |
- |
- | Q: I don't see any speedups in my workload. Why?
- |
- | A: Certain code won't have noticeable benefits. If your code spends most of
-      its time on I/O operations, or already does most of its
-      computation in a C extension library like numpy, there won't be significant
-      speedup. This project currently benefits pure-Python workloads the most.
- |
- | Furthermore, the pyperformance figures are a geometric mean. Even within the
-      pyperformance benchmarks, certain benchmarks have slowed down slightly, while
-      others have sped up by nearly 2x!
- |
- |
- | Q: Is there a JIT compiler?
- |
- | A: No. We're still exploring other optimizations.
+ .. _faster-cpython-faq-my-code:
+
+ How should I write my code to utilize these speedups?
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Write Pythonic code that follows common best practices;
+ you don't have to change your code.
+ The Faster CPython project optimizes for common code patterns we observe.
+
+
+ .. _faster-cpython-faq-memory:
+
+ Will CPython 3.11 use more memory?
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Maybe not; we don't expect memory use to exceed 20% higher than 3.10.
+ This is offset by memory optimizations for frame objects and object
+ dictionaries as mentioned above.
+
+
+ .. _faster-cpython-ymmv:
+
+ I don't see any speedups in my workload. Why?
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ Certain code won't have noticeable benefits. If your code spends most of
+ its time on I/O operations, or already does most of its
+ computation in a C extension library like NumPy, there won't be significant
+ speedups. This project currently benefits pure-Python workloads the most.
+
+ Furthermore, the pyperformance figures are a geometric mean. Even within the
+ pyperformance benchmarks, certain benchmarks have slowed down slightly, while
+ others have sped up by nearly 2x!
+
+
+ .. _faster-cpython-jit:
+
+ Is there a JIT compiler?
+ ^^^^^^^^^^^^^^^^^^^^^^^^
+
+ No. We're still exploring other optimizations.


.. _whatsnew311-faster-cpython-about: