bpo-45636: Remove the old `%`-formatting fast-path #29532

brandtbucher · 2021-11-12T04:02:42Z

No perf impact:

Slower (26):
- regex_effbot: 3.17 ms +- 0.06 ms -> 3.33 ms +- 0.04 ms: 1.05x slower
- unpickle_list: 5.04 us +- 0.10 us -> 5.25 us +- 0.10 us: 1.04x slower
- logging_silent: 114 ns +- 3 ns -> 118 ns +- 3 ns: 1.04x slower
- fannkuch: 402 ms +- 5 ms -> 415 ms +- 4 ms: 1.03x slower
- regex_dna: 217 ms +- 1 ms -> 224 ms +- 1 ms: 1.03x slower
- unpickle: 14.3 us +- 0.4 us -> 14.7 us +- 1.1 us: 1.03x slower
- mako: 11.8 ms +- 0.1 ms -> 12.2 ms +- 0.1 ms: 1.03x slower
- regex_v8: 23.4 ms +- 0.2 ms -> 23.9 ms +- 0.2 ms: 1.02x slower
- scimark_sparse_mat_mult: 4.48 ms +- 0.09 ms -> 4.57 ms +- 0.14 ms: 1.02x slower
- hexiom: 7.28 ms +- 0.03 ms -> 7.42 ms +- 0.07 ms: 1.02x slower
- crypto_pyaes: 88.8 ms +- 0.7 ms -> 90.2 ms +- 0.8 ms: 1.02x slower
- nbody: 108 ms +- 2 ms -> 109 ms +- 1 ms: 1.01x slower
- django_template: 37.3 ms +- 1.0 ms -> 37.9 ms +- 0.5 ms: 1.01x slower
- pickle_pure_python: 360 us +- 2 us -> 366 us +- 4 us: 1.01x slower
- sympy_str: 300 ms +- 2 ms -> 304 ms +- 6 ms: 1.01x slower
- pyflate: 523 ms +- 4 ms -> 528 ms +- 4 ms: 1.01x slower
- sympy_sum: 169 ms +- 2 ms -> 170 ms +- 2 ms: 1.01x slower
- unpickle_pure_python: 266 us +- 4 us -> 268 us +- 2 us: 1.01x slower
- sympy_expand: 502 ms +- 8 ms -> 506 ms +- 6 ms: 1.01x slower
- sympy_integrate: 21.9 ms +- 0.1 ms -> 22.1 ms +- 0.1 ms: 1.01x slower
- raytrace: 325 ms +- 2 ms -> 327 ms +- 2 ms: 1.01x slower
- regex_compile: 142 ms +- 2 ms -> 143 ms +- 1 ms: 1.01x slower
- 2to3: 270 ms +- 1 ms -> 272 ms +- 1 ms: 1.01x slower
- xml_etree_generate: 79.4 ms +- 0.4 ms -> 79.8 ms +- 0.6 ms: 1.01x slower
- python_startup: 7.61 ms +- 0.01 ms -> 7.64 ms +- 0.01 ms: 1.00x slower
- python_startup_no_site: 5.59 ms +- 0.00 ms -> 5.62 ms +- 0.00 ms: 1.00x slower

Faster (16):
- pidigits: 207 ms +- 0 ms -> 195 ms +- 0 ms: 1.06x faster
- pickle_list: 4.53 us +- 0.08 us -> 4.40 us +- 0.07 us: 1.03x faster
- pickle_dict: 28.4 us +- 0.9 us -> 27.8 us +- 0.1 us: 1.02x faster
- telco: 6.61 ms +- 0.22 ms -> 6.48 ms +- 0.19 ms: 1.02x faster
- scimark_lu: 139 ms +- 4 ms -> 136 ms +- 4 ms: 1.02x faster
- richards: 56.2 ms +- 1.1 ms -> 55.5 ms +- 0.9 ms: 1.01x faster
- scimark_fft: 341 ms +- 4 ms -> 338 ms +- 4 ms: 1.01x faster
- pickle: 9.97 us +- 0.09 us -> 9.87 us +- 0.09 us: 1.01x faster
- unpack_sequence: 43.9 ns +- 0.8 ns -> 43.5 ns +- 0.4 ns: 1.01x faster
- go: 167 ms +- 2 ms -> 165 ms +- 2 ms: 1.01x faster
- chameleon: 7.53 ms +- 0.07 ms -> 7.47 ms +- 0.10 ms: 1.01x faster
- logging_simple: 6.08 us +- 0.08 us -> 6.03 us +- 0.09 us: 1.01x faster
- spectral_norm: 104 ms +- 2 ms -> 103 ms +- 1 ms: 1.01x faster
- nqueens: 90.5 ms +- 0.8 ms -> 89.9 ms +- 0.9 ms: 1.01x faster
- deltablue: 4.83 ms +- 0.05 ms -> 4.81 ms +- 0.05 ms: 1.01x faster
- json_dumps: 12.4 ms +- 0.1 ms -> 12.3 ms +- 0.1 ms: 1.00x faster

Benchmark hidden because not significant (16): chaos, dulwich_log, float, json_loads, logging_format, meteor_contest, pathlib, scimark_monte_carlo, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, tornado_http, xml_etree_parse, xml_etree_iterparse, xml_etree_process

Geometric mean: 1.00x slower

https://bugs.python.org/issue45636

Fidget-Spinner

LGTM.

markshannon · 2021-11-12T11:42:56Z

Do you have any evidence that adding BINARY_OP_REMAINDER_UNICODE is worthwhile?
The overhead of PyUnicode_Format will dwarf any dispatch overhead in PyNumber_Remainder.

As a general comment, avoid && and || in DEOPT_IF. Each DEOPT_IF should ideally compile to a single test-and-branch.

markshannon · 2021-11-12T11:45:50Z

Just be clear.
I approve of removing the special case code in BINARY_OP.
I don't think it is worth adding the specialized instruction.

brandtbucher · 2021-11-12T21:57:20Z

Do you have any evidence that adding BINARY_OP_REMAINDER_UNICODE is worthwhile?
The overhead of PyUnicode_Format will dwarf any dispatch overhead in PyNumber_Remainder.

Yep, your intuition seems correct here. Removing both the specialization and the fast path has no impact on pyperformance.

I was curious, so I also ran a few micro-benchmarks on PGO/LTO builds. This benchmark tests basically the simplest possible case of string formatting ("%s" % ""), and probably represents the upper bound of what impact we could possibly see from this optimization:

$ pyperf timeit --rigorous --quiet -s 'l, r = "%s", ""' 'for _ in range(10_000_000): l % r'

Fast path: Mean +- std dev: 260 ms +- 25 ms
Specialized instruction: Mean +- std dev: 278 ms +- 26 ms
Neither: Mean +- std dev: 270 ms +- 39 ms

It looks like optimizing this has negligible improvement (if any). I'm comfortable removing it, especially since more generic operator implementation caching will likely benefit this case anyways.

brandtbucher · 2021-11-12T21:59:40Z

Python/specialize.c

@@ -1380,13 +1380,13 @@ _Py_Specialize_BinaryOp(PyObject *lhs, PyObject *rhs, _Py_CODEUNIT *instr,
                        SpecializedCacheEntry *cache)
 {
    _PyAdaptiveEntry *adaptive = &cache->adaptive;
-    if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) {


I want to keep this change, though (it allows us to "un-adapt" more instructions in the default case).

gvanrossum · 2021-11-13T02:20:05Z

FWIW, nearly 13 years ago this showed a modest improvement. I'm not surprised it no longer does, I just got curious why we had this optimization in the first place: https://bugs.python.org/issue5176 8725dce

Fidget-Spinner · 2021-11-13T02:35:02Z

FWIW, nearly 13 years ago this showed a modest improvement. I'm not surprised it no longer does, I just got curious why we had this optimization in the first place: https://bugs.python.org/issue5176 8725dce

Maybe because of Serhiy's work in https://bugs.python.org/issue28307. That should cover most common use cases.
See for e.g dis.dis('"%s = %r" % (k, v)')

gvanrossum · 2021-11-13T04:51:36Z

Oh, but that means that Brandt's micro-benchmark doesn't show the effect.

gvanrossum · 2021-11-15T17:00:40Z

Did you try at least a micro-benchmark that doesn't generate code from the format string? (Not that it would matter, it's a rare use case, but it would seem useful to at least know.)

brandtbucher · 2021-11-15T18:31:54Z

Did you try at least a micro-benchmark that doesn't generate code from the format string? (Not that it would matter, it's a rare use case, but it would seem useful to at least know.)

My micro-benchmark doesn't hit Serhiy's AST optimization. That only kicks in when the LHS is a string literal and the RHS is a tuple display, I think:

>>> dis.dis('l, r = "%s", ""; l % r')
  1           0 LOAD_CONST               0 (('%s', ''))
              2 UNPACK_SEQUENCE          2
              4 STORE_NAME               0 (l)
              6 STORE_NAME               1 (r)
              8 LOAD_NAME                0 (l)
             10 LOAD_NAME                1 (r)
             12 BINARY_OP                6 (%)
             14 POP_TOP
             16 LOAD_CONST               1 (None)
             18 RETURN_VALUE

brandtbucher added 3 commits November 11, 2021 11:42

Specialize unicode formatting

21eb40c

Catch up with main

fe18bf1

blurb add

1d57dba

brandtbucher requested a review from markshannon as a code owner November 12, 2021 04:02

the-knights-who-say-ni added the CLA signed label Nov 12, 2021

bedevere-bot added the awaiting core review label Nov 12, 2021

Fidget-Spinner approved these changes Nov 12, 2021

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Nov 12, 2021

Remove BINARY_OP_REMAINDER_UNICODE

6ffcf47

brandtbucher commented Nov 12, 2021

View reviewed changes

brandtbucher changed the title ~~bpo-45636: Convert the string formatting fast-path into a specialized instruction~~ bpo-45636: Remove the old %-formatting optimization Nov 12, 2021

Update NEWS

b37bf5a

brandtbucher changed the title ~~bpo-45636: Remove the old %-formatting optimization~~ bpo-45636: Remove the old %-formatting fast-path Nov 12, 2021

brandtbucher merged commit ec382fa into python:main Nov 15, 2021

bedevere-bot removed the awaiting merge label Nov 15, 2021

brandtbucher deleted the unicode-formatting branch November 15, 2021 16:58

remykarem pushed a commit to remykarem/cpython that referenced this pull request Dec 7, 2021

bpo-45636: Remove the old %-formatting fast-path (pythonGH-29532)

1b00b4f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bpo-45636: Remove the old `%`-formatting fast-path #29532

bpo-45636: Remove the old `%`-formatting fast-path #29532

brandtbucher commented Nov 12, 2021 •

edited

Loading

Fidget-Spinner left a comment

markshannon commented Nov 12, 2021 •

edited

Loading

markshannon commented Nov 12, 2021

brandtbucher commented Nov 12, 2021

brandtbucher Nov 12, 2021

gvanrossum commented Nov 13, 2021

Fidget-Spinner commented Nov 13, 2021 •

edited

Loading

gvanrossum commented Nov 13, 2021

gvanrossum commented Nov 15, 2021

brandtbucher commented Nov 15, 2021 •

edited

Loading

bpo-45636: Remove the old %-formatting fast-path #29532

bpo-45636: Remove the old %-formatting fast-path #29532

Conversation

brandtbucher commented Nov 12, 2021 • edited Loading

Fidget-Spinner left a comment

Choose a reason for hiding this comment

markshannon commented Nov 12, 2021 • edited Loading

markshannon commented Nov 12, 2021

brandtbucher commented Nov 12, 2021

brandtbucher Nov 12, 2021

Choose a reason for hiding this comment

gvanrossum commented Nov 13, 2021

Fidget-Spinner commented Nov 13, 2021 • edited Loading

gvanrossum commented Nov 13, 2021

gvanrossum commented Nov 15, 2021

brandtbucher commented Nov 15, 2021 • edited Loading

bpo-45636: Remove the old `%`-formatting fast-path #29532

bpo-45636: Remove the old `%`-formatting fast-path #29532

brandtbucher commented Nov 12, 2021 •

edited

Loading

markshannon commented Nov 12, 2021 •

edited

Loading

Fidget-Spinner commented Nov 13, 2021 •

edited

Loading

brandtbucher commented Nov 15, 2021 •

edited

Loading