Skip to content

bpo-45636: Remove the old %-formatting fast-path #29532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 15, 2021

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Nov 12, 2021

No perf impact:

Slower (26):
- regex_effbot: 3.17 ms +- 0.06 ms -> 3.33 ms +- 0.04 ms: 1.05x slower
- unpickle_list: 5.04 us +- 0.10 us -> 5.25 us +- 0.10 us: 1.04x slower
- logging_silent: 114 ns +- 3 ns -> 118 ns +- 3 ns: 1.04x slower
- fannkuch: 402 ms +- 5 ms -> 415 ms +- 4 ms: 1.03x slower
- regex_dna: 217 ms +- 1 ms -> 224 ms +- 1 ms: 1.03x slower
- unpickle: 14.3 us +- 0.4 us -> 14.7 us +- 1.1 us: 1.03x slower
- mako: 11.8 ms +- 0.1 ms -> 12.2 ms +- 0.1 ms: 1.03x slower
- regex_v8: 23.4 ms +- 0.2 ms -> 23.9 ms +- 0.2 ms: 1.02x slower
- scimark_sparse_mat_mult: 4.48 ms +- 0.09 ms -> 4.57 ms +- 0.14 ms: 1.02x slower
- hexiom: 7.28 ms +- 0.03 ms -> 7.42 ms +- 0.07 ms: 1.02x slower
- crypto_pyaes: 88.8 ms +- 0.7 ms -> 90.2 ms +- 0.8 ms: 1.02x slower
- nbody: 108 ms +- 2 ms -> 109 ms +- 1 ms: 1.01x slower
- django_template: 37.3 ms +- 1.0 ms -> 37.9 ms +- 0.5 ms: 1.01x slower
- pickle_pure_python: 360 us +- 2 us -> 366 us +- 4 us: 1.01x slower
- sympy_str: 300 ms +- 2 ms -> 304 ms +- 6 ms: 1.01x slower
- pyflate: 523 ms +- 4 ms -> 528 ms +- 4 ms: 1.01x slower
- sympy_sum: 169 ms +- 2 ms -> 170 ms +- 2 ms: 1.01x slower
- unpickle_pure_python: 266 us +- 4 us -> 268 us +- 2 us: 1.01x slower
- sympy_expand: 502 ms +- 8 ms -> 506 ms +- 6 ms: 1.01x slower
- sympy_integrate: 21.9 ms +- 0.1 ms -> 22.1 ms +- 0.1 ms: 1.01x slower
- raytrace: 325 ms +- 2 ms -> 327 ms +- 2 ms: 1.01x slower
- regex_compile: 142 ms +- 2 ms -> 143 ms +- 1 ms: 1.01x slower
- 2to3: 270 ms +- 1 ms -> 272 ms +- 1 ms: 1.01x slower
- xml_etree_generate: 79.4 ms +- 0.4 ms -> 79.8 ms +- 0.6 ms: 1.01x slower
- python_startup: 7.61 ms +- 0.01 ms -> 7.64 ms +- 0.01 ms: 1.00x slower
- python_startup_no_site: 5.59 ms +- 0.00 ms -> 5.62 ms +- 0.00 ms: 1.00x slower

Faster (16):
- pidigits: 207 ms +- 0 ms -> 195 ms +- 0 ms: 1.06x faster
- pickle_list: 4.53 us +- 0.08 us -> 4.40 us +- 0.07 us: 1.03x faster
- pickle_dict: 28.4 us +- 0.9 us -> 27.8 us +- 0.1 us: 1.02x faster
- telco: 6.61 ms +- 0.22 ms -> 6.48 ms +- 0.19 ms: 1.02x faster
- scimark_lu: 139 ms +- 4 ms -> 136 ms +- 4 ms: 1.02x faster
- richards: 56.2 ms +- 1.1 ms -> 55.5 ms +- 0.9 ms: 1.01x faster
- scimark_fft: 341 ms +- 4 ms -> 338 ms +- 4 ms: 1.01x faster
- pickle: 9.97 us +- 0.09 us -> 9.87 us +- 0.09 us: 1.01x faster
- unpack_sequence: 43.9 ns +- 0.8 ns -> 43.5 ns +- 0.4 ns: 1.01x faster
- go: 167 ms +- 2 ms -> 165 ms +- 2 ms: 1.01x faster
- chameleon: 7.53 ms +- 0.07 ms -> 7.47 ms +- 0.10 ms: 1.01x faster
- logging_simple: 6.08 us +- 0.08 us -> 6.03 us +- 0.09 us: 1.01x faster
- spectral_norm: 104 ms +- 2 ms -> 103 ms +- 1 ms: 1.01x faster
- nqueens: 90.5 ms +- 0.8 ms -> 89.9 ms +- 0.9 ms: 1.01x faster
- deltablue: 4.83 ms +- 0.05 ms -> 4.81 ms +- 0.05 ms: 1.01x faster
- json_dumps: 12.4 ms +- 0.1 ms -> 12.3 ms +- 0.1 ms: 1.00x faster

Benchmark hidden because not significant (16): chaos, dulwich_log, float, json_loads, logging_format, meteor_contest, pathlib, scimark_monte_carlo, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, tornado_http, xml_etree_parse, xml_etree_iterparse, xml_etree_process

Geometric mean: 1.00x slower

https://bugs.python.org/issue45636

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@markshannon
Copy link
Member

markshannon commented Nov 12, 2021

Do you have any evidence that adding BINARY_OP_REMAINDER_UNICODE is worthwhile?
The overhead of PyUnicode_Format will dwarf any dispatch overhead in PyNumber_Remainder.

As a general comment, avoid && and || in DEOPT_IF. Each DEOPT_IF should ideally compile to a single test-and-branch.

@markshannon
Copy link
Member

Just be clear.
I approve of removing the special case code in BINARY_OP.
I don't think it is worth adding the specialized instruction.

@brandtbucher
Copy link
Member Author

Do you have any evidence that adding BINARY_OP_REMAINDER_UNICODE is worthwhile?
The overhead of PyUnicode_Format will dwarf any dispatch overhead in PyNumber_Remainder.

Yep, your intuition seems correct here. Removing both the specialization and the fast path has no impact on pyperformance.

I was curious, so I also ran a few micro-benchmarks on PGO/LTO builds. This benchmark tests basically the simplest possible case of string formatting ("%s" % ""), and probably represents the upper bound of what impact we could possibly see from this optimization:

$ pyperf timeit --rigorous --quiet -s 'l, r = "%s", ""' 'for _ in range(10_000_000): l % r'

  • Fast path: Mean +- std dev: 260 ms +- 25 ms
  • Specialized instruction: Mean +- std dev: 278 ms +- 26 ms
  • Neither: Mean +- std dev: 270 ms +- 39 ms

It looks like optimizing this has negligible improvement (if any). I'm comfortable removing it, especially since more generic operator implementation caching will likely benefit this case anyways.

@@ -1380,13 +1380,13 @@ _Py_Specialize_BinaryOp(PyObject *lhs, PyObject *rhs, _Py_CODEUNIT *instr,
SpecializedCacheEntry *cache)
{
_PyAdaptiveEntry *adaptive = &cache->adaptive;
if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep this change, though (it allows us to "un-adapt" more instructions in the default case).

@brandtbucher brandtbucher changed the title bpo-45636: Convert the string formatting fast-path into a specialized instruction bpo-45636: Remove the old %-formatting optimization Nov 12, 2021
@brandtbucher brandtbucher changed the title bpo-45636: Remove the old %-formatting optimization bpo-45636: Remove the old %-formatting fast-path Nov 12, 2021
@gvanrossum
Copy link
Member

FWIW, nearly 13 years ago this showed a modest improvement. I'm not surprised it no longer does, I just got curious why we had this optimization in the first place: https://bugs.python.org/issue5176 8725dce

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 13, 2021

FWIW, nearly 13 years ago this showed a modest improvement. I'm not surprised it no longer does, I just got curious why we had this optimization in the first place: https://bugs.python.org/issue5176 8725dce

Maybe because of Serhiy's work in https://bugs.python.org/issue28307. That should cover most common use cases.
See for e.g dis.dis('"%s = %r" % (k, v)')

@gvanrossum
Copy link
Member

Oh, but that means that Brandt's micro-benchmark doesn't show the effect.

@brandtbucher brandtbucher merged commit ec382fa into python:main Nov 15, 2021
@brandtbucher brandtbucher deleted the unicode-formatting branch November 15, 2021 16:58
@gvanrossum
Copy link
Member

Did you try at least a micro-benchmark that doesn't generate code from the format string? (Not that it would matter, it's a rare use case, but it would seem useful to at least know.)

@brandtbucher
Copy link
Member Author

brandtbucher commented Nov 15, 2021

Did you try at least a micro-benchmark that doesn't generate code from the format string? (Not that it would matter, it's a rare use case, but it would seem useful to at least know.)

My micro-benchmark doesn't hit Serhiy's AST optimization. That only kicks in when the LHS is a string literal and the RHS is a tuple display, I think:

>>> dis.dis('l, r = "%s", ""; l % r')
  1           0 LOAD_CONST               0 (('%s', ''))
              2 UNPACK_SEQUENCE          2
              4 STORE_NAME               0 (l)
              6 STORE_NAME               1 (r)
              8 LOAD_NAME                0 (l)
             10 LOAD_NAME                1 (r)
             12 BINARY_OP                6 (%)
             14 POP_TOP
             16 LOAD_CONST               1 (None)
             18 RETURN_VALUE

remykarem pushed a commit to remykarem/cpython that referenced this pull request Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants