Skip to content

gh-90667: Add specializations of Py_DECREF when types are known #30872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 19, 2022

Conversation

sweeneyde
Copy link
Member

@sweeneyde sweeneyde commented Jan 25, 2022

@sweeneyde sweeneyde marked this pull request as ready for review January 25, 2022 05:40
@sweeneyde
Copy link
Member Author

sweeneyde commented Jan 25, 2022

I tried to only modify hot-ish (not error path) code where DECREF is likely to take the deallocation branch

@sweeneyde
Copy link
Member Author

Slower (7):

  • json_dumps: 12.8 ms +- 0.2 ms -> 13.1 ms +- 0.3 ms: 1.02x slower
  • meteor_contest: 106 ms +- 2 ms -> 109 ms +- 3 ms: 1.02x slower
  • telco: 6.50 ms +- 0.17 ms -> 6.58 ms +- 0.18 ms: 1.01x slower
  • fannkuch: 383 ms +- 5 ms -> 388 ms +- 8 ms: 1.01x slower
  • regex_compile: 143 ms +- 2 ms -> 145 ms +- 4 ms: 1.01x slower
  • mako: 11.1 ms +- 0.1 ms -> 11.2 ms +- 0.2 ms: 1.01x slower
  • chameleon: 7.08 ms +- 0.07 ms -> 7.12 ms +- 0.10 ms: 1.01x slower

Faster (27):

  • unpack_sequence: 45.9 ns +- 1.1 ns -> 41.6 ns +- 1.0 ns: 1.10x faster
  • logging_silent: 108 ns +- 11 ns -> 101 ns +- 3 ns: 1.06x faster
  • nbody: 95.6 ms +- 3.2 ms -> 90.2 ms +- 1.9 ms: 1.06x faster
  • spectral_norm: 98.3 ms +- 2.3 ms -> 92.8 ms +- 1.6 ms: 1.06x faster
  • regex_dna: 202 ms +- 3 ms -> 194 ms +- 3 ms: 1.04x faster
  • scimark_fft: 342 ms +- 12 ms -> 331 ms +- 7 ms: 1.03x faster
  • crypto_pyaes: 89.6 ms +- 1.7 ms -> 86.8 ms +- 1.1 ms: 1.03x faster
  • json_loads: 27.4 us +- 0.9 us -> 26.5 us +- 1.3 us: 1.03x faster
  • scimark_monte_carlo: 69.3 ms +- 1.5 ms -> 67.3 ms +- 1.2 ms: 1.03x faster
  • pickle_list: 4.62 us +- 0.21 us -> 4.51 us +- 0.15 us: 1.02x faster
  • scimark_sparse_mat_mult: 5.14 ms +- 0.21 ms -> 5.02 ms +- 0.18 ms: 1.02x faster
  • xml_etree_parse: 161 ms +- 5 ms -> 157 ms +- 6 ms: 1.02x faster
  • regex_effbot: 3.07 ms +- 0.05 ms -> 3.00 ms +- 0.07 ms: 1.02x faster
  • deltablue: 4.36 ms +- 0.14 ms -> 4.27 ms +- 0.14 ms: 1.02x faster
  • pickle_pure_python: 343 us +- 6 us -> 335 us +- 8 us: 1.02x faster
  • sqlite_synth: 2.60 us +- 0.06 us -> 2.55 us +- 0.04 us: 1.02x faster
  • xml_etree_iterparse: 110 ms +- 2 ms -> 108 ms +- 2 ms: 1.02x faster
  • go: 146 ms +- 2 ms -> 143 ms +- 3 ms: 1.02x faster
  • pathlib: 20.2 ms +- 0.5 ms -> 19.8 ms +- 0.3 ms: 1.02x faster
  • scimark_sor: 117 ms +- 3 ms -> 115 ms +- 2 ms: 1.02x faster
  • dulwich_log: 80.9 ms +- 2.0 ms -> 79.6 ms +- 1.7 ms: 1.02x faster
  • nqueens: 84.4 ms +- 1.7 ms -> 83.1 ms +- 2.0 ms: 1.02x faster
  • python_startup: 8.84 ms +- 0.07 ms -> 8.76 ms +- 0.07 ms: 1.01x faster
  • 2to3: 269 ms +- 4 ms -> 266 ms +- 3 ms: 1.01x faster
  • float: 77.0 ms +- 1.2 ms -> 76.5 ms +- 1.5 ms: 1.01x faster
  • sympy_integrate: 22.7 ms +- 0.3 ms -> 22.5 ms +- 0.2 ms: 1.01x faster
  • xml_etree_process: 55.7 ms +- 0.7 ms -> 55.4 ms +- 0.6 ms: 1.01x faster

Benchmark hidden because not significant (24): chaos, django_template, hexiom, logging_format, logging_simple, pickle, pickle_dict, pidigits, pyflate, python_startup_no_site, raytrace, regex_v8, richards, scimark_lu, sqlalchemy_declarative, sqlalchemy_imperative, sympy_expand, sympy_sum, sympy_str, tornado_http, unpickle, unpickle_list, unpickle_pure_python, xml_etree_generate

Geometric mean: 1.01x faster

Copy link
Member

@JelleZijlstra JelleZijlstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. I read through most of the math and ceval module changes to verify the types are right.

@sweeneyde

This comment has been minimized.

@sweeneyde

This comment has been minimized.

@markshannon
Copy link
Member

Thanks. Another nice little speedup.

The specialized for of Py_DECREF for bool and None are basically the same. Maybe replace them with Py_DECREF_IMMORTAL?

For the other classes, rather than the addition level of macros, I think it would be more readable to use _Py_DECREF_SPECIALIZED directly. Each extra macro means more jumping around the code when reading it.

#ifdef Py_DEBUG
if (op->ob_refcnt <= 0) {
// Calls _Py_FatalRefcountError for None, True, and False
_Py_Dealloc(op);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to call _Py_FatalRefcountError directly?
We might want to use _Py_DECREF_IMMORTAL for other "immortal" objects like 0 or () and there is no guarantee that their de-allocation functions will call _Py_FatalRefcountError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I moved _Py_FatalRefcountError from pycore_pyerrors to pycore_object as well, rather than having an #include-dependence between the pycore_files.

@sweeneyde sweeneyde changed the title bpo-46509: Add specializations of Py_DECREF when types are known gh-90667: Add specializations of Py_DECREF when types are known Apr 12, 2022
@sweeneyde sweeneyde added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 12, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit ad146ef 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 12, 2022
@sweeneyde
Copy link
Member Author

@markshannon would you mind reviewing this again? Thanks

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just benchmarked this and I too see a 1% speedup.

I'm a bit wary of making of making this change, because it feels like we're adding more code for what should be data.
IMO, we should be declaring the shape of objects, not adding more special case code.
But that will have to wait until 3.12.

For 3.11, pragmatism beats purity and we'll take the speedup.

const char *func,
const char *message);

#define _Py_FatalRefcountError(message) _Py_FatalRefcountErrorFunc(__func__, message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOI, why move this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid a dependence between the pycore_... includes. The unicodeobject.c and tupleobject.c implementations both only used pycore_pyerrors for _Py_FatalRefcountError(), which doesn't really concern itself with the implementation details of exceptions.

@@ -36,7 +36,15 @@ medium_value(PyLongObject *x)
#define IS_SMALL_INT(ival) (-_PY_NSMALLNEGINTS <= (ival) && (ival) < _PY_NSMALLPOSINTS)
#define IS_SMALL_UINT(ival) ((ival) < _PY_NSMALLPOSINTS)

static inline int is_medium_int(stwodigits x)
static inline void
_Py_DECREF_INT(PyLongObject *op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this ultimately equivalent to

#define _Py_DECREF_INT(op) _Py_DECREF_SPECIALIZED(op, PyObject_Free)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed _PyLongExactDealloc, but I'd like to keep the assertion around to make sure we're never accidentally calling this on int subclasses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants