Optimise the way tracemalloc and PyRefTracer hooks work #125790

Open
pablogsal opened this issue Oct 21, 2024 · 2 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement

Comments

@pablogsal
Member

In #125703 @markshannon raised concerns about the performance implications of where these hooks are placed. In a call, we discussed that he has some ideas on how to make them more performant by moving them elsewhere or adapting them.

I am opening this issue to track and coordinate these improvements for 3.14 and beyond.

@markshannon
Member

I think we can fix the performance issues by raising the level at which allocation/free goes through a function pointer.

Instead of a malloc-like interface void *malloc(size_t size), we should be returning partially initialized objects.
PyObject *obj_malloc(PyTypeObject *tp, size_t size, size_t presize) would allocate a chunk of memory of size + presize bytes and return a PyObject * pointing presize bytes into that chunk, with the ob_type field set to tp and ob_refcnt set to one.

This is low-enough level to be fully general, but with enough context to support tracemalloc.
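A minimal sketch of what such an interface might look like, written against stand-in structs so that it compiles without Python.h. Everything prefixed sketch_ is hypothetical and not part of CPython, and the alignment check on presize is an assumption of this sketch rather than something stated in the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct sketch_type { const char *tp_name; } sketch_type;
typedef struct sketch_object {
    ptrdiff_t ob_refcnt;
    sketch_type *ob_type;
} sketch_object;

/* Allocate presize + size bytes and return a pointer presize bytes into
 * the chunk, with the header already filled in. The pre-header region
 * (for example a GC header) is left uninitialized. */
static sketch_object *
sketch_obj_malloc(sketch_type *tp, size_t size, size_t presize)
{
    if (size < sizeof(sketch_object)) {
        return NULL;                      /* no room for the header */
    }
    if (presize % _Alignof(sketch_object) != 0) {
        return NULL;                      /* object would be misaligned */
    }
    if (presize > SIZE_MAX - size) {
        return NULL;                      /* size_t overflow */
    }
    char *raw = malloc(presize + size);
    if (raw == NULL) {
        return NULL;
    }
    sketch_object *op = (sketch_object *)(raw + presize);
    op->ob_refcnt = 1;
    op->ob_type = tp;
    return op;
}

/* Release an object; the caller must pass the same presize it allocated with. */
static void
sketch_obj_free(sketch_object *op, size_t presize)
{
    assert(presize % _Alignof(sketch_object) == 0);
    if (op != NULL) {
        free((char *)op - presize);
    }
}
```

Because the allocator sees the type as well as the size, a tracing variant could record per-type information at this single entry point instead of hooking a raw malloc-style call.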

I think we would need the following implementations, switchable at runtime:

  • default
  • tracemalloc
  • custom (used when the underlying PEP 445 allocator is changed)
  • custom-tracemalloc (used when the underlying PEP 445 allocator is changed)
  • freethreading
  • freethreading-tracemalloc

We don't need (or want) to switch between the free-threading and default allocators, but it keeps the rest of the code simpler if they have the same interface.
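One way to picture "switchable at runtime" is a table of function pointers with a single active entry. The sketch below shows only two of the listed variants (default and tracemalloc) and omits presize for brevity. All names are hypothetical, and the block counter is a stand-in for tracemalloc's real bookkeeping, which records tracebacks keyed by address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyTypeObject and PyObject. */
typedef struct sketch_type { const char *tp_name; } sketch_type;
typedef struct sketch_object {
    ptrdiff_t ob_refcnt;
    sketch_type *ob_type;
} sketch_object;

/* One entry per implementation; all share the same interface. */
typedef struct sketch_obj_allocator {
    const char *name;
    sketch_object *(*alloc)(sketch_type *tp, size_t size);
    void (*dealloc)(sketch_object *op);
} sketch_obj_allocator;

static size_t sketch_traced_blocks = 0;   /* stand-in for trace records */

static sketch_object *
sketch_default_alloc(sketch_type *tp, size_t size)
{
    if (size < sizeof(sketch_object)) {
        return NULL;
    }
    sketch_object *op = malloc(size);
    if (op != NULL) {
        op->ob_refcnt = 1;
        op->ob_type = tp;
    }
    return op;
}

static void
sketch_default_free(sketch_object *op)
{
    free(op);
}

static sketch_object *
sketch_traced_alloc(sketch_type *tp, size_t size)
{
    sketch_object *op = sketch_default_alloc(tp, size);
    if (op != NULL) {
        sketch_traced_blocks++;           /* a real tracer would record more */
    }
    return op;
}

static void
sketch_traced_free(sketch_object *op)
{
    if (op != NULL) {
        /* Only valid for blocks allocated while tracing was on. */
        assert(sketch_traced_blocks > 0);
        sketch_traced_blocks--;
    }
    sketch_default_free(op);
}

static const sketch_obj_allocator sketch_default =
    {"default", sketch_default_alloc, sketch_default_free};
static const sketch_obj_allocator sketch_tracemalloc =
    {"tracemalloc", sketch_traced_alloc, sketch_traced_free};

/* The active implementation; callers always go through this pointer. */
static const sketch_obj_allocator *sketch_current = &sketch_default;

static void
sketch_set_tracing(int enabled)
{
    sketch_current = enabled ? &sketch_tracemalloc : &sketch_default;
}
```

Callers would write sketch_current->alloc(tp, size) and never name a concrete implementation, which is what lets the free-threading and default builds share the rest of the code.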

@markshannon
Member

Having played around with this a bit, I'm going with a slightly different design.

All allocations and deallocations will go through a single (inline) function taking the threadstate as well as the type and sizes.
This will allocate from per-size freelists, then fall back to the basic malloc/free functions.
This works because we can disable, or re-enable, the freelists whenever PyRefTracer_SetTracer or PyMem_SetAllocator is called.
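A rough sketch of that shape, under several assumptions: the size classes, the per-class cap, the struct layout, and every sketch_ name are hypothetical, presize is omitted, and plain malloc/free stand in for "the basic malloc/free functions". The toggle function models what a call to PyRefTracer_SetTracer or PyMem_SetAllocator might trigger, so that while a hook is installed every allocation reaches the underlying allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ALIGNMENT    16   /* bytes per size class step */
#define SKETCH_SIZE_CLASSES 8    /* classes cover sizes up to 128 bytes */
#define SKETCH_MAX_FREE     16   /* cap on cached blocks per class */

typedef struct sketch_type { const char *tp_name; } sketch_type;
typedef struct sketch_object {
    ptrdiff_t ob_refcnt;
    sketch_type *ob_type;
} sketch_object;

typedef struct sketch_free_node { struct sketch_free_node *next; } sketch_free_node;

/* Hypothetical slice of a thread state holding per-size freelists. */
typedef struct sketch_threadstate {
    int freelists_enabled;
    sketch_free_node *freelist[SKETCH_SIZE_CLASSES];
    size_t count[SKETCH_SIZE_CLASSES];
} sketch_threadstate;

static inline size_t
sketch_size_class(size_t size)            /* requires size >= 1 */
{
    return (size - 1) / SKETCH_ALIGNMENT;
}

static inline sketch_object *
sketch_alloc(sketch_threadstate *ts, sketch_type *tp, size_t size)
{
    if (size < sizeof(sketch_object)) {
        return NULL;
    }
    size_t cls = sketch_size_class(size);
    void *mem;
    if (ts->freelists_enabled && cls < SKETCH_SIZE_CLASSES
        && ts->freelist[cls] != NULL) {
        /* Fast path: pop a cached block of this size class. */
        sketch_free_node *node = ts->freelist[cls];
        assert(ts->count[cls] > 0);
        ts->freelist[cls] = node->next;
        ts->count[cls]--;
        mem = node;
    }
    else {
        /* Slow path: round small sizes up so blocks in a class are interchangeable. */
        size_t rounded = cls < SKETCH_SIZE_CLASSES
                             ? (cls + 1) * SKETCH_ALIGNMENT : size;
        mem = malloc(rounded);
        if (mem == NULL) {
            return NULL;
        }
    }
    sketch_object *op = mem;
    op->ob_refcnt = 1;
    op->ob_type = tp;
    return op;
}

static inline void
sketch_free(sketch_threadstate *ts, sketch_object *op, size_t size)
{
    if (op == NULL) {
        return;
    }
    size_t cls = sketch_size_class(size);
    if (ts->freelists_enabled && cls < SKETCH_SIZE_CLASSES
        && ts->count[cls] < SKETCH_MAX_FREE) {
        sketch_free_node *node = (sketch_free_node *)op;
        node->next = ts->freelist[cls];
        ts->freelist[cls] = node;
        ts->count[cls]++;
        return;
    }
    free(op);
}

/* Disabling drains every list so no cached block bypasses the hooks. */
static void
sketch_set_freelists_enabled(sketch_threadstate *ts, int enabled)
{
    if (!enabled) {
        for (size_t cls = 0; cls < SKETCH_SIZE_CLASSES; cls++) {
            while (ts->freelist[cls] != NULL) {
                sketch_free_node *node = ts->freelist[cls];
                ts->freelist[cls] = node->next;
                free(node);
            }
            ts->count[cls] = 0;
        }
    }
    ts->freelists_enabled = enabled;
}
```

With the freelists enabled, the common case is a pointer pop with no function-pointer call at all; the indirection cost is only paid on the fallback path.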

@picnixz added the type-feature (A feature request or enhancement) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on Apr 7, 2025
3 participants