Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115103: Implement delayed free mechanism for free-threaded builds #115367

Merged
merged 2 commits into from
Feb 20, 2024

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Feb 12, 2024

This adds _PyMem_FreeDelayed() and supporting functions. The
_PyMem_FreeDelayed() function frees memory with the same allocator as
PyMem_Free(), but after some delay to ensure that concurrent lock-free
readers have finished.

…uilds

This adds `_PyMem_FreeDelayed()` and supporting functions. The
`_PyMem_FreeDelayed()` function frees memory with the same allocator as
`PyMem_Free()`, but after some delay to ensure that concurrent lock-free
readers have finished.
@colesbury colesbury force-pushed the gh-115103-delayed-free branch from a661366 to 5e9f1dc Compare February 16, 2024 20:35
@colesbury colesbury marked this pull request as ready for review February 16, 2024 20:53
Copy link
Contributor

@DinoV DinoV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid, let me know what you think of the comments.

process_queue(struct llist_node *head, struct _qsbr_thread_state *qsbr)
{
while (!llist_empty(head)) {
struct _mem_work_chunk *buf = work_queue_first(head);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we'e not just doing something like:
for (Py_ssize_t i = buf->rd_idx; i < buf->wr_idx; i++) {
...
}

Or the while loop like _PyMem_FiniDelayed with the polling from here added in vs just going back through the head of the list each time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the loop to avoid grabbing the head every iteration

struct _mem_work_chunk *buf = work_queue_first(head);

if (buf->rd_idx == buf->wr_idx) {
llist_remove(&buf->node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we should wait for buf->rd_idx == WORK_ITEMS_PER_CHUNK before freeing and removing?

Otherwise it seems like we could end up in a situation where we're doing some delay frees, process them all, and then quickly need to do more delayed frees and end up re-allocating this 4k buffer repeatedly.

And if this is any buffer other than the last then buf->wr_idx == WORK_ITEMS_PER_CHUNK so we'd always free all but one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good idea. We can do that for the thread queues but not the interpreter queue.

Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!
(I 've taken a look at QSBR mechanism due to working on list allocation while making list thread-safe)

Copy link
Contributor

@DinoV DinoV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@colesbury colesbury merged commit e3ad6ca into python:main Feb 20, 2024
33 checks passed
@colesbury colesbury deleted the gh-115103-delayed-free branch February 20, 2024 18:04
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this pull request Mar 4, 2024
…uilds (python#115367)

This adds `_PyMem_FreeDelayed()` and supporting functions. The
`_PyMem_FreeDelayed()` function frees memory with the same allocator as
`PyMem_Free()`, but after some delay to ensure that concurrent lock-free
readers have finished.
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024
…uilds (python#115367)

This adds `_PyMem_FreeDelayed()` and supporting functions. The
`_PyMem_FreeDelayed()` function frees memory with the same allocator as
`PyMem_Free()`, but after some delay to ensure that concurrent lock-free
readers have finished.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants