-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115103: Implement delayed free mechanism for free-threaded builds #115367
Conversation
…uilds This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
a661366
to
5e9f1dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty solid, let me know what you think of the comments.
process_queue(struct llist_node *head, struct _qsbr_thread_state *qsbr) | ||
{ | ||
while (!llist_empty(head)) { | ||
struct _mem_work_chunk *buf = work_queue_first(head); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we'e not just doing something like:
for (Py_ssize_t i = buf->rd_idx; i < buf->wr_idx; i++) {
...
}
Or the while loop like _PyMem_FiniDelayed
with the polling from here added in vs just going back through the head of the list each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the loop to avoid grabbing the head every iteration
Objects/obmalloc.c
Outdated
struct _mem_work_chunk *buf = work_queue_first(head); | ||
|
||
if (buf->rd_idx == buf->wr_idx) { | ||
llist_remove(&buf->node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we should wait for buf->rd_idx == WORK_ITEMS_PER_CHUNK
before freeing and removing?
Otherwise it seems like we could end up in a situation where we're doing some delay frees, process them all, and then quickly need to do more delayed frees and end up re-allocating this 4k buffer repeatedly.
And if this is any buffer other than the last then buf->wr_idx == WORK_ITEMS_PER_CHUNK
so we'd always free all but one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good idea. We can do that for the thread queues but not the interpreter queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
(I 've taken a look at QSBR mechanism due to working on list allocation while making list thread-safe)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…uilds (python#115367) This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
…uilds (python#115367) This adds `_PyMem_FreeDelayed()` and supporting functions. The `_PyMem_FreeDelayed()` function frees memory with the same allocator as `PyMem_Free()`, but after some delay to ensure that concurrent lock-free readers have finished.
This adds
_PyMem_FreeDelayed()
and supporting functions. The_PyMem_FreeDelayed()
function frees memory with the same allocator asPyMem_Free()
, but after some delay to ensure that concurrent lock-freereaders have finished.