smp: prefaulter: don't leave zombie worker threads #2679
base: master
As explained in detail in scylladb#2623, prefaulter worker threads that have completed but have not been joined are left in a zombie state, which confuses gdb's thread_local processing. Because seastar relies heavily on thread locals, this makes it impossible to debug core dumps.
Fix this by joining the threads after they complete. Use seastar::alien to ask the main reactor thread (shard 0) to join the worker threads once they are done, so the join does not stall the reactor.
Fixes scylladb#2623.
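The hand-off described above can be sketched in plain C++. This is a minimal illustration and not the seastar implementation: `main_queue` is a hypothetical stand-in for the alien message queue, and `prefaulter_sketch` and `run_sketch` are names invented for this example. Only `_active_threads` and `join_threads()` mirror identifiers from the patch. The last worker to finish posts a join request to the main thread, which then joins workers that have already completed or are about to return.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the reactor's alien queue: worker threads
// post tasks, the main thread runs them.
class main_queue {
    std::mutex _m;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
public:
    void post(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> g(_m);
            _tasks.push(std::move(f));
        }
        _cv.notify_one();
    }
    // Blocks until a task is available, then runs it on the calling thread.
    void run_one() {
        std::function<void()> f;
        {
            std::unique_lock<std::mutex> l(_m);
            _cv.wait(l, [this] { return !_tasks.empty(); });
            f = std::move(_tasks.front());
            _tasks.pop();
        }
        f();
    }
};

// Assumed shape of the prefaulter, reduced to thread bookkeeping.
class prefaulter_sketch {
    main_queue& _main;
    std::atomic<unsigned> _active_threads{0};
    std::vector<std::thread> _threads;
    unsigned _joined = 0;
public:
    explicit prefaulter_sketch(main_queue& q) : _main(q) {}

    // Requires n > 0; otherwise nobody ever posts the join request.
    void start(unsigned n) {
        _active_threads = n;
        _threads.reserve(n);
        for (unsigned i = 0; i < n; ++i) {
            _threads.emplace_back([this] {
                // ... the real worker would prefault memory here ...
                if (!--_active_threads) {
                    // Last worker out asks the main thread to join everyone.
                    _main.post([this] () noexcept { join_threads(); });
                }
            });
        }
    }

    // Runs on the main thread. All workers have finished their work by now,
    // so each join() waits at most for a thread to return from its lambda.
    void join_threads() noexcept {
        for (auto& t : _threads) {
            t.join();
            ++_joined;
        }
        _threads.clear();
    }

    unsigned joined() const { return _joined; }
};

// Starts n workers, lets the main thread serve the join request, and
// returns how many threads were joined.
unsigned run_sketch(unsigned n) {
    main_queue q;
    prefaulter_sketch p(q);
    p.start(n);
    q.run_one();
    return p.joined();
}
```

Calling `run_sketch(4)` from the main thread starts four workers and returns after all four have been joined, so no finished-but-unjoined thread is left behind. Note that the posted task captures `this`, so the sketch relies on the `prefaulter_sketch` object outliving the queued task.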
work(ranges, page_size, huge_page_size_opt);
if (!--_active_threads) {
    run_on(alien, 0, [this] () noexcept { join_threads(); });
Is there no risk that this join_threads() call will happen after the memory_prefaulter is already destroyed?
The prefaulter is nested under smp, and so are the reactors. So if the reactor is still running, the prefaulter still exists.
Yes, but it's not nice to have implicit lifetime assumptions like that.
Not that this case is worth caring about, though.
In such cases it is unavoidable without strong compiler support. The setup is too hairy.
LGTM.