smp: prefaulter: don't leave zombie worker threads #2679

Open: wants to merge 1 commit into master

avikivity
Member

As explained in detail in #2623, prefaulter worker threads that have completed but have not been joined are left in a zombie state, which confuses gdb's thread_local processing. Since seastar relies heavily on thread locals, this makes core dumps impossible to debug.

Fix this by joining the threads after they complete. Use seastar::alien to ask the main reactor thread (shard 0) to join the worker threads once they have all finished, so that the join does not stall the reactor.

Fixes #2623.
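The fix above can be sketched in plain standard C++ for readers without a seastar build at hand. This is a hypothetical illustration, not the patch itself: `main_loop` with its `post()`/`run_one()` methods is an invented stand-in for the shard-0 reactor and for `run_on(alien, 0, ...)`, and `prefaulter_sketch` stands in for `memory_prefaulter`. Each worker decrements a shared counter when it finishes; the last one posts a task asking the main thread to join every worker, so no finished thread is left unjoined.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the shard-0 reactor: a thread-safe task queue
// drained by the main thread. This is NOT the seastar::alien API.
class main_loop {
    std::mutex _mtx;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
public:
    // Called from foreign (worker) threads, playing the role of run_on(alien, 0, ...).
    void post(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lk(_mtx);
            _tasks.push(std::move(f));
        }
        _cv.notify_one();
    }
    // Blocks until a task is available, then runs it on the calling (main) thread.
    void run_one() {
        std::function<void()> f;
        {
            std::unique_lock<std::mutex> lk(_mtx);
            _cv.wait(lk, [this] { return !_tasks.empty(); });
            f = std::move(_tasks.front());
            _tasks.pop();
        }
        f();
    }
};

// Hypothetical stand-in for memory_prefaulter.
class prefaulter_sketch {
    main_loop& _loop;
    std::vector<std::thread> _workers;
    std::atomic<unsigned> _active_threads;
public:
    std::atomic<unsigned> work_done{0};
    bool joined = false;

    prefaulter_sketch(main_loop& loop, unsigned n) : _loop(loop), _active_threads(n) {
        for (unsigned i = 0; i < n; ++i) {
            _workers.emplace_back([this] {
                ++work_done; // placeholder for work(ranges, page_size, ...)
                if (!--_active_threads) {
                    // Last worker out: hand the join to the main thread,
                    // since a thread cannot join itself.
                    _loop.post([this] { join_threads(); });
                }
            });
        }
    }

    // Runs on the main thread; every worker has finished its work by now,
    // so each join only waits for a thread that is already exiting.
    void join_threads() {
        for (auto& t : _workers) {
            t.join();
        }
        joined = true;
    }
};
```

To drive it, construct a `main_loop` and a `prefaulter_sketch` on the main thread, then call `run_one()` to execute the posted join. Handing the join to another thread is necessary in this shape because a thread cannot join itself (`std::thread::join` throws in that case), and because the join is requested only after all workers are done, it should not hold up the loop for long.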

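```cpp
work(ranges, page_size, huge_page_size_opt);
if (!--_active_threads) {
    // The last worker to finish asks shard 0, via seastar::alien, to join all worker threads.
    run_on(alien, 0, [this] () noexcept { join_threads(); });
}
```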
Contributor

Is there no risk that this join_threads() call happens after the memory_prefaulter has already been destroyed?

Member Author

The prefaulter is nested under smp, and so are the reactors. So if the reactor is still running, the prefaulter still exists.
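The lifetime argument rests on nesting: whatever owns both objects tears down the reactor before the prefaulter. A minimal sketch of that idea uses C++ member destruction order (members are destroyed in reverse order of declaration). The names `owner`, `prefaulter_like` and `reactor_like` are hypothetical, and seastar's real teardown, which involves stopping and joining reactor threads, is not modelled here.

```cpp
#include <string>
#include <vector>

// Records destruction events so the ordering can be inspected.
std::vector<std::string> destruction_log;

// Hypothetical types for illustration; not seastar's smp/reactor classes.
struct prefaulter_like {
    bool alive = true;
    ~prefaulter_like() {
        alive = false;
        destruction_log.push_back("~prefaulter");
    }
};

struct reactor_like {
    prefaulter_like& pf;
    explicit reactor_like(prefaulter_like& p) : pf(p) {}
    ~reactor_like() {
        // Anything the reactor runs up to this point still sees a live prefaulter.
        destruction_log.push_back(pf.alive ? "~reactor: prefaulter alive"
                                           : "~reactor: prefaulter dead");
    }
};

// Members are destroyed in reverse declaration order:
// _reactor first, then _prefaulter.
struct owner {
    prefaulter_like _prefaulter;
    reactor_like _reactor{_prefaulter};
};
```

The guarantee holds only as long as the declaration order is kept, which is exactly the kind of implicit assumption the reviewer objects to below: nothing in the type system stops someone from swapping the two members.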

Contributor

Yes, but it's not nice to have implicit lifetime assumptions like that.
Not that this case is worth caring about, though.

Member Author

In cases like this it's unavoidable without strong compiler support; the setup is too hairy.

Contributor

michoecho left a comment

LGTM.


Successfully merging this pull request may close these issues.

memory_prefaulter leaves a zombie pthread, confuses gdb during coredump debugging
2 participants