Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task: illumos/Solaris have thread-local weirdness #6777

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Aug 14, 2024

Motivation

In tokio::task::local, there's a
LocalState::assert_called_from_owner_thread function, which checks
that the caller's thread ID matches that of the thread on which the
LocalSet was created. This assertion is disabled on FreeBSD and
OpenBSD, with a comment indicating that this is because those systems
have "some weirdness around thread-local destruction". If memory serves,
I believe the "weirdness" was related to the order in which
thread-locals are destroyed relative to each other, or something along
those lines.

Upon running the tests on illumos, there appears to be a similar
panic
in the task_local_set_in_thread_local due to this assertion
assertion while dropping a LocalSet which lives in a thread-local.
This leads me to believe that illumos, and presumably Solaris, also
exhibit the same "weirdness". This wouldn't be too surprising, given the
shared heritage of those systems.

Solution

This commit adds target_os="illumos" and target_os="solaris" to the
cfg attribute that disables the assertion on systems determined to
have weirdness. This fixes the test panicking on illumos.

In the future, it might be worthwhile to look into changign the
assertion to only be disabled on weirdness-having systems while
dropping the LocalSet
, rather than all the time. That way, we can
still check other accesses. But, I wanted to make the minimum change
necessary to fix it here before messing around with that.

This change was cherry-picked from PR #6769.

@hawkw hawkw self-assigned this Aug 14, 2024
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Aug 15, 2024
tokio/src/macros/select.rs Outdated Show resolved Hide resolved
Comment on lines -1163 to +1166
// FreeBSD has some weirdness around thread-local destruction.
// BSDs, illumos, and Solaris have some weirdness around thread-local
// destruction.
// TODO: remove this hack when thread id is cleaned up
#[cfg(not(any(target_os = "openbsd", target_os = "freebsd")))]
#[cfg(not(any(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the actual code looks like this:

            context::thread_id()
                .map(|id| id == self.owner)
                .unwrap_or(true),

here it's using map to handle the case where getting the thread local fails. Not sure why these platforms fail in some other way. Anyway, since we already need this for other platforms, it doesn't seem like an issue to add more platforms here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if memory serves, the problem is actually that on BSDs (and apparently, illumos/Solaris), getting the thread local apparently doesn't fail as we would expect, and it instead gets a new value or something. I vaguely recall @carllerche trying to figure this out while working on #5179, and I think we couldn't really figure out what was going on there and kinda gave up...

## Motivation

In `tokio::task::local`, there's a
`LocalState::assert_called_from_owner_thread` function, which checks
that the caller's thread ID matches that of the thread on which the
`LocalSet` was created. This assertion is disabled on FreeBSD and
OpenBSD, with a comment indicating that this is because those systems
have "some weirdness around thread-local destruction". If memory serves,
I believe the "weirdness" was related to the order in which
thread-locals are destroyed relative to each other, or something along
those lines.

Upon running the tests on illumos, there appears to be [a similar
panic][1] in the `task_local_set_in_thread_local` due to this assertion
assertion while dropping a `LocalSet` which lives in a thread-local.
This leads me to believe that illumos, and presumably Solaris, also
exhibit the same "weirdness". This wouldn't be too surprising, given the
shared heritage of those systems.

## Solution

This commit adds `target_os="illumos"` and `target_os="solaris"` to the
`cfg` attribute that disables the assertion on systems determined to
have weirdness. This fixes the test panicking on illumos.

In the future, it might be worthwhile to look into changign the
assertion to only be disabled on weirdness-having systems _while
dropping the `LocalSet`_, rather than all the time. That way, we can
still check other accesses. But, I wanted to make the minimum change
necessary to fix it here before messing around with that.

[1]: https://buildomat.eng.oxide.computer/wg/0/details/01J592RB0JR203RGGN0RYHJHMY/CHieo1Uee7qzRVyp811YBl0MvXGO3i0QA9ezPaFWj6hf6P3P/01J592RSWCNX1MCFYGW74AHVH6#S1954
@hawkw hawkw changed the base branch from eliza/illumos-ci to master August 15, 2024 16:05
@Darksonn
Copy link
Contributor

Looks like something is wrong with CI. You may have to push something to poke it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants