Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread creation with race-free access to the TCB #17542

Open
chrysn opened this issue Jan 20, 2022 · 12 comments
Open

Thread creation with race-free access to the TCB #17542

chrysn opened this issue Jan 20, 2022 · 12 comments
Labels
Type: new feature The issue requests / The PR implemements a new feature for RIOT

Comments

@chrysn
Copy link
Member

chrysn commented Jan 20, 2022

Description

Right now, when creating a thread, it is hard to reliably get the child's TCB:

One could start the thread sleeping, maybe that's a sufficient workaround.
One could read obtain the TCB pointer right after start, check if it's within the stack that was provided.
One could use IPC to send the TCB of the child to the parent.

Otherwise, it's hard to distinguish between cases of the child running, and the child having died real quick and another task having been created with the same PID.

Why is it important to obtain the TCB? Without the TCB, it's hard to be certain the task stopped. Sure, if the PID is gone it's sure that the task has stopped (especially as it'd be very impolite to reap someone else's zombie children), but if the PID is still around, the parent might be misled into thinking the child still lives.

This is all more important when the parent must be (under penalty of UB) sure that the child is done for good with its stack before the memory can be used (as is the case when spawning threads from Rust), but (at least in theory) affects C too with similar considerations as in #17502.

Possible ways forward

  1. State that creating a sleeping thread to be woken is cheap enough in terms of code size that "sleep, get, wake" is viable.
  2. Add an API for thread creation that returns the TCB. The existing API could become a static inline wrapper.

This is not all that easy because thread_create may also return an error.

An alternative (also with a new function shimmable to the old) is to provide a **thread_t that gets populated unless NULL.

I'm originally not a fan of doing 1 (and the pointer comparison seems too ugly even though it's what riot-wrappers are doing right now) because I want the Rust wrappers to be zero-cost. But having played around a bit with 2, I'm questioning whether adding the API wouldn't just add the one more indirection wouldn't just add the same few byte to the builds that 1 adds. Really, what is the benefit of having zero-cost anything if to get there you have to raise the cost for everyone (who, realistically speaking, usually don't start and stop tasks all the time) rather than biting down on the small cost yourself?

Still, before committing to that (and closing the issue with "1 is right"), I'd like to hear others' opinions, maybe I missed use cases.

Related discussions

Using TCB pointers over PIDs was discussed in #1004; not done at least partially because you can pack a PID and extra data (like a flag to be set or message type) in a void *arg for callbacks, and can be multiplexed also with a negative integer.

Ages ago, in #1298 there was discussion of having the TCB separate from the stack pool for alignment reasons. Any such thing would break the"is the TCB really where I provided stack" trick, but would also come with its own way of getting the TCB in or out, probably. Either way, AIU that all didn't explain where the TCB would be allocated. If it were to come from a pool (like PIDs do), threads that want to be finished properly by their parents would need to go to zombie state (not a bad thing, really, but not sufficient on its own).

If a parent wants to wait for a child to stop, we don't support that, and there's no need to -- the child could just as well use IPC to signal to the parent that it's now ready to be collected from its Stopped state (which the parent can see because it has a TCB pointer; not having the TCB pointer going to zombie mode is insufficient because still the zombie could be someone else's child, and then how would anyone know how to reap them). Due to races and priorities the parent can't immediately know that the child is done, but thanks to #17093 it can raise the child's priority to its own (and then yield) while the child is not stopped.

@maribu
Copy link
Member

maribu commented Sep 19, 2022

How about asking the caller of thread_create() to allocate a thread_t instead? Every platform specific implementation of thread_create() now does a lot of wiggling to align the stack and allocate a thread_t in it. And the moment additional space is needed in thread_t, stack overflows can appear were previously were none.

@kaspar030
Copy link
Contributor

How about asking the caller of thread_create() to allocate a thread_t instead?

IMO that would just require two allocations instead of one: thread_t and stack itself. I know there's stack wiggling, but it's straight forward, maybe just a bit awkward implementation wise with room for improvement.
We might rather want to unify this wiggling (e.g., first chop off thread_t from the stack in a shared function, then initialize the stack platform-specific. With some care, the platform specific code could just assert the necessary alignments, because the shared function handles that.

Right now, when creating a thread, it is hard to reliably get the child's TCB

I think "sleep, get, wake" is totally viable. So would be some API that returns the thread_t* (or error). Can the second be implemented with the first?

@chrysn
Copy link
Member Author

chrysn commented Sep 19, 2022

If we make the wiggling public (given this stack area, tell me where the wiggling would place the thread_t), I think this would suffice. The mechanism could even be private.

If two allocations are bad (it's not usually on the heap, so IDC too much), it might be an option to create a type that is passed in instead of a uint8_t that has the right alignment. (Like, making the last member of the thread_t struct a uint32_t stack[] or whatever is required -- not sure this works out fully in C)

What I don't like too much about sleep-get-wake is that it's not the default, and that the default is presumably more efficient; I'd prefer Rust wrappers not to add cost just to gain the safety benefits.

Can the [something returning thread_t* on creation] be implemented with [sleep, get, wake]?

Yes, it can -- but given that it takes extra steps it's not my preferred option.


Right now, my preferred solution would be a public "get TCB from uint8_t[] stack" function. That would be accessible to the optimizer.

@maribu
Copy link
Member

maribu commented Sep 19, 2022

I don't see the problem with

    thread_t thread;
    uint8_t stack[FOO];
    thread_create(&thread, stack, sizeof(stack), ...);

It would make also debugging much easier. Right now, all thread_t * show as stack_foo + X in GDB, but with that they would show as thread_foo instead. And I easily could examine thread states from GDB even when not using OpenOCD, which now requires noting the address of the thread_t *.

@chrysn
Copy link
Member Author

chrysn commented Sep 19, 2022

uint8_t stack[FOO];

If we go there, that should be aligned right away as well.

Provided the stack wiggling is well enough understood to be public (even if the implementation varies by platform), then both APIs can be provided, with the one that takes a thread-and-a-stack as the main work horse and the other going through wiggling to split the unaligned blob into the two parts.

@maribu
Copy link
Member

maribu commented Sep 19, 2022

If we go there, that should be aligned right away as well.

I strongly agree. Also @benpicco was in favor, but @kaspar030 shot that idea down.

@chrysn
Copy link
Member Author

chrysn commented Sep 19, 2022

That was the discussion in #1298, whose last two comments I read as "there was offline discussion". The keyword there was futureproofing -- and I do see that requirements of exotic platforms might be exotic. (They might be so exotic that even the provided u8 array is nothing to forge a thread stack from, hello WASM, but that's a different affair. But at least weirdo platforms like "my stack has uint16_t growing down, and doubles growing up", not that I'd know such a platform, can be accommodated with the current API).

So considering previous discussion the evolution-not-revolution step here would be to make the way from u8-stack-to-structured public, and build on that.

@kaspar030
Copy link
Contributor

I strongly agree. Also @benpicco was in favor, but @kaspar030 shot that idea down.

I changed my mind on that one :/

@kaspar030
Copy link
Contributor

I don't see the problem with

That would change to sth like

struct thread_info {
thread_t thread;
char stack[MY_MODULE_STACK_SIZE];
}
int mymodule_init() {
 for (i=0; i<MYMODULE_INSTANCES; i++) {
 ...
 }
}

But more importantly, as long as we hand out PIDs and they can be checked for validity, it ensures a bit more that the pointed to stack (and thread_t) are valid. Otherwise, that's up to the user to ensure.

@kaspar030
Copy link
Contributor

What I don't like too much about sleep-get-wake is that it's not the default, and that the default is presumably more efficient; I'd prefer Rust wrappers not to add cost just to gain the safety benefits.

Hm, even if it's some lines more, the bulk of thread creation is elsewhere. Performance wise I doubt it would matter, even if threads would be started in a loop (instead of a couple times at boot). I mean, we're talking about a three line wrapper, right?

@chrysn
Copy link
Member Author

chrysn commented Sep 19, 2022

as long as we hand out PIDs and they can be checked for validity, it ensures a bit more that the pointed to stack (and thread_t) are valid

The PIDs have these checks more, but they lack one significant other check: "Is this still the thread I meant, or has that crashed, stopped, and the PID reassigned?" Given we don't have a waiting-to-be-reaped state that'd need to be wait()ed for by the parent or any other responsible party and only then allows reassigning of the PID, the PID may already be in use by someone else. The TCB doesn't have that issue, as long as one is in agreement with the owner of the TCB that the TCB won't be repurposed until all references to it have been returned.

I mean, we're talking about a three line wrapper, right?

I've looked at the code where I'm doing that right now again, and it's two things:

  • For one, the three-line-wrapper. Not a big thing, just deduction on style points where I'm promising that Rust wrappers are zero-cost, and then the assembly has two lines more. Really more an Ego thing, let's disregard that one.

  • Raciness against rogue wakings. If some other (high prirority) thread or interrupt just hands out thread_wakeup()s inside these three lines, all of a sudden the thread creator might be looking at the wrong TCB. Granted, just waking up anyone is not really a thing you should do -- but still, this three-line workaround would elevate something "you just shouldn't do, but it's through PIDs and OS functions so at least it should be safe, right" into something that can lead to undefined behavior. It's unlikely to happen, but still it's a soundness blemish, and maybe it's just that kind of thing that becomes part of an exploit chain some day.

    (I've installed stern words in the message subsystem saying that you really can't go around willy-nilly sending messages to unexpecting threads, and maybe I'd feel better about this if an analogous warning were placed around sleep/wakeup, but still...)

@chrysn
Copy link
Member Author

chrysn commented Oct 13, 2022

The solution for the purpose of having a handle that can be sure to never mean any different than thread than the one just created is, to me, solved with this. (Given the TCB may now be repurposed just as the thread ID is, I'd need to do a different check, but "is the sp of the process still in the stack I gave it" should be a suitable criterion).

Its actionable result is tracked in RIOT-OS/rust-riot-wrappers#23.

There are different issues that were touched in and around this issue ("how would a thread creator that knows its entry function's stack requirements well know how much more than the required space to allocate?", "can a thread block on the completion of another one?", "how do you get the return value of a thread?"), but those probably deserve their own issues.

@maribu maribu added the Type: new feature The issue requests / The PR implemements a new feature for RIOT label May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: new feature The issue requests / The PR implemements a new feature for RIOT
Projects
None yet
Development

No branches or pull requests

3 participants