Architecture
This page describes the architecture of Nanos: both where it stands now and where we'd like it to be.
(Note from Will: I'm collecting snippets of e-mails and notes about various parts of the kernel, along with the programming patterns and coding style used along the way. This will be a jumbled mess until it is ironed out into either documentation or a whitepaper. Much of it will be in flux anyway, but it should help you get up and running without having to repeat things on the mailing list, IRC, issues, or PRs.)
We don't plan on ever supporting fork(), execve(), or their cousins. Nanos is by nature a single-process system. It should be able to use multiple threads, but under the constraint that it is only ever intended to run one 'program'. So there is never a case where you'd fork a new process and exec a new program. This design paradigm descends from RTOSes and other microkernels.
E-mail from the original author:
Each suspended thread has a context, which is its saved register file. The 'current' global arises from a maybe misplaced desire to keep syscall function signatures compatible with their user-level counterparts. In any case, when a syscall starts, 'current' is the context associated with the thread that made the call.
thread_wakeup() takes any context and puts it on the run queue.
When a thread is actually run, it ends up in x86_64/crt0.s:frame_return, which restores the live processor state from the saved ("canned") context.
The part that's maybe particularly confusing is that, in order to preserve user-like signatures, the actual syscall trap function (x86_64/crt0.s:syscall_enter) takes the value of a normal return (in rax) and puts it into the context before calling frame_enter. In reality, because syscalls don't assume everything is saved, this path could be made somewhat faster. So we have two cases:
- Direct return: the syscall function is dispatched, and whatever it returns is returned to the user.
- Deferred return: the syscall enters runloop(), which discards the entire call context from the point of the syscall trap. The current thread needs to be stashed somewhere so it can be restored later (likely in a closure). The return value from the syscall goes in the RAX slot of the context. When it's time to actually return, put the t->run closure on the run queue.
The general pattern for blocking is to pack the bottom half of the operation (that is, the part that should run after waking) into a closure along with the environment it needs, e.g. the thread, file, socket, buffer, offset, length, and whatever else is required to finish the asynchronous operation, enclosed as a unit. This is done with a call to closure(), which returns a pointer to such a bundle. Typically this bundle is squirreled away somewhere, to be invoked later using apply(). For example, a socket has a "waiting" queue for operations blocked on the arrival of data on its incoming queue (or of a connection, in the case of a listening socket), and the filesystem code packs the completion into a merge, which is basically a refcount with an action (the completion) to be taken when the count returns to zero. Invoking such a closure may add parameters, such as a status from LWIP, passed through apply. So when you see CLOSURE_3_2, for instance, it means that three parameters (the left-hand number) are packed in when the closure is created, before blocking, and two parameters (the right-hand number) are passed with the apply upon wakeup.
Recap:
- thread_sleep() never returns, because the runloop never returns. If you call it, you had better have a bottom half registered somewhere that will be executed when the I/O or other blocking operation completes. The closure is just a convenient device for taking a single operation and splitting it into a before-blocking (or direct-return) part and an after-blocking part, while enclosing the environment needed to finish the operation.
- thread_wakeup() enqueues a closure of run_thread(), packed with a pointer to the thread struct, onto the run queue, to be serviced by runloop() (e.g. when another thread blocks; some changes are coming here, to be discussed later). It does return, and the awoken thread won't execute until the next pass through the runloop.