From 3468b4b72f4e5a5ae89b1ff059ccde576dceb459 Mon Sep 17 00:00:00 2001 From: Alice Ryhl Date: Wed, 29 Nov 2023 12:32:58 +0100 Subject: [PATCH] runtime: document fairness guarantees and current behavior (#6145) --- tokio/src/runtime/builder.rs | 8 +- tokio/src/runtime/mod.rs | 140 ++++++++++++++++++++++++++++++++ tokio/src/runtime/task/abort.rs | 3 + tokio/src/runtime/task/error.rs | 4 + tokio/src/runtime/task/join.rs | 4 + tokio/src/sync/mutex.rs | 8 ++ tokio/src/task/mod.rs | 40 +++++++++ 7 files changed, 203 insertions(+), 4 deletions(-) diff --git a/tokio/src/runtime/builder.rs b/tokio/src/runtime/builder.rs index fabafd1033d..37d7cfe8f0b 100644 --- a/tokio/src/runtime/builder.rs +++ b/tokio/src/runtime/builder.rs @@ -746,10 +746,8 @@ impl Builder { /// /// A scheduler "tick" roughly corresponds to one `poll` invocation on a task. /// - /// By default the global queue interval is: - /// - /// * `31` for the current-thread scheduler. - /// * `61` for the multithreaded scheduler. + /// By default the global queue interval is 31 for the current-thread scheduler. Please see + /// [the module documentation] for the default behavior of the multi-thread scheduler. /// /// Schedulers have a local queue of already-claimed tasks, and a global queue of incoming /// tasks. Setting the interval to a smaller value increases the fairness of the scheduler, @@ -758,6 +756,8 @@ impl Builder { /// or await on further I/O. Conversely, a higher value prioritizes existing work, and /// is a good choice when most tasks quickly complete polling. /// + /// [the module documentation]: crate::runtime#multi-threaded-runtime-behavior-at-the-time-of-writing + /// /// # Examples /// /// ``` diff --git a/tokio/src/runtime/mod.rs b/tokio/src/runtime/mod.rs index e3369cb2249..3d333960f3d 100644 --- a/tokio/src/runtime/mod.rs +++ b/tokio/src/runtime/mod.rs @@ -172,6 +172,146 @@ //! [`Builder::enable_io`]: crate::runtime::Builder::enable_io //! [`Builder::enable_time`]: crate::runtime::Builder::enable_time //! [`Builder::enable_all`]: crate::runtime::Builder::enable_all +//! +//! # Detailed runtime behavior +//! +//! This section gives more details into how the Tokio runtime will schedule +//! tasks for execution. +//! +//! At its most basic level, a runtime has a collection of tasks that need to be +//! scheduled. It will repeatedly remove a task from that collection and +//! schedule it (by calling [`poll`]). When the collection is empty, the thread +//! will go to sleep until a task is added to the collection. +//! +//! However, the above is not sufficient to guarantee a well-behaved runtime. +//! For example, the runtime might have a single task that is always ready to be +//! scheduled, and schedule that task every time. This is a problem because it +//! starves other tasks by not scheduling them. To solve this, Tokio provides +//! the following fairness guarantee: +//! +//! > If the total number of tasks does not grow without bound, and no task is +//! > [blocking the thread], then it is guaranteed that tasks are scheduled +//! > fairly. +//! +//! Or, more formally: +//! +//! > Under the following two assumptions: +//! > +//! > * There is some number `MAX_TASKS` such that the total number of tasks on +//! > the runtime at any specific point in time never exceeds `MAX_TASKS`. +//! > * There is some number `MAX_SCHEDULE` such that calling [`poll`] on any +//! > task spawned on the runtime returns within `MAX_SCHEDULE` time units. +//! > +//! > Then, there is some number `MAX_DELAY` such that when a task is woken, it +//! > will be scheduled by the runtime within `MAX_DELAY` time units. +//! +//! (Here, `MAX_TASKS` and `MAX_SCHEDULE` can be any number and the user of +//! the runtime may choose them. The `MAX_DELAY` number is controlled by the +//! runtime, and depends on the value of `MAX_TASKS` and `MAX_SCHEDULE`.) +//! +//! Other than the above fairness guarantee, there is no guarantee about the +//! order in which tasks are scheduled. There is also no guarantee that the +//! runtime is equally fair to all tasks. For example, if the runtime has two +//! tasks A and B that are both ready, then the runtime may schedule A five +//! times before it schedules B. This is the case even if A yields using +//! [`yield_now`]. All that is guaranteed is that it will schedule B eventually. +//! +//! Normally, tasks are scheduled only if they have been woken by calling +//! [`wake`] on their waker. However, this is not guaranteed, and Tokio may +//! schedule tasks that have not been woken under some circumstances. This is +//! called a spurious wakeup. +//! +//! ## IO and timers +//! +//! Beyond just scheduling tasks, the runtime must also manage IO resources and +//! timers. It does this by periodically checking whether there are any IO +//! resources or timers that are ready, and waking the relevant task so that +//! it will be scheduled. +//! +//! These checks are performed periodically between scheduling tasks. Under the +//! same assumptions as the previous fairness guarantee, Tokio guarantees that +//! it will wake tasks with an IO or timer event within some maximum number of +//! time units. +//! +//! ## Current thread runtime (behavior at the time of writing) +//! +//! This section describes how the [current thread runtime] behaves today. This +//! behavior may change in future versions of Tokio. +//! +//! The current thread runtime maintains two FIFO queues of tasks that are ready +//! to be scheduled: the global queue and the local queue. The runtime will prefer +//! to choose the next task to schedule from the local queue, and will only pick a +//! task from the global queue if the local queue is empty, or if it has picked +//! a task from the local queue 31 times in a row. The number 31 can be +//! changed using the [`global_queue_interval`] setting. +//! +//! The runtime will check for new IO or timer events whenever there are no +//! tasks ready to be scheduled, or when it has scheduled 61 tasks in a row. The +//! number 61 may be changed using the [`event_interval`] setting. +//! +//! When a task is woken from within a task running on the runtime, then the +//! woken task is added directly to the local queue. Otherwise, the task is +//! added to the global queue. The current thread runtime does not use [the lifo +//! slot optimization]. +//! +//! ## Multi threaded runtime (behavior at the time of writing) +//! +//! This section describes how the [multi thread runtime] behaves today. This +//! behavior may change in future versions of Tokio. +//! +//! A multi thread runtime has a fixed number of worker threads, which are all +//! created on startup. The multi thread runtime maintains one global queue, and +//! a local queue for each worker thread. The local queue of a worker thread can +//! fit at most 256 tasks. If more than 256 tasks are added to the local queue, +//! then half of them are moved to the global queue to make space. +//! +//! The runtime will prefer to choose the next task to schedule from the local +//! queue, and will only pick a task from the global queue if the local queue is +//! empty, or if it has picked a task from the local queue +//! [`global_queue_interval`] times in a row. If the value of +//! [`global_queue_interval`] is not explicitly set using the runtime builder, +//! then the runtime will dynamically compute it using a heuristic that targets +//! 10ms intervals between each check of the global queue (based on the +//! [`worker_mean_poll_time`] metric). +//! +//! If both the local queue and global queue is empty, then the worker thread +//! will attempt to steal tasks from the local queue of another worker thread. +//! Stealing is done by moving half of the tasks in one local queue to another +//! local queue. +//! +//! The runtime will check for new IO or timer events whenever there are no +//! tasks ready to be scheduled, or when it has scheduled 61 tasks in a row. The +//! number 61 may be changed using the [`event_interval`] setting. +//! +//! The multi thread runtime uses [the lifo slot optimization]: Whenever a task +//! wakes up another task, the other task is added to the worker thread's lifo +//! slot instead of being added to a queue. If there was already a task in the +//! lifo slot when this happened, then the lifo slot is replaced, and the task +//! that used to be in the lifo slot is placed in the thread's local queue. +//! When the runtime finishes scheduling a task, it will schedule the task in +//! the lifo slot immediately, if any. When the lifo slot is used, the [coop +//! budget] is not reset. Furthermore, if a worker thread uses the lifo slot +//! three times in a row, it is temporarily disabled until the worker thread has +//! scheduled a task that didn't come from the lifo slot. The lifo slot can be +//! disabled using the [`disable_lifo_slot`] setting. The lifo slot is separate +//! from the local queue, so other worker threads cannot steal the task in the +//! lifo slot. +//! +//! When a task is woken from a thread that is not a worker thread, then the +//! task is placed in the global queue. +//! +//! [`poll`]: std::future::Future::poll +//! [`wake`]: std::task::Waker::wake +//! [`yield_now`]: crate::task::yield_now +//! [blocking the thread]: https://ryhl.io/blog/async-what-is-blocking/ +//! [current thread runtime]: crate::runtime::Builder::new_current_thread +//! [multi thread runtime]: crate::runtime::Builder::new_multi_thread +//! [`global_queue_interval`]: crate::runtime::Builder::global_queue_interval +//! [`event_interval`]: crate::runtime::Builder::event_interval +//! [`disable_lifo_slot`]: crate::runtime::Builder::disable_lifo_slot +//! [the lifo slot optimization]: crate::runtime::Builder::disable_lifo_slot +//! [coop budget]: crate::task#cooperative-scheduling +//! [`worker_mean_poll_time`]: crate::runtime::RuntimeMetrics::worker_mean_poll_time // At the top due to macros #[cfg(test)] diff --git a/tokio/src/runtime/task/abort.rs b/tokio/src/runtime/task/abort.rs index 6edca100404..410f0a4671b 100644 --- a/tokio/src/runtime/task/abort.rs +++ b/tokio/src/runtime/task/abort.rs @@ -31,8 +31,11 @@ impl AbortHandle { /// If the task was already cancelled, such as by [`JoinHandle::abort`], /// this method will do nothing. /// + /// See also [the module level docs] for more information on cancellation. + /// /// [cancelled]: method@super::error::JoinError::is_cancelled /// [`JoinHandle::abort`]: method@super::JoinHandle::abort + /// [the module level docs]: crate::task#cancellation pub fn abort(&self) { self.raw.remote_abort(); } diff --git a/tokio/src/runtime/task/error.rs b/tokio/src/runtime/task/error.rs index f7ead77b7cc..d0d731322bf 100644 --- a/tokio/src/runtime/task/error.rs +++ b/tokio/src/runtime/task/error.rs @@ -33,6 +33,10 @@ impl JoinError { } /// Returns true if the error was caused by the task being cancelled. + /// + /// See [the module level docs] for more information on cancellation. + /// + /// [the module level docs]: crate::task#cancellation pub fn is_cancelled(&self) -> bool { matches!(&self.repr, Repr::Cancelled) } diff --git a/tokio/src/runtime/task/join.rs b/tokio/src/runtime/task/join.rs index ee39258846a..818d3c21dd5 100644 --- a/tokio/src/runtime/task/join.rs +++ b/tokio/src/runtime/task/join.rs @@ -179,6 +179,8 @@ impl JoinHandle { /// already completed at the time it was cancelled, but most likely it /// will fail with a [cancelled] `JoinError`. /// + /// See also [the module level docs] for more information on cancellation. + /// /// ```rust /// use tokio::time; /// @@ -205,7 +207,9 @@ impl JoinHandle { /// } /// # } /// ``` + /// /// [cancelled]: method@super::error::JoinError::is_cancelled + /// [the module level docs]: crate::task#cancellation pub fn abort(&self) { self.raw.remote_abort(); } diff --git a/tokio/src/sync/mutex.rs b/tokio/src/sync/mutex.rs index 679627eba11..52ba2d34fcd 100644 --- a/tokio/src/sync/mutex.rs +++ b/tokio/src/sync/mutex.rs @@ -404,6 +404,10 @@ impl Mutex { /// been acquired. When the lock has been acquired, function returns a /// [`MutexGuard`]. /// + /// If the mutex is available to be acquired immediately, then this call + /// will typically not yield to the runtime. However, this is not guaranteed + /// under all circumstances. + /// /// # Cancel safety /// /// This method uses a queue to fairly distribute locks in the order they @@ -571,6 +575,10 @@ impl Mutex { /// been acquired. When the lock has been acquired, this returns an /// [`OwnedMutexGuard`]. /// + /// If the mutex is available to be acquired immediately, then this call + /// will typically not yield to the runtime. However, this is not guaranteed + /// under all circumstances. + /// /// This method is identical to [`Mutex::lock`], except that the returned /// guard references the `Mutex` with an [`Arc`] rather than by borrowing /// it. Therefore, the `Mutex` must be wrapped in an `Arc` to call this diff --git a/tokio/src/task/mod.rs b/tokio/src/task/mod.rs index 9b753701854..5dd0584338d 100644 --- a/tokio/src/task/mod.rs +++ b/tokio/src/task/mod.rs @@ -112,6 +112,46 @@ //! [thread_join]: std::thread::JoinHandle //! [`JoinError`]: crate::task::JoinError //! +//! #### Cancellation +//! +//! Spawned tasks may be cancelled using the [`JoinHandle::abort`] or +//! [`AbortHandle::abort`] methods. When one of these methods are called, the +//! task is signalled to shut down next time it yields at an `.await` point. If +//! the task is already idle, then it will be shut down as soon as possible +//! without running again before being shut down. Additionally, shutting down a +//! Tokio runtime (e.g. by returning from `#[tokio::main]`) immediately cancels +//! all tasks on it. +//! +//! When tasks are shut down, it will stop running at whichever `.await` it has +//! yielded at. All local variables are destroyed by running their detructor. +//! Once shutdown has completed, awaiting the [`JoinHandle`] will fail with a +//! [cancelled error](crate::task::JoinError::is_cancelled). +//! +//! Note that aborting a task does not guarantee that it fails with a cancelled +//! error, since it may complete normally first. For example, if the task does +//! not yield to the runtime at any point between the call to `abort` and the +//! end of the task, then the [`JoinHandle`] will instead report that the task +//! exited normally. +//! +//! Be aware that calls to [`JoinHandle::abort`] just schedule the task for +//! cancellation, and will return before the cancellation has completed. To wait +//! for cancellation to complete, wait for the task to finish by awaiting the +//! [`JoinHandle`]. Similarly, the [`JoinHandle::is_finished`] method does not +//! return `true` until the cancellation has finished. +//! +//! Calling [`JoinHandle::abort`] multiple times has the same effect as calling +//! it once. +//! +//! Tokio also provides an [`AbortHandle`], which is like the [`JoinHandle`], +//! except that it does not provide a mechanism to wait for the task to finish. +//! Each task can only have one [`JoinHandle`], but it can have more than one +//! [`AbortHandle`]. +//! +//! [`JoinHandle::abort`]: crate::task::JoinHandle::abort +//! [`AbortHandle::abort`]: crate::task::AbortHandle::abort +//! [`AbortHandle`]: crate::task::AbortHandle +//! [`JoinHandle::is_finished`]: crate::task::JoinHandle::is_finished +//! //! ### Blocking and Yielding //! //! As we discussed above, code running in asynchronous tasks should not perform