diff --git a/doc/main/tbb_userguide/How_Task_Scheduler_Works.rst b/doc/main/tbb_userguide/How_Task_Scheduler_Works.rst
new file mode 100644
index 0000000000..5ad1670baa
--- /dev/null
+++ b/doc/main/tbb_userguide/How_Task_Scheduler_Works.rst
@@ -0,0 +1,50 @@
+.. _How_Task_Scheduler_Works.rst:
+
+How Task Scheduler Works
+========================
+
+While the task scheduler is not bound to any particular type of parallelism,
+it was designed to work efficiently for fork-join parallelism with lots of forks.
+This type of parallelism is typical for parallel algorithms such as `oneapi::tbb::parallel_for
+`_.
+
+Let's consider in more detail how fork-join parallelism maps onto the task scheduler.
+
+The scheduler runs tasks in a way that tries to achieve several goals simultaneously:
+
+- Create enough work to keep as many threads as possible busy, achieving actual parallelism.
+- Preserve data locality to make single-thread execution more efficient.
+- Minimize both memory demands and cross-thread communication to reduce overhead.
+
+To achieve this, the scheduler must balance depth-first and breadth-first execution
+strategies. Assuming that the task graph is finite, depth-first is better for
+sequential execution because:
+
+- **Strike when the cache is hot**. The deepest tasks are the most recently created tasks and therefore are the hottest in the cache.
+  Also, once they complete, the tasks that depend on them can continue executing; although those tasks are not the hottest in the cache,
+  they are still warmer than the older tasks deeper in the deque.
+
+- **Minimize space**. Executing the shallowest task first leads to breadth-first unfolding of the graph, which creates a number
+  of simultaneously coexisting nodes that is exponential in the graph depth. In contrast, depth-first execution creates the same total number
+  of nodes, but only a number linear in the depth exists at the same time, because the other ready tasks
+  are kept on a stack.
+
+Each thread has its own deque of tasks that are ready to run. When a
+thread spawns a task, it pushes the task onto the bottom of its deque.
+
+When a thread participates in the evaluation of tasks, it repeatedly executes
+a task obtained by the first applicable rule from the following set, which is roughly equivalent to the actual behavior:
+
+1. Get the task returned by the previously executed task, if any.
+
+2. Take a task from the bottom of the thread's own deque, if any.
+
+3. Steal a task from the top of another randomly chosen deque. If the
+   selected deque is empty, the thread repeats this rule until it succeeds.
+
+Rule 1 is described in :doc:`Task Scheduler Bypass `.
+The overall effect of rule 2 is to execute the *youngest* task spawned by the thread,
+which results in depth-first execution until the thread runs out of work.
+Then rule 3 applies. It steals the *oldest* task spawned by another thread,
+which causes temporary breadth-first execution that converts potential parallelism
+into actual parallelism.
diff --git a/doc/main/tbb_userguide/Task_Scheduler_Bypass.rst b/doc/main/tbb_userguide/Task_Scheduler_Bypass.rst
new file mode 100644
index 0000000000..c198f6ac6b
--- /dev/null
+++ b/doc/main/tbb_userguide/Task_Scheduler_Bypass.rst
@@ -0,0 +1,20 @@
+.. _Task_Scheduler_Bypass:
+
+Task Scheduler Bypass
+=====================
+
+Scheduler bypass is an optimization in which you directly specify the next task to run.
+According to the rules of execution described in :doc:`How Task Scheduler Works `,
+spawning a new task to be executed by the current thread involves the following steps:
+
+1. Push the new task onto the thread's deque.
+2. Continue to execute the current task until it is completed.
+3. Take the new task from the thread's deque, unless another thread has stolen it.
+
+Steps 1 and 3 introduce unnecessary deque operations or, even worse, allow stealing that can hurt
+locality without adding significant parallelism.
+These problems can be avoided with the task scheduler bypass technique: instead of spawning the preferred next task, the current task returns it.
+Then, as described in :doc:`How Task Scheduler Works `, the returned task becomes the first candidate to be executed next by the thread.
+Furthermore, this approach almost guarantees that the task is executed by the current thread and not by any other thread.
+
+Note that currently the only way to use this optimization is through a preview feature of ``oneapi::tbb::task_group``.
\ No newline at end of file
diff --git a/doc/main/tbb_userguide/Task_Scheduler_Summary.rst b/doc/main/tbb_userguide/Task_Scheduler_Summary.rst
deleted file mode 100644
index 7c77d8f9df..0000000000
--- a/doc/main/tbb_userguide/Task_Scheduler_Summary.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-.. _Task_Scheduler_Summary:
-
-Task Scheduler Summary
-======================
-
-
-The task scheduler works most efficiently for fork-join parallelism with
-lots of forks, so that the task-stealing can cause sufficient
-breadth-first behavior to occupy threads, which then conduct themselves
-in a depth-first manner until they need to steal more work.
-
diff --git a/doc/main/tbb_userguide/The_Task_Scheduler.rst b/doc/main/tbb_userguide/The_Task_Scheduler.rst
index ac57e02037..d9e6056e1c 100644
--- a/doc/main/tbb_userguide/The_Task_Scheduler.rst
+++ b/doc/main/tbb_userguide/The_Task_Scheduler.rst
@@ -16,5 +16,6 @@ onto one of the high-level templates, use the task scheduler.
 
    ../tbb_userguide/Task-Based_Programming
    ../tbb_userguide/When_Task-Based_Programming_Is_Inappropriate
-   ../tbb_userguide/Task_Scheduler_Summary
+   ../tbb_userguide/How_Task_Scheduler_Works
+   ../tbb_userguide/Task_Scheduler_Bypass
    ../tbb_userguide/Guiding_Task_Scheduler_Execution
\ No newline at end of file
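The three rules in the new How_Task_Scheduler_Works.rst page can be illustrated with a small single-threaded C++17 sketch. It is a hedged model, not the oneTBB implementation: `WorkerDeque` and `next_task` are hypothetical names, the bottom of a deque is assumed to be the back of a `std::deque` and the top its front, and victims are tried once in a fixed order instead of being chosen at random until a steal succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical single-threaded model of one worker's deque (not oneTBB code).
// Assumption: "bottom" is the back of std::deque and "top" is its front.
class WorkerDeque {
public:
    // A spawned task is pushed onto the bottom of the spawning thread's deque.
    void spawn(int task_id) { tasks_.push_back(task_id); }

    // Rule 2: the owning thread takes the youngest task from the bottom.
    std::optional<int> pop_bottom() {
        if (tasks_.empty()) return std::nullopt;
        int id = tasks_.back();
        tasks_.pop_back();
        return id;
    }

    // Rule 3: a thief takes the oldest task from the top.
    std::optional<int> steal_top() {
        if (tasks_.empty()) return std::nullopt;
        int id = tasks_.front();
        tasks_.pop_front();
        return id;
    }

    std::size_t size() const { return tasks_.size(); }

private:
    std::deque<int> tasks_;
};

// Picks the next task by the first rule that applies.
// `returned` models rule 1: a task returned by the previously executed task.
// Simplification: victims are tried in order, and the search gives up after
// one pass instead of retrying at random until it succeeds.
std::optional<int> next_task(std::optional<int> returned,
                             WorkerDeque& own,
                             const std::vector<WorkerDeque*>& victims) {
    if (returned) return returned;                 // rule 1: scheduler bypass
    if (auto id = own.pop_bottom()) return id;     // rule 2: youngest own task
    for (WorkerDeque* victim : victims) {          // rule 3: oldest task of a victim
        if (auto id = victim->steal_top()) return id;
    }
    return std::nullopt;                           // no work found in this pass
}
```

For example, if one worker spawns tasks 1, 2, and 3 in that order, its next task under rule 2 is 3 (the youngest), while an idle worker applying rule 3 steals 1 (the oldest). A task passed as `returned` runs before either of them without touching any deque, which is the effect that scheduler bypass relies on.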
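The **Minimize space** argument in How_Task_Scheduler_Works.rst can be checked with a C++ sketch that unfolds a complete binary fork tree in which every task shallower than a hypothetical `max_depth` spawns two children. `unfold` and `UnfoldStats` are illustrative names, and the sketch ignores threads and stealing; it only compares always taking the youngest ready task (depth-first, as in rule 2) with always taking the oldest one (breadth-first).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Result of unfolding a complete binary fork tree.
struct UnfoldStats {
    std::size_t executed = 0;     // total number of tasks run
    std::size_t max_pending = 0;  // peak number of ready tasks not yet run
};

// Hypothetical model: every task with depth < max_depth spawns two children.
// depth_first == true takes the youngest ready task (LIFO);
// depth_first == false takes the oldest ready task (FIFO, breadth-first).
UnfoldStats unfold(int max_depth, bool depth_first) {
    assert(max_depth >= 0);
    std::deque<int> ready{0};  // each entry stores the depth of a ready task
    UnfoldStats stats;
    stats.max_pending = ready.size();
    while (!ready.empty()) {
        int depth = 0;
        if (depth_first) {
            depth = ready.back();
            ready.pop_back();
        } else {
            depth = ready.front();
            ready.pop_front();
        }
        ++stats.executed;
        if (depth < max_depth) {  // fork: spawn two children
            ready.push_back(depth + 1);
            ready.push_back(depth + 1);
        }
        stats.max_pending = std::max<std::size_t>(stats.max_pending, ready.size());
    }
    return stats;
}
```

With `max_depth` set to 10, both orders execute the same 2047 tasks, but depth-first execution never holds more than 11 ready tasks at once (the depth plus one), whereas breadth-first execution peaks at 1024, the entire last level of the tree.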