TBB DOC : Dev Guide: Task Scheduler Bypass and How Does Task Scheduler Works (uxlfoundation#521)

* TBB DOC : Dev Guide: Task Scheduler Bypass and How Task Scheduler
Works

Signed-off-by: Anton Potapov <anton.potapov@intel.com>
Co-authored-by: Alexandra <alexandra.epanchinzeva@intel.com>
anton-potapov and aepanchi authored Mar 21, 2022
1 parent 0a0a592 commit ed9d4b5
Showing 4 changed files with 72 additions and 12 deletions.
50 changes: 50 additions & 0 deletions doc/main/tbb_userguide/How_Task_Scheduler_Works.rst
@@ -0,0 +1,50 @@
.. _How_Task_Scheduler_Works.rst:

How Task Scheduler Works
========================


While the task scheduler is not bound to any particular type of parallelism,
it was designed to work efficiently for fork-join parallelism with lots of forks.
This type of parallelism is typical for parallel algorithms such as `oneapi::tbb::parallel_for
<https://spec.oneapi.io/versions/latest/elements/oneTBB/source/algorithms/functions/parallel_for_func.html>`_.

Let's consider the mapping of fork-join parallelism onto the task scheduler in more detail.

The scheduler runs tasks in a way that tries to achieve several targets simultaneously:

- Enable as many threads as possible, by creating enough jobs, to achieve actual parallelism
- Preserve data locality to make single-thread execution more efficient
- Minimize both memory demands and cross-thread communication to reduce overhead

To achieve this, a balance between depth-first and breadth-first execution strategies
must be reached. Assuming that the task graph is finite, depth-first is better for
a sequential execution because:

- **Strike when the cache is hot**. The deepest tasks are the most recently created tasks and therefore are the hottest in the cache.
  Also, if they can be completed, the tasks that depend on them can continue executing, and though not the hottest in the cache,
  they are still warmer than the older tasks deeper in the deque.

- **Minimize space**. Executing the shallowest task leads to the breadth-first unfolding of the graph, which creates an exponential
  number of nodes that coexist simultaneously. In contrast, depth-first execution creates the same number
  of nodes, but only a linear number can exist at the same time, since it creates a stack of other ready
  tasks.
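
The space argument above can be checked with a small toy model. The sketch below is not oneTBB code: it assumes a hypothetical task graph in which every task above a chosen depth forks exactly two children, and the names ``Order`` and ``peak_ready_tasks`` are invented for illustration. It only counts how many ready tasks coexist when the pool is drained youngest-first (depth-first) versus oldest-first (breadth-first).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model (an assumption for illustration, not the oneTBB scheduler):
// each task at depth < max_depth forks two children; deeper tasks are leaves.
enum class Order { DepthFirst, BreadthFirst };

// Returns the peak number of ready tasks that exist at the same time.
std::size_t peak_ready_tasks(int max_depth, Order order) {
    assert(max_depth >= 0);
    std::deque<int> ready{0};  // each entry is the depth of a ready task; root has depth 0
    std::size_t peak = ready.size();
    while (!ready.empty()) {
        int depth;
        if (order == Order::DepthFirst) {
            depth = ready.back();   // youngest task first
            ready.pop_back();
        } else {
            depth = ready.front();  // oldest task first
            ready.pop_front();
        }
        if (depth < max_depth) {    // "execute" the task: fork two children
            ready.push_back(depth + 1);
            ready.push_back(depth + 1);
        }
        if (ready.size() > peak) peak = ready.size();
    }
    return peak;
}
```

For ``max_depth = 10``, this model reports a peak of 11 ready tasks for depth-first order and 1024 for breadth-first order, which matches the linear versus exponential growth described above.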

Each thread has its own deque of tasks that are ready to run. When a
thread spawns a task, it pushes the task onto the bottom of its deque.

When a thread participates in the evaluation of tasks, it repeatedly executes
a task obtained by the first rule that applies from the following, roughly equivalent, rule set:

- Get the task returned by the previous task, if any.

- Take a task from the bottom of its own deque, if any.

- Steal a task from the top of another randomly chosen deque. If the
  selected deque is empty, the thread repeats this rule until it succeeds.

Rule 1 is described in :doc:`Task Scheduler Bypass <Task_Scheduler_Bypass>`.
The overall effect of rule 2 is to execute the *youngest* task spawned by the thread,
which causes depth-first execution until the thread runs out of work.
Then rule 3 applies. It steals the *oldest* task spawned by another thread,
which causes temporary breadth-first execution that converts potential parallelism
into actual parallelism.
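
The three rules can be sketched as a single selection function. This is a simplified, single-threaded illustration, not the oneTBB implementation: ``Task``, ``Worker``, and ``next_task`` are hypothetical names, the victim index is passed in explicitly instead of being chosen at random, and the synchronization that a real work-stealing deque needs is omitted entirely.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

using Task = int;  // hypothetical task identifier

struct Worker {
    // front() is the top of the deque (oldest task),
    // back() is the bottom (youngest task).
    std::deque<Task> ready;
    void spawn(Task t) { ready.push_back(t); }  // push onto the bottom
};

// Picks the next task for worker `self` by the first rule that applies.
std::optional<Task> next_task(std::optional<Task> returned_by_previous,
                              std::vector<Worker>& workers,
                              std::size_t self,
                              std::size_t victim) {
    // Rule 1: run the task returned by the previous task, if any.
    if (returned_by_previous) return returned_by_previous;

    // Rule 2: take the youngest task from the bottom of the own deque.
    Worker& me = workers[self];
    if (!me.ready.empty()) {
        Task t = me.ready.back();
        me.ready.pop_back();
        return t;
    }

    // Rule 3: steal the oldest task from the top of the victim's deque.
    if (victim != self && victim < workers.size() &&
        !workers[victim].ready.empty()) {
        Task t = workers[victim].ready.front();
        workers[victim].ready.pop_front();
        return t;
    }

    // Nothing found; a real scheduler would retry with another victim.
    return std::nullopt;
}
```

If worker 0 spawns tasks 1, 2, and 3 in that order, this sketch hands task 3 to worker 0 (its youngest) and task 1 to a stealing worker (the oldest), which is the depth-first versus breadth-first split described above.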
20 changes: 20 additions & 0 deletions doc/main/tbb_userguide/Task_Scheduler_Bypass.rst
@@ -0,0 +1,20 @@
.. _Task_Scheduler_Bypass:

Task Scheduler Bypass
=====================

Scheduler bypass is an optimization where you directly specify the next task to run.
According to the rules of execution described in :doc:`How Task Scheduler Works <How_Task_Scheduler_Works>`,
spawning a new task to be executed by the current thread involves the following steps:

- Push a new task onto the thread's deque.
- Continue to execute the current task until it is completed.
- Take a task from the thread's deque, unless it is stolen by another thread.

Steps 1 and 3 introduce unnecessary deque operations or, even worse, allow stealing that can hurt
locality without adding significant parallelism. These problems can be avoided by using the "Task Scheduler Bypass" technique:
instead of spawning the preferred task, the current task returns it to the scheduler as the task to execute next.
Then, as described in :doc:`How Task Scheduler Works <How_Task_Scheduler_Works>`,
the returned task becomes the first candidate for the next task to be executed by the thread. Furthermore, this approach almost guarantees that
the task is executed by the current thread and not by any other thread.

Please note that, at the moment, the only way to use this optimization is through a preview feature of ``oneapi::tbb::task_group``.
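
The deque traffic that bypass saves can be illustrated with a toy model. The sketch below does not use the oneTBB API: ``Stats`` and ``run_chain`` are hypothetical names, and the workload is assumed to be a simple chain in which each task creates exactly one successor. It counts the pushes and pops a single thread performs when the successor is spawned versus when it is returned for direct execution.

```cpp
#include <cstddef>
#include <deque>
#include <optional>

// Counters for the toy model (hypothetical, for illustration only).
struct Stats {
    std::size_t pushes = 0;    // tasks pushed onto the deque
    std::size_t pops = 0;      // tasks taken from the deque
    std::size_t executed = 0;  // tasks actually run
};

// Runs a chain of `length` tasks on one thread. Each task creates one successor.
// With use_bypass == false the successor is spawned (pushed, then popped later).
// With use_bypass == true the successor is returned and runs next directly.
Stats run_chain(int length, bool use_bypass) {
    Stats s;
    std::deque<int> ready;    // the thread's own deque
    std::optional<int> next;  // task returned by the previous task
    if (length > 0) next = 0; // the first task is handed over directly

    while (true) {
        int task;
        if (next) {                    // the returned task has priority
            task = *next;
            next.reset();
        } else if (!ready.empty()) {   // otherwise take from the bottom
            task = ready.back();
            ready.pop_back();
            ++s.pops;
        } else {
            break;                     // no work left
        }
        ++s.executed;

        if (task + 1 < length) {       // task body: create the successor
            if (use_bypass) {
                next = task + 1;       // bypass: no deque operation
            } else {
                ready.push_back(task + 1);
                ++s.pushes;
            }
        }
    }
    return s;
}
```

For a chain of five tasks, the spawning variant performs four pushes and four pops, while the bypass variant performs none. In the spawning variant each successor also sits in the deque while its parent finishes, which is the window in which another thread could steal it.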
11 changes: 0 additions & 11 deletions doc/main/tbb_userguide/Task_Scheduler_Summary.rst

This file was deleted.

3 changes: 2 additions & 1 deletion doc/main/tbb_userguide/The_Task_Scheduler.rst
@@ -16,5 +16,6 @@ onto one of the high-level templates, use the task scheduler.

../tbb_userguide/Task-Based_Programming
../tbb_userguide/When_Task-Based_Programming_Is_Inappropriate
../tbb_userguide/Task_Scheduler_Summary
../tbb_userguide/How_Task_Scheduler_Works
../tbb_userguide/Task_Scheduler_Bypass
../tbb_userguide/Guiding_Task_Scheduler_Execution
