Commit 9c10514

Add futures diagram.

1 parent 880fa88 commit 9c10514

File tree

2 files changed: +23 −1 lines changed

images/future-sequence.png

129 KB

sections/parallel-programming.qmd

Lines changed: 23 additions & 1 deletion
@@ -228,12 +228,28 @@ In this case, it takes around *12 seconds* to execute 10 tasks, depending on wha
So, can we make this faster using a multi-threaded parallel process? Let's try with `concurrent.futures`. The central concept in this package is the `future`, a structure that represents the value that will be produced when a computation completes at some point in the future. A future is an expression that has been scheduled for evaluation and whose result will be available at some unknown later time: seconds, minutes, or hours, depending on the task complexity and available resources. With `concurrent.futures`, tasks are scheduled and do not block while they await their turn to be executed. Instead, threads are created and executed *asynchronously*, meaning that the function returns its `future` potentially before the thread has actually run. Using this approach, the user schedules a series of tasks to be executed asynchronously and keeps track of the future for each task. When a future indicates that execution has completed, we can retrieve the result of the computation. This lets us do other useful work while our process is waiting for the future expression to finish its work.
```{mermaid}
sequenceDiagram
    main->>+doubleIt: 17
    doubleIt-->>-main: future1
    main->>+doubleIt: 20
    doubleIt-->>-main: future2
    main->>+doubleIt: 16
    doubleIt-->>-main: future3
    loop for future in (future1, future2, future3)
        main->>+future: result()
        future-->>-main: value
    end
end
```

![](../images/future-sequence.png)

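The sequence in the diagram can be sketched with `ThreadPoolExecutor.submit()`, which returns a `Future` immediately. Note that `doubleIt` here is a hypothetical stand-in for the diagram's task, assumed to simply double its input:

```python
from concurrent.futures import ThreadPoolExecutor

def doubleIt(x):
    # Hypothetical task from the diagram: double the input
    return 2 * x

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() returns a Future immediately, possibly before the task runs
    future1 = executor.submit(doubleIt, 17)
    future2 = executor.submit(doubleIt, 20)
    future3 = executor.submit(doubleIt, 16)
    # result() blocks until each task's value is available
    results = [f.result() for f in (future1, future2, future3)]

print(results)  # [34, 40, 32]
```

Between `submit()` and `result()`, the main thread is free to do other work while the pool executes the tasks.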
In practice this is a simple change from our serial implementation. We will use the `ThreadPoolExecutor` to create a pool of workers that are available to process tasks. Each worker is set up in its own thread, so it can execute in parallel with the other workers. After setting up the pool of workers, we use the executor's `map()` to schedule each task from our `task_list` (in this case, an input value from 1 to 10) to run on one of the workers. As with all `map()` implementations, we are asking for each value in `task_list` to be processed by the `task` function we defined above, but in this case it will be executed using one of the workers from the `executor` that we created.
```{python}
#| eval: false
from concurrent.futures import ThreadPoolExecutor
np.arange(8)

@timethis
def run_threaded(task_list):
    with ThreadPoolExecutor(max_workers=20) as executor:
```
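As a self-contained sketch of the same pattern (the original `task` function isn't shown in this hunk, so a trivial squaring stand-in is used here), `executor.map()` returns results in the same order as the inputs:

```python
from concurrent.futures import ThreadPoolExecutor

def task(x):
    # Stand-in for the document's task function
    return x * x

def run_threaded(task_list):
    # map() schedules each input on a pool worker and returns the
    # results in input order
    with ThreadPoolExecutor(max_workers=20) as executor:
        return list(executor.map(task, task_list))

out = run_threaded(range(1, 11))
print(out)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```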
@@ -450,8 +466,14 @@ Again, the output messages print almost immediately, but then later the processe
## Parallel processing with `parsl`
:::{layout-ncol="2"}

`concurrent.futures` is great and powerful, but it has its limits. Particularly as you try to scale up into the thousands of concurrent tasks, other libraries like [Parsl](https://parsl-project.org/) ([docs](https://parsl.readthedocs.io/)), [Dask](https://www.dask.org/), [Ray](https://www.ray.io/), and others come into play. They all have their strengths, but Parsl makes it particularly easy to build parallel workflows out of existing python code through its use of decorators on existing python functions.

![](../images/parsl-logo.png)

:::
In addition, Parsl supports many different kinds of [providers](https://parsl.readthedocs.io/en/stable/userguide/execution.html#execution-providers), allowing the same python code to be run multi-threaded using a `ThreadPoolExecutor` or via multi-processing on many different cluster computing platforms using the `HighThroughputExecutor`. For example, Parsl includes providers supporting local execution as well as execution on Slurm, Condor, Kubernetes, AWS, and other platforms. Parsl also handles data staging across these varied environments, making sure the data is in the right place when it's needed for computations.

Similarly to before, we start by configuring an executor in Parsl and loading it. We'll use multiprocessing by configuring the `HighThroughputExecutor` to use our local resources as a cluster, and we'll activate our virtual environment to be sure we're executing in a consistent environment.
