New examples for the updated documentation (#495)
* new examples

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Build notebooks as tests

* add executor bit

* extend notebook environment

* Update 3-hpc-allocation.ipynb

* Add key features

* update key arguments

* Work in progress for the readme

* Update readme

* add new lines

* Change Backend Names

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update __init__.py

* update readme

* Update installation

* Fix init

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update local notebook

* update local example notebook

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Explain jupyter kernel installation

* copy existing kernel

* Add HPC submission

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* execute HPC notebook once

* hpc allocation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update documentation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replace HPC submission notebook

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
jan-janssen and pre-commit-ci[bot] authored Nov 20, 2024
1 parent d11a2d3 commit 6adbf5f
Showing 14 changed files with 2,262 additions and 996 deletions.
11 changes: 11 additions & 0 deletions .ci_support/build_notebooks.sh
@@ -0,0 +1,11 @@
#!/bin/bash
# execute notebooks
i=0;
for notebook in $(ls notebooks/*.ipynb); do
    papermill ${notebook} ${notebook%.*}-out.${notebook##*.} -k python3 || i=$((i+1));
done;

# push error to next level
if [ $i -gt 0 ]; then
    exit 1;
fi;
2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -34,4 +34,4 @@ jobs:
      timeout-minutes: 5
      run: >
        flux start
        papermill notebooks/examples.ipynb examples-out.ipynb -k "python3"
        .ci_support/build_notebooks.sh
197 changes: 104 additions & 93 deletions README.md
@@ -3,111 +3,122 @@
[![Coverage Status](https://coveralls.io/repos/github/pyiron/executorlib/badge.svg?branch=main)](https://coveralls.io/github/pyiron/executorlib?branch=main)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/pyiron/executorlib/HEAD?labpath=notebooks%2Fexamples.ipynb)

## Challenges
In high performance computing (HPC) the Python programming language is commonly used as a high-level language to
orchestrate the coupling of scientific applications. Still, the efficient usage of highly parallel HPC clusters remains
challenging, primarily in three aspects:

* **Communication**: Distributing Python function calls over hundreds of compute nodes and gathering the results on a
shared file system is technically possible, but highly inefficient. A socket-based communication approach is
preferable.
* **Resource Management**: Assigning Python functions to GPUs or executing Python functions on multiple CPUs using the
message passing interface (MPI) requires major modifications to the Python workflow.
* **Integration**: Existing workflow libraries implement a secondary job management layer on the Python level rather than
leveraging the existing infrastructure provided by the job scheduler of the HPC cluster.
### executorlib is ...
In a given HPC allocation the `executorlib` library addresses these challenges by extending the Executor interface
of the standard Python library to support resource assignment in the HPC context. Computing resources can either be
assigned on a per function call basis or as a block allocation on a per Executor basis. The `executorlib` library
is built on top of the [flux-framework](https://flux-framework.org) to enable fine-grained resource assignment. In
addition, the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) is supported as an alternative
queuing system, and for workstation installations `executorlib` can be installed without a job scheduler.

### executorlib is not ...
The `executorlib` library is not designed to request an allocation from the job scheduler of an HPC cluster. Instead, within a given
allocation from the job scheduler, the `executorlib` library can be employed to distribute a series of Python
function calls over the available computing resources to achieve maximum computing resource utilization.

## Example
The following example illustrates how `executorlib` can be used to distribute a series of MPI parallel function calls
within a queuing system allocation (`example.py`).

Up-scale python functions for high performance computing (HPC) with executorlib.

## Key Features
* **Up-scale your Python functions beyond a single computer.** - executorlib extends the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
from the Python standard library and combines it with job schedulers for high performance computing (HPC), including
the [Simple Linux Utility for Resource Management (SLURM)](https://slurm.schedmd.com) and [flux](http://flux-framework.org).
With this combination executorlib allows users to distribute their Python functions over multiple compute nodes.
* **Parallelize your Python program one function at a time** - executorlib allows users to assign dedicated computing
resources like CPU cores, threads or GPUs to one Python function call at a time, so you can accelerate your Python
code function by function.
* **Permanent caching of intermediate results to accelerate rapid prototyping** - To accelerate the development of
machine learning pipelines and simulation workflows, executorlib provides optional caching of intermediate results for
iterative development in interactive environments like Jupyter notebooks (see the sketch below).
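
A minimal sketch of how such caching can look, assuming the optional `cache_directory` parameter of the `Executor` (the directory name below is only illustrative):
```python
from executorlib import Executor

# Results are stored in the cache directory; resubmitting the same function
# with the same arguments reuses the cached result instead of recomputing it.
with Executor(backend="local", cache_directory="./cache") as exe:
    fs = exe.submit(sum, [1, 2, 3])
    print(fs.result())  # 6
```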

## Examples
The Python standard library provides the [Executor interface](https://docs.python.org/3/library/concurrent.futures.html#executor-objects)
with the [ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) and the
[ThreadPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) for parallel
execution of Python functions on a single computer. executorlib extends this functionality to distribute Python
functions over multiple computers within a high performance computing (HPC) cluster. This can either be achieved by
submitting each function as an individual job to the HPC job scheduler - [HPC Submission Mode]() - or by requesting a
compute allocation of multiple nodes and then distributing the Python functions within this allocation - [HPC Allocation Mode]().
Finally, to accelerate the development process executorlib also provides a [Local Mode]() to use the executorlib
functionality on a single workstation for testing. We start with the [Local Mode](), which is selected by setting the backend
parameter to local - `backend="local"`:
```python
from executorlib import Executor


with Executor(backend="local") as exe:
    future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
    print([f.result() for f in future_lst])
```
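Since the results are gathered in submission order, this example prints `[2, 4, 6, 8]`.
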
In the same way executorlib can also execute Python functions which use additional computing resources, like multiple
CPU cores, CPU threads or GPUs. For example, if the Python function internally uses the Message Passing Interface (MPI)
via the [mpi4py](https://mpi4py.readthedocs.io) Python library:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="local") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
This example can be executed with the regular Python interpreter:
```
python example.py
```
With two MPI ranks assigned to the function call, this returns one result per rank:
```
>>> [(3, 2, 0), (3, 2, 1)]
```
The important part in this example is that [mpi4py](https://mpi4py.readthedocs.io) is only used in the `calc()`
function, not in the Python script itself. Consequently, it is not necessary to call the script with `mpiexec`; a call
with the regular Python interpreter is sufficient. This highlights how `executorlib` allows users to
parallelize one function at a time, without having to convert their whole workflow to use [mpi4py](https://mpi4py.readthedocs.io).
The same code can also be executed directly inside a Jupyter notebook, which enables an interactive development process.

The interface of the standard [concurrent.futures.Executor](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures)
is extended by adding the option `cores_per_worker=2` to assign multiple MPI ranks to each function call. To create two
workers, the maximum number of cores can be increased to `max_cores=4`. In this case each worker receives two cores,
resulting in a total of four CPU cores being utilized.

After submitting the function `calc()` with the corresponding parameter to the executor via `exe.submit(calc, 0)`,
a Python [`concurrent.futures.Future`](https://docs.python.org/3/library/concurrent.futures.html#future-objects) object is
returned. Consequently, the `executorlib.Executor` can be used as a drop-in replacement for the
[`concurrent.futures.Executor`](https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures),
which allows the user to add parallelism to their workflow one function at a time.
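
As a brief sketch of this drop-in compatibility (assuming the local backend shown above), the helpers from the Python standard library, such as `concurrent.futures.as_completed`, work with the returned future objects unchanged:
```python
from concurrent.futures import as_completed

from executorlib import Executor

with Executor(backend="local") as exe:
    future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
    # The futures are standard concurrent.futures.Future objects, so they can
    # be processed in completion order with the standard library helpers.
    for future in as_completed(future_lst):
        print(future.result())
```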

## Disclaimer
While we try to develop a stable and reliable software library, the development remains an open-source project under the
BSD 3-Clause License without any warranties:
```
BSD 3-Clause License
Copyright (c) 2022, Jan Janssen
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```
The additional `resource_dict` parameter defines the computing resources allocated to the execution of the submitted
Python function. In addition to the number of compute cores `cores`, the resource dictionary can also define the threads per core
as `threads_per_core`, the GPUs per core as `gpus_per_core`, the working directory with `cwd`, the option to use the
OpenMPI oversubscribe feature with `openmpi_oversubscribe` and, finally, for the [Simple Linux Utility for Resource
Management (SLURM)](https://slurm.schedmd.com) queuing system, the option to provide additional command line arguments
with the `slurm_cmd_args` parameter - [resource dictionary]().
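
As a minimal sketch of how several of these keys can be combined for a single function call (the values and the working directory below are purely illustrative):
```python
from executorlib import Executor

with Executor(backend="local") as exe:
    fs = exe.submit(
        sum,
        [1, 2, 3],
        resource_dict={
            "cores": 1,             # CPU cores assigned to this function call
            "threads_per_core": 2,  # e.g. for thread-based parallelism inside the function
            "cwd": "/tmp",          # illustrative working directory for the worker
        },
    )
    print(fs.result())  # 6
```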

This flexibility to assign computing resources on a per-function-call basis simplifies the up-scaling of Python programs.
Only the parts of the Python program which benefit from parallel execution are implemented as MPI parallel Python
functions, while the rest of the program remains serial.

The same function can be submitted to the [SLURM](https://slurm.schedmd.com) queuing system by just changing the `backend`
parameter to `slurm_submission`. The rest of the example remains the same, which highlights how executorlib accelerates
the rapid prototyping and up-scaling of HPC Python programs.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_submission") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In this case the [Python simple queuing system adapter (pysqa)](https://pysqa.readthedocs.io) is used to submit the
`calc()` function to the [SLURM](https://slurm.schedmd.com) job scheduler and request an allocation with two CPU cores
for the execution of the function - [HPC Submission Mode](). In the background, the [sbatch](https://slurm.schedmd.com/sbatch.html)
command is used to request the allocation and execute the Python function.

Within a given [SLURM](https://slurm.schedmd.com) allocation executorlib can also be used to assign a subset of the
available computing resources to execute a given Python function. In terms of the [SLURM](https://slurm.schedmd.com)
commands, this functionality internally uses the [srun](https://slurm.schedmd.com/srun.html) command to receive a subset
of the resources of a given queuing system allocation.
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


with Executor(backend="slurm_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```
In addition to the support for [SLURM](https://slurm.schedmd.com), executorlib also supports the hierarchical
[flux](http://flux-framework.org) job scheduler. The [flux](http://flux-framework.org) job scheduler is developed at
[Lawrence Livermore National Laboratory](https://computing.llnl.gov/projects/flux-building-framework-resource-management)
to address the needs of the upcoming generation of exascale computers. Still, even on traditional HPC clusters the
hierarchical approach of [flux](http://flux-framework.org) is beneficial for distributing hundreds of tasks within a
given allocation. Even when [SLURM](https://slurm.schedmd.com) is used as the primary job scheduler of your HPC cluster, it is
recommended to use [SLURM with flux]() as the hierarchical job scheduler within the allocations.
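
As a sketch of what this can look like, assuming a flux-based backend named `flux_allocation` analogous to the `slurm_allocation` backend above, only the `backend` parameter of the earlier example changes:
```python
from executorlib import Executor


def calc(i):
    from mpi4py import MPI

    size = MPI.COMM_WORLD.Get_size()
    rank = MPI.COMM_WORLD.Get_rank()
    return i, size, rank


# "flux_allocation" is assumed here by analogy with "slurm_allocation";
# it requires a running flux instance inside the compute allocation.
with Executor(backend="flux_allocation") as exe:
    fs = exe.submit(calc, 3, resource_dict={"cores": 2})
    print(fs.result())
```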

## Documentation
* [Installation](https://executorlib.readthedocs.io/en/latest/installation.html)
* [Compatible Job Schedulers](https://executorlib.readthedocs.io/en/latest/installation.html#compatible-job-schedulers)
* [executorlib with Flux Framework](https://executorlib.readthedocs.io/en/latest/installation.html#executorlib-with-flux-framework)
5 changes: 5 additions & 0 deletions binder/environment.yml
@@ -11,3 +11,8 @@ dependencies:
- flux-pmix =0.5.0
- versioneer =0.28
- h5py =3.12.1
- matplotlib =3.9.2
- networkx =3.4.2
- pygraphviz =1.14
- pysqa =0.2.2
- ipython =8.29.0
6 changes: 4 additions & 2 deletions docs/_toc.yml
@@ -2,7 +2,9 @@ format: jb-book
root: README
chapters:
- file: installation.md
- file: examples.ipynb
- file: development.md
- file: 1-local.ipynb
- file: 2-hpc-submission.ipynb
- file: 3-hpc-allocation.ipynb
- file: trouble_shooting.md
- file: 4-developer.ipynb
- file: api.rst
