This is a quick-start guide to using mpi4py to run simple scripts, e.g. for making plots in parallel. Setup instructions along with some example scripts are provided in this repo. No sudo required.
To simplify installation and set up our workspace, let's install Anaconda. You can download it straight from the official Anaconda website.
For example, on Linux, you can download the May 2022 release (2022.05) by running the command below.
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
To install it, run:
bash Anaconda3-2022.05-Linux-x86_64.sh
Go through the install process, and make sure to initialize conda. Afterwards, to activate conda, run:
source ~/.bashrc
Lastly, create a new virtual environment to experiment in:
conda create -n py310 python=3.10
The above creates an environment named py310 with Python 3.10. The last thing you need to do is activate it.
conda activate py310
To the left of your login name you should see the name of the environment you are in, like so: (py310)login@host
Make sure you are running Python 3.5+ (3.9+ recommended) for this tutorial. That said, mpi4py should work on Python 2.7 just fine.
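One quick way to confirm that the active interpreter meets the version requirement is to inspect `sys.version_info`. The following is a minimal sketch; the helper name `meets_minimum` is made up for illustration and is not part of the scripts in this repo.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 5)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Print the running interpreter version and compare it with the
    # two thresholds mentioned in this tutorial.
    print("Python", sys.version.split()[0])
    print("3.5+ (required):   ", meets_minimum(minimum=(3, 5)))
    print("3.9+ (recommended):", meets_minimum(minimum=(3, 9)))
```

Run it inside the activated environment so that it reports the environment's Python rather than the system one.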
Install OpenMPI & mpi4py via conda from the conda-forge channel. It is a community-led conda repo that has the most up-to-date package versions. At the time of writing, the main anaconda channel doesn't even support Python 3.9, which ships as standard with their latest release.
conda install -c conda-forge openmpi=4.1.4=ha1ae619_100
You can also build OpenMPI from source by following their instructions.
⚠️ Note the specific build hash (ha1ae619_100, from July 2022) when installing openmpi. The "stable" 4.1.3 & 4.1.4 builds fail to install essential libraries, making MPI unusable.
conda install -c conda-forge mpi4py
If you don't want to use conda, or would like to use MPICH/Microsoft MPI, follow mpi4py's documentation.
You might need to install numpy & matplotlib into your virtual environment:
pip install numpy
pip install matplotlib
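To see which of the packages used in this tutorial are still missing from the active environment, a small check along these lines may help. It is only a sketch: the function name `missing_packages` is hypothetical, and it only tests whether a package can be found, not whether it works.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three packages this tutorial relies on.
    missing = missing_packages(["numpy", "matplotlib", "mpi4py"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages were found.")
```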
During the workshop, folks on Mac had trouble installing OpenMPI from the conda-forge channel. If you experience a similar problem, you can fall back to Python 3.8 and run the following:
conda create -n py38 python=3.8
conda activate py38
conda install openmpi
To run an example in parallel, first you should check how many cores (not threads) are available on your machine. You can do that via
lscpu
Look at the line labeled Core(s) per socket: (around the 12th line from the top).
⚠️ Cores and Threads are different; modern CPUs typically have 2 threads per core.
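The physical core count can also be computed from the lscpu output by multiplying Core(s) per socket by Socket(s). The sketch below assumes the English field labels that lscpu prints on Linux; the `SAMPLE` text is a made-up excerpt for illustration, not output from a real machine, and `physical_cores` is a hypothetical helper name.

```python
import subprocess


def physical_cores(lscpu_text):
    """Compute physical cores as Core(s) per socket x Socket(s)."""
    fields = {}
    for line in lscpu_text.splitlines():
        # Each lscpu line looks like "Label:   value"; split at the first colon.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])


# Hypothetical excerpt of lscpu output, for illustration only.
SAMPLE = """\
CPU(s):              16
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
"""

if __name__ == "__main__":
    print("sample machine:", physical_cores(SAMPLE), "physical cores")
    try:
        out = subprocess.check_output(["lscpu"], universal_newlines=True)
        print("this machine:", physical_cores(out), "physical cores")
    except (OSError, subprocess.CalledProcessError, KeyError, ValueError):
        # lscpu is a Linux tool; it may be absent or use different labels.
        print("could not read lscpu output on this machine")
```

In the sample, CPU(s) reports 16 because it counts threads, while the computed number of physical cores is 8.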
To run in parallel, you need to specify how many processes to launch, typically one per core. The following command will run the script as 4 processes, i.e. on 4 cores.
mpirun -n 4 python simple_demo.py
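For reference, a script launched this way usually starts by asking MPI for its rank (the process index) and the total number of processes. The sketch below is a hypothetical stand-in, not the contents of simple_demo.py; the fallback to a single process when mpi4py is missing is an assumption added so the file also runs without MPI.

```python
try:
    from mpi4py import MPI
except ImportError:
    MPI = None  # assumption: behave as a single process if mpi4py is missing


def rank_and_size():
    """Return (rank, size) of this process; (0, 1) when mpi4py is unavailable."""
    if MPI is None:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    rank, size = rank_and_size()
    print("Hello from rank {} of {}".format(rank, size))
```

Launched with mpirun -n 4, each of the four processes prints its own line, and the order of the lines can differ from run to run.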
Below is a table describing each demo provided in this tutorial.
Demo | Description |
---|---|
simple_demo | basic test of correct installation |
plot_demo | independent for loop parallelization with plotting |
comm_demo | communication demo presenting bcast, scatter, gather, and Barrier |
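The idea behind parallelizing an independent for loop, as in plot_demo, is that each rank handles only its own share of the iterations. One common way to divide them is a round-robin split, sketched below. This is an illustration under assumptions, not the code from plot_demo: `indices_for_rank` is a hypothetical helper, and the ranks are simulated in a plain loop, whereas in a real run `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.

```python
def indices_for_rank(n_items, rank, size):
    """Round-robin split: rank r handles items r, r + size, r + 2*size, ..."""
    return list(range(rank, n_items, size))


if __name__ == "__main__":
    n_frames, size = 10, 4  # illustrative values
    for rank in range(size):  # simulate what each rank would compute
        # In a real script, each rank would make and save one plot per index here.
        print("rank", rank, "->", indices_for_rank(n_frames, rank, size))
```

Because no iteration depends on another, the ranks never need to communicate; every index is handled by exactly one rank.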
If you want to parallelize a code with only some independent loops, use the comm_demo for guidance.
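The usual pattern for such a code is: the root rank prepares the inputs, bcast shares common parameters, scatter hands each rank its chunk, every rank runs the independent loop on its chunk, and gather collects the results back on the root. The sketch below illustrates that pattern with the lowercase mpi4py methods; it is not the code from comm_demo. The `SerialComm` class, the `run` function, and the `scale` parameter are hypothetical and exist only so the example is self-contained and also runs without MPI.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
except ImportError:
    class SerialComm:
        """Hypothetical single-process stand-in so the sketch runs without MPI."""
        def Get_rank(self): return 0
        def Get_size(self): return 1
        def bcast(self, obj, root=0): return obj
        def scatter(self, sendobj, root=0): return sendobj[0]
        def gather(self, sendobj, root=0): return [sendobj]
        def Barrier(self): pass
    comm = SerialComm()


def run(comm, n_items=10):
    rank, size = comm.Get_rank(), comm.Get_size()

    # Only the root rank prepares the inputs.
    if rank == 0:
        scale = 2.0
        chunks = [list(range(r, n_items, size)) for r in range(size)]
    else:
        scale, chunks = None, None

    scale = comm.bcast(scale, root=0)        # same value to every rank
    my_items = comm.scatter(chunks, root=0)  # one chunk per rank

    # The independent loop: each rank works only on its own items.
    my_results = [(i, scale * i) for i in my_items]

    gathered = comm.gather(my_results, root=0)  # list of per-rank lists on root
    comm.Barrier()                              # wait until every rank gets here

    if rank == 0:
        # Flatten and restore the original item order.
        return sorted(pair for part in gathered for pair in part)
    return None


if __name__ == "__main__":
    result = run(comm)
    if result is not None:
        print(result)
```

Note that gather returns None on every rank except the root, which is why only rank 0 assembles and prints the final list.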
For the full list of commands, refer to the official mpi4py documentation.
I found mpitutorial.com to be quite useful. It is not specifically on mpi4py, but rather MPI in general. In particular, during the workshop I presented the diagrams from their tutorials on Barrier & Bcast and Scatter & Gather.