PyGPI is a communication library designed to provide easy-to-use, intuitive access to GPI-based communication. To this end, PyGPI makes GaspiCxx primitives available in Python, focusing on the following features:
- Global access functions such as `get_rank`, `get_size`
- `Group` functionality for defining the processes involved in a particular communication
- Single-sided point-to-point communication using `Source`/`TargetBuffer`s
- Non-blocking collective communication
  - Allreduce
  - Broadcast
  - Allgatherv
  - Allgather
  - Gather(v)
- Blocking collective communication
  - Barrier
- PyGPI is designed for communication of contiguous, homogeneous arrays
  - Supported types: `np.array`, Python `list`, Python scalars
  - The default datatype is `np.float32`
  - The user can set the datatype when creating a communication object (e.g., Broadcast) by specifying a `dtype` argument (see the sketch after this list)
    - Supported dtypes might differ depending on the collective used
    - Typical dtypes: `int`, `float`, `np.int32`, `np.int64`, `np.double`
- Each collective is initialized with a default algorithm
  - Users can specify other algorithms with an `algorithm` argument to the constructor
- The returned result is always a contiguous `np.array`
  - The number of dimensions of the input arrays is not preserved in the output; reshaping is needed
- `list` elements are assumed to be homogeneous (of the same type as specified in the `dtype` argument of the communication primitive)
- Strings are not supported
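As a small sketch of the `dtype` argument (the `Allreduce` constructor arguments follow the example further below; whether the dtype is given as a type object or as a string such as `"long"` may depend on the collective, so the value here is only illustrative):

```python
import numpy as np
import pygpi

group_all = pygpi.Group()

# Override the np.float32 default for this collective; the accepted dtype
# values may differ per collective (see the notes above).
allreduce = pygpi.Allreduce(group_all, 4, pygpi.ReductionOp.SUM,
                            dtype=np.int32)
```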
Each collective is an object built for a specific communication pattern, defined by a number of elements of a given dtype, which are arranged in a contiguous memory buffer (i.e., a `list` or NumPy `array`). These parameters describe the only configuration allowed for input arrays (using other types of arrays results in unspecified behaviour).
Each (non-blocking) collective provides two methods:
- `start` (with or without `input` parameters, depending on the collective): local, non-blocking method signaling the start of the communication
- `wait_for_completion`: blocking function that returns the result of the operation (or `None` for the ranks that do not require a result)
Additionally, each collective implements the following properties:
- `obj.collective` - the collective type, e.g., "Allreduce"
- `obj.algorithm` - the algorithm in use
- `obj.dtype` - the data type required for each of the buffer elements

The collective object can be reused for multiple communications on different input buffers (a short reuse sketch follows the example below).
Example for `Allreduce`:
```python
import numpy as np
import pygpi

group_all = pygpi.Group()
dtype = "long"
number_elements = 30

input_array = np.ones(number_elements, dtype)
allreduce = pygpi.Allreduce(group_all, number_elements, pygpi.ReductionOp.SUM,
                            dtype=dtype)

allreduce.start(input_array)
output_array = allreduce.wait_for_completion()
print(f"[rank {pygpi.get_rank()}] result = {output_array}")
```
Make sure a `python` interpreter is available on your machine, as well as the `pybind11` library. This can be achieved, for instance, by creating a `conda` environment:
```bash
conda create -n pygpi
conda activate pygpi
conda install python=3.8
conda install pybind11 -c conda-forge
```
Note that Pybind11 is available through both `pip` and `conda`. However, the `pip` package does not seem to include a CMake package. This is why we recommend installing Pybind11 via `conda`.
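To check whether the installed `pybind11` actually provides its CMake package files, one can query it from Python (this uses `pybind11.get_cmake_dir()`, which exists in recent `pybind11` releases; treat this snippet as a quick sanity check rather than part of the official setup):

```python
# Print the directory containing pybind11's CMake config files; CMake can
# only find pybind11 if such a directory exists.
import pybind11
print(pybind11.get_cmake_dir())
```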
Follow the installation steps for the dependencies of GaspiCxx in the README file. Then compile and install GaspiCxx from the git repository as a shared library.
```bash
git clone https://github.com/cc-hpc-itwm/GaspiCxx.git
cd GaspiCxx
mkdir build && cd build

export GASPICXX_INSTALLATION_PATH=/your/gaspicxx/installation/path
cmake -DBUILD_PYTHON_BINDINGS=ON \
      -DBUILD_SHARED_LIBS=ON \
      -DCMAKE_PREFIX_PATH=${GTEST_INSTALLATION_PATH} \
      -DCMAKE_INSTALL_PREFIX=${GASPICXX_INSTALLATION_PATH} ../
make -j$nprocs install
```
The PyGPI tests rely on the `pytest` package, which can be installed in the same `conda` environment using `pip`:
```bash
conda activate pygpi
pip install -U pytest
```
Then compile GaspiCxx with testing enabled:
```bash
export GASPICXX_INSTALLATION_PATH=/your/gaspicxx/installation/path
cmake -DENABLE_TESTS=ON \
      -DBUILD_PYTHON_BINDINGS=ON \
      -DBUILD_SHARED_LIBS=ON \
      -DCMAKE_PREFIX_PATH=${GTEST_INSTALLATION_PATH} \
      -DCMAKE_INSTALL_PREFIX=${GASPICXX_INSTALLATION_PATH} ../
make -j$nprocs install
```
- Create a `nodesfile` with one machine name (i.e., hostname) per line for each of the parallel processes that will be involved in the parallel run.

```bash
$ cat ./nodesfile
localhost
localhost
```
- Create a `script.sh` file to execute a simple example run:

```bash
$ cat ./script.sh
export PYTHONPATH=${GASPICXX_INSTALLATION_PATH}/lib/:${GASPICXX_INSTALLATION_PATH}/lib
export LD_LIBRARY_PATH=${GASPICXX_INSTALLATION_PATH}/lib:${LD_LIBRARY_PATH}
python ${SRC_PATH}/simple_collective_example.py
```
- Execute the script on all processes in parallel using:
```bash
gaspi_run -m ./nodesfile -n ${NUM_PROCESSES} ./script.sh
```
The `${NUM_PROCESSES}` variable should match the number of lines in the `nodesfile`. The same hostname can be repeated multiple times in the `nodesfile` if multiple ranks per host are needed.
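As a minimal stand-in for the example script (hypothetical, not the `simple_collective_example.py` referenced above; it only uses the global access functions described earlier), a first launch test could look like this:

```python
# minimal_check.py -- hypothetical sanity check, to be launched via gaspi_run
import pygpi

# Each rank prints its id and the total number of ranks in the run
print(f"[rank {pygpi.get_rank()} of {pygpi.get_size()}] PyGPI is up")
```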
- add helper functions to list available algorithms/dtypes for each collective