A simple dataflow engine with scalable semantics.
The goal of pydra is to provide a lightweight Python dataflow engine for DAG construction, manipulation, and distributed execution.
Feature list:
- Python 3.7+ using type annotations and attrs
- Composable dataflows with simple node semantics. A dataflow can be a node of another dataflow.
- Flexible loop semantics: `splitter` and `combiner` provide many ways of compressing complex loop semantics
- Cached execution with support for a global cache across dataflows and users
- Distributed execution, presently via `concurrent.futures`, SLURM, and Dask (the Dask backend is experimental, with limited testing)
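Conceptually, a splitter expands a task over its input sets (an outer product when splitting over multiple inputs) and a combiner collects the expanded results back together. A rough plain-Python sketch of that idea (this is an illustration of the semantics, not Pydra's actual API):

```python
from itertools import product


def run_split(func, **inputs):
    """Apply func over the outer product of list-valued inputs:
    a rough model of split-then-combine loop semantics."""
    keys = list(inputs)
    results = []
    for values in product(*(inputs[k] for k in keys)):
        results.append(func(**dict(zip(keys, values))))
    # Returning the collected list plays the role of the combiner.
    return results


# "Split" a multiply task over two input lists: 2 x 3 = 6 runs.
outputs = run_split(lambda x, y: x * y, x=[1, 2], y=[10, 20, 30])
```

In Pydra itself, the same pattern is expressed declaratively on a task or dataflow, so the engine can cache and distribute the expanded runs.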
[API Documentation] [PyCon 2020 Poster]
The Pydra Tutorial can be found in the pydra-tutorial repository.
The tutorial can be run locally (with the necessary requirements installed) or via the Binder service. Please note that mybinder sessions time out after an hour.
```
pip install pydra
```
Pydra requires Python 3.7+. To install in developer mode:
```
git clone git@github.com:nipype/pydra.git
cd pydra
pip install -e ".[dev]"
```
If you want to test execution with Dask, install the Dask extra instead:
```
git clone git@github.com:nipype/pydra.git
cd pydra
pip install -e ".[dask]"
```
It is also useful to install pre-commit, which sets up git hooks that run the project's checks before each commit:
```
pip install pre-commit
pre-commit install
```