This Prefect collection contains a multiprocess task runner. It is ideal for running CPU-intensive Prefect tasks in parallel. It is useful in scenarios where you want to spread computation across multiple CPU cores on a single machine without adding heavy dependencies like Dask. This package does not require any extra dependencies beyond what Prefect already installs.
Install the package by running:
pip install prefect-multiprocess
Then, use the task runner in your Prefect flows. The task runner only accepts one parameter: processes
, which controls the number of worker processes to start for running tasks. If not provided, it defaults to the number of CPUs on the host machine.
Examples:
Create one process for every CPU/CPU core
from prefect import flow
from prefect_multiprocess.task_runners import MultiprocessTaskRunner
@flow(task_runner=MultiprocessTaskRunner())
def my_flow():
...
Customizing the number of processes:
@flow(task_runner=MultiprocessTaskRunner(processes=4))
def my_flow():
...
Requires an installation of Python 3.8+.
We recommend using a Python virtual environment manager such as pipenv, conda or virtualenv.
These tasks are designed to work with Prefect 2.0. For more information about how to use Prefect, please refer to the Prefect documentation.
MultiprocessTaskRunner
uses cloudpickle
to serialize tasks and return values, so task parameters and returns need to be values that cloudpickle
can handle.
If you encounter any bugs while using prefect-multiprocess
, feel free to open an issue in the prefect-multiprocess repository.
Feel free to ⭐️ or watch prefect-multiprocess
for updates too!
If you'd like to install a version of prefect-multiprocess
for development, clone the repository and perform an editable install with pip
:
git clone https://github.com/rpeden/prefect-multiprocess.git
cd prefect-multiprocess/
pip install -e ".[dev]"
# Install linting pre-commit hooks
pre-commit install