Note
The use of `dask` and `distributed`, together with the effort to implement DVC experiments, made this project very convoluted. It will no longer be maintained; check out https://github.com/zincware/paraffin for a simpler alternative.
DVC provides tools for building and executing the computational graph locally through various methods. The `dask4dvc` package combines Dask Distributed with DVC to make it easier to use with HPC managers like Slurm.
The `dask4dvc repro` command runs the DVC graph in parallel where possible. Currently, `dask4dvc run` will not run the stages of each experiment sequentially.
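To illustrate what "in parallel where possible" means, the sketch below runs a dependency graph so that each stage starts as soon as all of its upstream stages have finished. It is not dask4dvc's implementation: it swaps Dask for the standard library's `graphlib` and `concurrent.futures`, and the stage names and the `run_stage` callback are hypothetical placeholders for real DVC stages.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable


def run_graph(
    graph: dict[str, set[str]],
    run_stage: Callable[[str], None],
    max_workers: int = 4,
) -> list[str]:
    """Run each stage as soon as all of its dependencies have finished.

    `graph` maps a stage name to the set of stage names it depends on.
    Returns the stage names in the order in which they completed.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    completed: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running: dict[Future, str] = {}
        while sorter.is_active():
            # Submit every stage whose dependencies are all satisfied.
            for stage in sorter.get_ready():
                running[pool.submit(run_stage, stage)] = stage
            # Block until at least one running stage has finished.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                stage = running.pop(future)
                future.result()  # re-raise any exception from the stage
                sorter.done(stage)
                completed.append(stage)
    return completed


if __name__ == "__main__":
    # Hypothetical pipeline: two independent training stages share one input.
    pipeline = {
        "prepare": set(),
        "train_a": {"prepare"},
        "train_b": {"prepare"},
        "evaluate": {"train_a", "train_b"},
    }
    print(run_graph(pipeline, lambda stage: print("running", stage)))
```

Here `train_a` and `train_b` depend only on `prepare`, so they can run at the same time, while `evaluate` waits for both.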
⚠️ This is an experimental package not affiliated in any way with Iterative or DVC.
Dask4DVC provides a CLI similar to DVC:
- `dvc repro` becomes `dask4dvc repro`.
- `dvc queue start` becomes `dask4dvc run`.

You can follow the progress using `dask4dvc <cmd> --dashboard`.
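The two command mappings above can be captured in a small lookup table. The helper below is purely illustrative and hypothetical (`to_dask4dvc` is not part of dask4dvc); it only knows the two documented commands and does not assume that any further DVC arguments carry over.

```python
# The two mappings documented above; nothing else is assumed.
DVC_TO_DASK4DVC: dict[tuple[str, ...], list[str]] = {
    ("dvc", "repro"): ["dask4dvc", "repro"],
    ("dvc", "queue", "start"): ["dask4dvc", "run"],
}


def to_dask4dvc(dvc_command: list[str], dashboard: bool = False) -> list[str]:
    """Return the dask4dvc command line for one of the documented DVC commands."""
    try:
        # Copy so that appending flags never mutates the lookup table.
        command = list(DVC_TO_DASK4DVC[tuple(dvc_command)])
    except KeyError:
        raise ValueError(
            f"no documented dask4dvc equivalent: {' '.join(dvc_command)}"
        ) from None
    if dashboard:
        # `--dashboard` follows the subcommand, as in `dask4dvc <cmd> --dashboard`.
        command.append("--dashboard")
    return command


print(to_dask4dvc(["dvc", "queue", "start"], dashboard=True))
# ['dask4dvc', 'run', '--dashboard']
```

The resulting list can be passed directly to `subprocess.run`.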
You can easily use `dask4dvc` with a Slurm cluster. This requires a running Dask scheduler:
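```python
from dask_jobqueue import SLURMCluster

# Start a Dask scheduler on port 31415; each Slurm job provides one worker
# with 1 core, 128 GB of memory and one GPU on the "gpu" partition.
# The scheduler lives in this Python process, so keep it running.
cluster = SLURMCluster(
    cores=1,
    memory="128GB",
    queue="gpu",
    processes=1,
    walltime="8:00:00",
    job_cpu=1,
    # Extra #SBATCH directives. Older dask-jobqueue releases (before 0.8.0)
    # call this argument `job_extra` instead of `job_extra_directives`.
    job_extra_directives=["-N 1", "--cpus-per-task=1", "--tasks-per-node=64", "--gres=gpu:1"],
    scheduler_options={"port": 31415},
)
# Scale the number of Slurm jobs up and down with the workload.
cluster.adapt()
```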
With this setup, you can then run `dask4dvc repro --address 127.0.0.1:31415`, using the example port `31415`.
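Because `--address` expects the scheduler to be running already, it can help to check that something is listening on the port before launching `dask4dvc repro`. The sketch below is a hypothetical helper, not part of dask4dvc; it only tests that a TCP connection can be opened, not that the listener is actually a Dask scheduler.

```python
import socket


def scheduler_is_reachable(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to HOST:PORT (optionally tcp://HOST:PORT) succeeds."""
    host, _, port = address.removeprefix("tcp://").rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unknown host, ...
        return False


if __name__ == "__main__":
    # The example port from the SLURMCluster setup.
    print(scheduler_is_reachable("127.0.0.1:31415"))
```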
You can also use config files with `dask4dvc repro --config myconfig.yaml`. All `dask.distributed` clusters should be supported.
```yaml
default:
  SGECluster:
    queue: regular
    cores: 10
    memory: 16 GB
```
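Loaded with a YAML parser (for example PyYAML's `yaml.safe_load`), such a file becomes a nested mapping. The sketch below assumes, without confirmation from dask4dvc itself, that the outer key (`default`) names a configuration, the next key names a cluster class, and the innermost mapping holds that class's keyword arguments; `resolve_cluster_config` is a hypothetical helper.

```python
from typing import Any


def resolve_cluster_config(
    config: dict[str, Any], name: str = "default"
) -> tuple[str, dict[str, Any]]:
    """Split one named entry into (cluster class name, keyword arguments)."""
    entry = config[name]
    if len(entry) != 1:
        raise ValueError(f"expected exactly one cluster class under {name!r}")
    ((cluster_class, kwargs),) = entry.items()
    return cluster_class, dict(kwargs or {})


# The same structure as the YAML example, written as a Python literal.
config = {"default": {"SGECluster": {"queue": "regular", "cores": 10, "memory": "16 GB"}}}
print(resolve_cluster_config(config))
# ('SGECluster', {'queue': 'regular', 'cores': 10, 'memory': '16 GB'})
```

Under the same assumption, the class could then be instantiated with `getattr(dask_jobqueue, cluster_class)(**kwargs)`; `dask_jobqueue.SGECluster` accepts `queue`, `cores` and `memory` keyword arguments.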