# Commit a26d598: Add Distributed Examples to Documentation (#468)

* Create 4-run-distributed-simulations.md: add distributed examples in "how-to" section
* Update 4-run-distributed-simulations.md: correct typo
* Add space
* Make use case for multi-node simulation clearer
* Add set_device! override without backend parameter
* Add function for combining two RawAcquisitionData structs
* Simplify example scripts
* Remove sentence
* Add isapprox function for testing
* Add RawAcquisitionData test
* Add images to assets folder
* Update examples
* Delete docs/src/assets/KomamultiGPU.svg
* Delete docs/src/assets/KomamultiNode.svg
* Delete docs/src/assets/KomamultiNodeCPU.svg
* Add files via upload
* Fix broken image
* Use correct image for multiGPU
* Update to use distributed macro

Showing 7 changed files with 149 additions and 2 deletions.

# Run Distributed Simulations

While KomaMRI provides built-in support for CPU and GPU parallelization, it is sometimes desirable to distribute simulation work even further, across multiple GPUs or multiple compute nodes. This can be done with Distributed.jl by exploiting the independent spin property: each spin in the system evolves independently of the rest, so the phantom spins can be subdivided into separate simulations whose results are then recombined, as in the diagram below:

```@raw html
<p align="center"><img width="90%" src="../../assets/KomamultiNode.svg"/></p>
```
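
To make the independence property concrete, here is a minimal single-process sketch (an illustration, not from the package documentation): the phantom is split with `kfoldperm`, each part is simulated separately, and the partial signals are summed. It assumes the standard example inputs used below, and that the returned raw signal objects support `+` (the combining function for two `RawAcquisitionData` structs added alongside this example).

```julia
using KomaMRI

sys = Scanner()
seq = PulseDesigner.EPI_example()
obj = brain_phantom2D()

# Split the spins into two disjoint subsets and simulate each independently
parts = kfoldperm(length(obj), 2)

# Recombine the partial results; equivalent to simulating the whole phantom
raw = sum(simulate(obj[p], seq, sys) for p in parts)
```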

The following two examples demonstrate how to use Distributed.jl to run a simulation on multiple GPUs, and on multiple nodes of an HPC cluster.

## Using Multiple GPUs

To run a simulation on multiple GPUs, the phantom object can be divided using the `kfoldperm` function. Distributed.jl can then be used to start one Julia worker process per available device, so that each device simulates a different part of the object. The partial results can then be fetched asynchronously by the main process and combined to produce the final signal. This process is shown in the diagram below:

```@raw html
<p align="center"><img width="90%" src="../../assets/KomamultiGPU.svg"/></p>
```
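
As a quick illustration of the division step (a sketch under the same example inputs): `kfoldperm(N, k)` partitions the indices `1:N` into `k` disjoint, randomly permuted subsets, so indexing the phantom with each subset produces non-overlapping groups of spins that together cover the whole object.

```julia
using KomaMRI

obj = brain_phantom2D()
parts = kfoldperm(length(obj), 4)  # 4 index subsets, e.g. one per GPU

# The subsets are disjoint and together cover every spin exactly once
@assert sort(vcat(parts...)) == 1:length(obj)
```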

The code for doing so is shown below:

!!! details "SLURM Script Requesting Multiple GPUs"

    ```sh
    #!/bin/bash
    #SBATCH --job-name          # Enter job name
    #SBATCH -t                  # Enter max runtime for job
    #SBATCH -p                  # Enter partition on which to run the job
    #SBATCH --cpus-per-task=1   # Request 1 CPU
    #SBATCH --gpus=             # Enter number of GPUs to request
    #SBATCH -o                  # Enter file path to write stdout to
    #SBATCH -e                  # Enter file path to write stderr to

    julia script.jl
    ```
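
For reference, a filled-in version of this template might look like the following; the job name, time limit, partition, GPU count, and file paths are illustrative and depend on your cluster:

```sh
#!/bin/bash
#SBATCH --job-name=koma-multigpu
#SBATCH -t 01:00:00              # 1 hour walltime
#SBATCH -p gpu                   # partition name is cluster-specific
#SBATCH --cpus-per-task=1
#SBATCH --gpus=4                 # one Julia worker per GPU
#SBATCH -o koma_%j.out           # %j expands to the SLURM job ID
#SBATCH -e koma_%j.err

julia script.jl
```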

```julia
using Distributed
using CUDA

# Add workers based on the number of available devices
addprocs(length(devices()))

# Define inputs on each worker process
@everywhere begin
    using KomaMRI, CUDA
    sys = Scanner()
    seq = PulseDesigner.EPI_example()
    obj = brain_phantom2D()
    # Divide the phantom into one part per worker
    parts = kfoldperm(length(obj), nworkers())
end

# Distribute the simulation across workers
raw = Distributed.@distributed (+) for i = 1:nworkers()
    KomaMRICore.set_device!(i - 1)  # Set the device for this worker; CUDA devices are indexed from 0
    simulate(obj[parts[i]], seq, sys)
end
```
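
The `@distributed (+)` macro schedules one iteration per worker and reduces the partial signals with `+`. If you prefer to manage the asynchronous fetching explicitly, an equivalent sketch (assuming the same `@everywhere` setup as above; the worker-to-device mapping via `enumerate` order is an assumption of this illustration) uses `@spawnat` and `fetch`:

```julia
# One future per worker; each worker simulates its own phantom subset
futures = map(enumerate(workers())) do (i, w)
    @spawnat w begin
        KomaMRICore.set_device!(i - 1)  # CUDA devices are indexed from 0
        simulate(obj[parts[i]], seq, sys)
    end
end

# Fetch the partial signals as they complete and combine them
raw = sum(fetch.(futures))
```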

## Using Multiple Nodes in an HPC Cluster

The script below uses the ClusterManagers.jl package to initialize worker processes on a SLURM cluster, based on the number of tasks specified in the `#SBATCH --ntasks` directive. This is useful for dividing simulation work among multiple compute nodes, either because the problem is too large to fit in the memory of a single machine or because the desired number of workers exceeds the CPU cores available on one. An illustration of this is shown below:

```@raw html
<p align="center"><img width="90%" src="../../assets/KomamultiNodeCPU.svg"/></p>
```

!!! details "SLURM Script Requesting Multiple Nodes"

    ```sh
    #!/bin/bash
    #SBATCH --job-name          # Enter job name
    #SBATCH -t                  # Enter max runtime for job
    #SBATCH -p                  # Enter partition on which to run the job
    #SBATCH --nodes             # Enter number of nodes on which to run the job
    #SBATCH --ntasks            # Should be equal to the number of nodes
    #SBATCH --ntasks-per-node=1 # Run each task on a separate node
    #SBATCH --cpus-per-task     # Enter number of CPU threads to use per node
    #SBATCH -o                  # Enter file path to write stdout to
    #SBATCH -e                  # Enter file path to write stderr to

    julia script.jl
    ```
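
As before, a filled-in version of the template might look like this (all values are illustrative; note that `--ntasks` matches `--nodes` so that exactly one worker process runs per node):

```sh
#!/bin/bash
#SBATCH --job-name=koma-multinode
#SBATCH -t 02:00:00
#SBATCH -p batch                 # partition name is cluster-specific
#SBATCH --nodes=4
#SBATCH --ntasks=4               # equal to the number of nodes
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16       # CPU threads available to each node's worker
#SBATCH -o koma_%j.out
#SBATCH -e koma_%j.err

julia script.jl
```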

```julia
using Distributed
using ClusterManagers

# Add workers based on the specified number of SLURM tasks
addprocs(SlurmManager(parse(Int, ENV["SLURM_NTASKS"])))

# Define inputs on each worker process
@everywhere begin
    using KomaMRI
    sys = Scanner()
    seq = PulseDesigner.EPI_example()
    obj = brain_phantom2D()
    # Divide the phantom into one part per worker
    parts = kfoldperm(length(obj), nworkers())
end

# Distribute the simulation across workers
raw = Distributed.@distributed (+) for i = 1:nworkers()
    simulate(obj[parts[i]], seq, sys)
end
```
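
The workers started this way run single-threaded by default. To also use the CPU threads requested with `--cpus-per-task` on each node, one option (a sketch; `exeflags` is a standard keyword of `addprocs`, and `SLURM_CPUS_PER_TASK` is an environment variable set by SLURM) is to launch each worker with multiple Julia threads, which KomaMRI's CPU backend can then use:

```julia
using Distributed, ClusterManagers

# Start one worker per SLURM task, each with as many Julia threads as
# --cpus-per-task requested for its node
addprocs(
    SlurmManager(parse(Int, ENV["SLURM_NTASKS"]));
    exeflags = "--threads=$(ENV["SLURM_CPUS_PER_TASK"])",
)
```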

## KomaMRI Benchmarks

| Benchmark | This commit (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| MRI Lab/Bloch/CPU/2 thread(s) | 224991845.5 | 226618506 | 0.99 |
| MRI Lab/Bloch/CPU/4 thread(s) | 174863887 | 174536994 | 1.00 |
| MRI Lab/Bloch/CPU/8 thread(s) | 90572834 | 146360095.5 | 0.62 |
| MRI Lab/Bloch/CPU/1 thread(s) | 347400444 | 347644824 | 1.00 |
| MRI Lab/Bloch/GPU/CUDA | 57092571.5 | 57253633 | 1.00 |
| MRI Lab/Bloch/GPU/oneAPI | 522866366 | 515042255.5 | 1.02 |
| MRI Lab/Bloch/GPU/Metal | 568303458 | 541353541 | 1.05 |
| MRI Lab/Bloch/GPU/AMDGPU | 36981171 | 37619574.5 | 0.98 |
| Slice Selection 3D/Bloch/CPU/2 thread(s) | 1151416902 | 1024148878 | 1.12 |
| Slice Selection 3D/Bloch/CPU/4 thread(s) | 580233533 | 580936747 | 1.00 |
| Slice Selection 3D/Bloch/CPU/8 thread(s) | 341164837 | 386777586 | 0.88 |
| Slice Selection 3D/Bloch/CPU/1 thread(s) | 1930414196.5 | 1925568005.5 | 1.00 |
| Slice Selection 3D/Bloch/GPU/CUDA | 101438582.5 | 100754922 | 1.01 |
| Slice Selection 3D/Bloch/GPU/oneAPI | 636188156.5 | 654922437.5 | 0.97 |
| Slice Selection 3D/Bloch/GPU/Metal | 565478250 | 564653500 | 1.00 |
| Slice Selection 3D/Bloch/GPU/AMDGPU | 60929383 | 60779232 | 1.00 |

This comment was automatically generated by a workflow using github-action-benchmark.