GSOC: Add GPU Explanation Section to Documentation #470

Merged · 12 commits · Aug 27, 2024
docs/src/explanation/4-gpu-explanation.md (+31 −0)
# GPU Parallelization

KomaMRI uses a vendor-agnostic approach to GPU parallelization in order to support multiple GPU backends. Currently, the following backends are supported:

* CUDA.jl (Nvidia)
* Metal.jl (Apple)
* AMDGPU.jl (AMD)
* oneAPI.jl (Intel)

## Choosing a GPU Backend

To determine which backend to use, KomaMRI uses [package extensions](https://pkgdocs.julialang.org/v1/creating-packages/#Conditional-loading-of-code-in-packages-(Extensions)) (introduced in Julia 1.9) to avoid having the packages for each GPU backend as explicit dependencies. This means that the user is responsible for loading the backend package (e.g. `using CUDA`) at the beginning of their code, or prior to calling `KomaUI()`; otherwise, Koma will fall back to the CPU. Once this is done, no further action is required! Simulation objects are automatically moved to the GPU and back once the simulation is finished. When the simulation is run, a message is shown reporting either the GPU device being used or, if running on the CPU, the number of CPU threads.
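As a minimal sketch, assuming an Nvidia GPU and a recent KomaMRI version (the phantom and sequence constructors below are just convenient examples), the workflow looks like this:

```julia
using CUDA     # load the backend package first (swap for Metal, AMDGPU, or oneAPI)
using KomaMRI

obj = brain_phantom2D()             # example phantom bundled with KomaMRI
seq = PulseDesigner.EPI_example()   # example EPI sequence
sys = Scanner()                     # default scanner parameters

# Since CUDA was loaded above, the simulation runs on the GPU automatically;
# without it, the same call falls back to multi-threaded CPU execution.
raw = simulate(obj, seq, sys)
```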

## How Objects are moved to the GPU

KomaMRI has a general-purpose function, `gpu`, to move data from the CPU to the GPU. This function dispatches to a backend-specific `gpu` method whose backend parameter is of type `<:KernelAbstractions.GPU`. That method then calls the `fmap` function from the package `Functors.jl` to recursively call `adapt` from the package `Adapt.jl` on each field of the object being transferred. This is similar to how many other Julia packages, such as `Flux.jl`, transfer data to the GPU. However, an important difference is that KomaMRI adapts directly to the `KernelAbstractions.Backend` type in order to reuse the `adapt_storage` functions defined in each backend package, rather than defining custom adapters, resulting in an implementation with fewer lines of code.
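The following is a condensed sketch of that pattern, not the actual KomaMRI source; the `gpu` method shown is hypothetical but follows the adapt-to-backend idea described above:

```julia
using Adapt: adapt
using Functors: fmap
import KernelAbstractions

# Hypothetical backend-specific method: recursively adapt every leaf (e.g.
# each Array field) of `x` to the given backend, reusing the `adapt_storage`
# rules that CUDA.jl, Metal.jl, AMDGPU.jl, and oneAPI.jl already define.
gpu(x, backend::KernelAbstractions.GPU) = fmap(y -> adapt(backend, y), x)

# Usage sketch (with CUDA.jl loaded):
# phantom_gpu = gpu(phantom, CUDABackend())
```

Because `adapt` falls back to the identity for non-array leaves, a single `fmap` call can handle arbitrarily nested structs.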

## Inside the Simulation

KomaMRI has three different simulation methods, all of which can run on the GPU:

* `Bloch`
* `BlochSimple`
* `BlochDict`

Of the three methods, `Bloch` is the most optimized, and has separate implementations specialized for the CPU and GPU. `BlochSimple` is equivalent to `Bloch` in the operations it performs, but less optimized and easier to understand. `BlochDict` can be understood as an extension of `BlochSimple` that outputs a more complete signal.

`BlochSimple` and `Bloch` take slightly different approaches to GPU parallelization. `BlochSimple` exclusively uses array broadcasting, with parallelization across the arrays done implicitly by the GPU compiler. In contrast, `Bloch` uses explicit GPU kernels where advantageous, written with the `KernelAbstractions.jl` package; a sketch contrasting the two styles follows the links below. Readers curious about the performance improvements between `Bloch` and `BlochSimple` may be interested in the following pull requests:

* [(459) Optimize run_spin_precession! for GPU](https://github.com/JuliaHealth/KomaMRI.jl/pull/459)
* [(462) Optimize run_spin_excitation! for GPU](https://github.com/JuliaHealth/KomaMRI.jl/pull/462)
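
To make the contrast concrete, here is an illustrative precession phase update written in both styles (not KomaMRI's actual kernels; the function names are made up for this example):

```julia
using KernelAbstractions

# Broadcasting style, as in BlochSimple: the backend's compiler
# parallelizes the elementwise update implicitly.
precess_broadcast!(Mxy, ϕ) = (Mxy .*= exp.(-im .* ϕ))

# Explicit kernel style, as in Bloch: each work-item updates one spin,
# which allows finer control over memory access and work distribution.
@kernel function precess_kernel!(Mxy, @Const(ϕ))
    i = @index(Global)
    Mxy[i] *= exp(-im * ϕ[i])
end

# Launch on whichever backend owns the arrays, e.g.:
# precess_kernel!(get_backend(Mxy))(Mxy, ϕ; ndrange = length(Mxy))
```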