Add Distributed Examples to Documentation #468

Merged
merged 19 commits into from
Aug 23, 2024

rkierulf
Collaborator

I added two examples, both tested yesterday, to the "how-to" section of the documentation: one runs a simulation on multiple GPUs, and the other runs it on multiple nodes of a SLURM cluster. The following check isn't included in the example scripts, but I verified the accuracy against a normal simulation by adding these lines at the end:

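# Reference: a normal (non-distributed) simulation of the same inputs
signal_2 = simulate(obj, seq, sys)
# Mean absolute difference between the two results, one value per profile
println([sum(abs.(signal.profiles[i].data .- signal_2.profiles[i].data)) / length(signal.profiles[i].data) for i in eachindex(signal.profiles)])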

The values I saw were all on the scale of 10e-6 to 10e-7, which is near machine precision for Float32.
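
The check above reports, for each profile, the mean absolute difference between the two results. A minimal, self-contained sketch of that metric follows. It uses synthetic ComplexF32 arrays in place of the real `signal.profiles[i].data` fields, and the helper name `mean_abs_diff` is hypothetical and not part of KomaMRI.

```julia
using Random

# Hypothetical helper (not part of KomaMRI): mean absolute difference
# between two equally sized arrays, giving one value per profile.
mean_abs_diff(a, b) = sum(abs.(a .- b)) / length(a)

# Synthetic stand-ins for the per-profile data of two simulation runs.
rng = MersenneTwister(0)
reference = [rand(rng, ComplexF32, 64) for _ in 1:4]
# Perturb every sample at roughly the Float32 rounding level.
perturbed = [p .+ eps(Float32) .* rand(rng, ComplexF32, 64) for p in reference]

errors = [mean_abs_diff(reference[i], perturbed[i]) for i in eachindex(reference)]
println(errors)
println("eps(Float32) = ", eps(Float32))
```

Printing `eps(Float32)` next to the errors gives a direct reference point for judging whether the differences sit near single-precision rounding.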

Add distributed examples in "how-to" section
@rkierulf added the "documentation" label (Improvements to docs., it also triggers doc preview) on Aug 22, 2024
@cncastillo
Member

cncastillo commented Aug 22, 2024

btw @rkierulf @pvillacorta I think this could be very useful to simulate the "aorta" with flowing spins, as you can distribute the phantom to multiple nodes or GPUs!!

It would be cool if @pvillacorta could test this and post some images :D For example, with a phantom that is too big to fit on one GPU.
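
The idea of distributing a phantom can be illustrated with a toy split-and-sum pattern, under the assumption that the signal is additive over spins, so each task can handle a subset and the partial signals can be added. The sketch below uses only Julia's `Distributed` standard library. `toy_signal` and the interleaved partition are hypothetical stand-ins, not KomaMRI functions. In an actual run, a call to the simulator would presumably take the place of `toy_signal`.

```julia
using Distributed

# Hypothetical stand-in for a simulator: each "spin" has an off-resonance
# frequency (Hz) and contributes a complex exponential to the signal.
@everywhere function toy_signal(freqs, times)
    return [sum(cis.(2π .* freqs .* t)) for t in times]
end

freqs = collect(range(-50.0, 50.0; length=1000))  # toy "phantom"
times = collect(range(0.0, 0.01; length=32))      # sample times (s)

# Split the spin indices into interleaved parts, one per task.
nparts = 4
parts = [collect(i:nparts:length(freqs)) for i in 1:nparts]

# Simulate each part (on workers, if any were added with addprocs), then sum.
partials = pmap(idx -> toy_signal(freqs[idx], times), parts)
combined = reduce(+, partials)

# Reference: simulate all spins in a single call.
full = toy_signal(freqs, times)
println(maximum(abs.(combined .- full)))
```

The printed value is the largest deviation between the combined and the single-call result. It should be at floating-point rounding level, since only the order of summation differs.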

@pvillacorta
Collaborator

pvillacorta commented Aug 22, 2024

Nice! Once #442 is finished, I will try to simulate with the aorta and I will make an example for this :)

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.93%. Comparing base (57f8698) to head (d9f10b9).
Report is 1 commit behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| KomaMRICore/src/simulation/GPUFunctions.jl | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##           master     #468      +/-   ##
==========================================
+ Coverage   90.75%   90.93%   +0.17%     
==========================================
  Files          53       53              
  Lines        2932     2934       +2     
==========================================
+ Hits         2661     2668       +7     
+ Misses        271      266       -5     
| Flag | Coverage Δ |
|---|---|
| base | 88.20% <ø> (ø) |
| core | 92.61% <87.50%> (+0.95%) ⬆️ |
| files | 93.55% <ø> (ø) |
| komamri | 93.98% <ø> (ø) |
| plots | 89.30% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| KomaMRICore/src/rawdata/ISMRMRD.jl | 98.50% <100.00%> (+1.78%) ⬆️ |
| KomaMRICore/src/simulation/GPUFunctions.jl | 60.52% <0.00%> (-1.64%) ⬇️ |

... and 1 file with indirect coverage changes

@cncastillo
Member

cncastillo commented Aug 22, 2024

What about adding some images? (inspired by https://classic.d2l.ai/chapter_computational-performance/parameterserver.html)

[Image: KomamultiGPU]

[Image: KomamultiNode]

Member

@cncastillo cncastillo left a comment

This is very nice! Quite amazing that it works. Could you:

Contributor

@github-actions github-actions bot left a comment

KomaMRI Benchmarks

| Benchmark suite | Current: c309ab6 | Previous: 57f8698 | Ratio |
|---|---|---|---|
| MRI Lab/Bloch/CPU/2 thread(s) | 227228176 ns | 226618506 ns | 1.00 |
| MRI Lab/Bloch/CPU/4 thread(s) | 174890811.5 ns | 174536994 ns | 1.00 |
| MRI Lab/Bloch/CPU/8 thread(s) | 92125490.5 ns | 146360095.5 ns | 0.63 |
| MRI Lab/Bloch/CPU/1 thread(s) | 340803231.5 ns | 347644824 ns | 0.98 |
| MRI Lab/Bloch/GPU/CUDA | 57453774.5 ns | 57253633 ns | 1.00 |
| MRI Lab/Bloch/GPU/oneAPI | 513802813 ns | 515042255.5 ns | 1.00 |
| MRI Lab/Bloch/GPU/Metal | 587853374.5 ns | 541353541 ns | 1.09 |
| MRI Lab/Bloch/GPU/AMDGPU | 37310330 ns | 37619574.5 ns | 0.99 |
| Slice Selection 3D/Bloch/CPU/2 thread(s) | 1175371127 ns | 1024148878 ns | 1.15 |
| Slice Selection 3D/Bloch/CPU/4 thread(s) | 588597986 ns | 580936747 ns | 1.01 |
| Slice Selection 3D/Bloch/CPU/8 thread(s) | 349205204.5 ns | 386777586 ns | 0.90 |
| Slice Selection 3D/Bloch/CPU/1 thread(s) | 1927153559.5 ns | 1925568005.5 ns | 1.00 |
| Slice Selection 3D/Bloch/GPU/CUDA | 101738262 ns | 100754922 ns | 1.01 |
| Slice Selection 3D/Bloch/GPU/oneAPI | 628750808 ns | 654922437.5 ns | 0.96 |
| Slice Selection 3D/Bloch/GPU/Metal | 563667813 ns | 564653500 ns | 1.00 |
| Slice Selection 3D/Bloch/GPU/AMDGPU | 60715236 ns | 60779232 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@cncastillo
Member

This is what the HPC example is actually doing

[Image: KomamultiNodeCPU]

@rkierulf rkierulf merged commit a26d598 into master Aug 23, 2024
20 checks passed
@rkierulf rkierulf deleted the distributed branch August 23, 2024 17:01
Labels
documentation Improvements to docs., it also triggers doc preview
Development

Successfully merging this pull request may close these issues.

Add example and / or support for multi-node simulation
Add example of Multi-GPU simulation
3 participants