Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some test failures #406

Merged
merged 9 commits into from
Jun 14, 2024
Merged

Fix some test failures #406

merged 9 commits into from
Jun 14, 2024

Conversation

rkierulf
Copy link
Collaborator

There are a few issues with #405 I noticed while running tests today. I forgot about the :skipci tag test filter, so I was only running the CPU tests while testing KomaMRICore. It turns out the GPU SimpleMotion and ArbitraryMotion tests were failing as a result of doing away with the custom adapt_storage definitions. However, after adding back the equivalent adapt_storage functions in KomaMetalExt and KomaCUDAExt, these tests pass once again. The other issue was in line 338 in SimulatorCore.jl:

sim_params[precision]

should be:

sim_params["precision"]

Also fixed in this pull request. So with these changes, all 85 of the KomaMRICore tests pass when using CUDA and Metal (still not sure about AMD / oneAPI until #147 )

Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.

Project coverage is 89.00%. Comparing base (931fec8) to head (6d10889).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
- Coverage   89.32%   89.00%   -0.33%     
==========================================
  Files          44       44              
  Lines        2736     2746      +10     
==========================================
  Hits         2444     2444              
- Misses        292      302      +10     
Flag Coverage Δ
base 86.42% <ø> (ø)
core 83.46% <0.00%> (-2.33%) ⬇️
files 93.70% <ø> (ø)
komamri 93.98% <ø> (ø)
plots 89.64% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
KomaMRICore/src/simulation/SimulatorCore.jl 90.54% <0.00%> (ø)
KomaMRICore/src/simulation/Functors.jl 53.12% <0.00%> (-24.15%) ⬇️

Copy link
Member

@cncastillo cncastillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing thanks!!, I didn't notice that the tests with the tag :gpu also had the :skipci. The :skipci is considered in the CI and when doing ] test KomaMRICore, but the VSCode test suit runs them all (that is why I didn't realize). We could avoid this problem by doing

is_running_on_CI = get(ENV, "CI", nothing) == "true" # https://docs.github.com/en/actions/learn-github-actions/variables#default-environment-variables
@run_package_tests filter=ti -> (:core in ti.tags) && (is_running_on_CI ? !(:skipci in ti.tags) : true)

On the other hand, these custom adapt functions are generating issues, and it is slightly annoying that we need to specify them for each GPU backend. Moreover, they force the floating-point precision to f32 in some places (what if we want to use 16 or 64?). CC'ing @pvillacorta that has been working on that. If there is not an issue about this, we could create one.

@rkierulf
Copy link
Collaborator Author

rkierulf commented Jun 13, 2024

It turns out my first commit had syntax issues in each of the /ext modules, which caused them not to be loaded ... and then all the GPU tests defaulted back to the CPU which is why they passed (and I did not notice at the time since we use the suppress macro when running tests for the simulate function).

With the last commit, the syntax issues are gone but the tests are back to failing for SimpleMotion and ArbitraryMotion 😭 (at least with Metal, I haven't checked CUDA yet)

@cncastillo
Copy link
Member

Can you share the error? Maybe @pvillacorta can help

@rkierulf
Copy link
Collaborator Author

I think I fixed the most recent error for ArbitraryMotion - the problem was that I wasn't importing Adapt in the /ext modules, so the adapt_storage functions defined there weren't actually getting called. I'm running the ArbitraryMotion test now and although it's taking a while, it is 73% of the way through and hasn't errored out yet 🤞

@pvillacorta
Copy link
Collaborator

Hey, I just uploaded #408, where there is no need of a custom adapt function for ArbitraryMotion. Maybe, once this #406 PR is merged into master, I can merge from master into #408 and see if everything works. It should be easy. Thank you :)

@rkierulf
Copy link
Collaborator Author

rkierulf commented Jun 14, 2024

Thanks @pvillacorta , it looks like #408 should simplify the GPU transfer for these MotionModel structs! I was actually able to get the SimpleMotion and ArbitraryMotion GPU tests to pass for CUDA with my latest commit. The main issue was that the tests pass sim_params["precision"] = "f64". Previously, we were using CUDA.cu() to convert structs to the device which always converts to f32 so the Phantom struct ended up being f32 entirely. Now that we are using the adapt_storage functions to the CUDABackend() type which CUDA.jl defines, this is no longer the case, and there end up being type issues since some of the Motion structs were hardcoded to convert to f32. Since these are also part of the Phantom object, the type parameter {T} cannot be different for the Motion field and the other fields. It did work, however, when I got rid of the f32 for NoMotion and SimpleMotion and changed the GPU transfer to just return identical copies of those structs, and left the ArbitraryMotion GPU transfer the same while also getting rid of f32. The only custom adapt logic I still had to define was for the Vector{LinearInterpolator} type to element wise convert to CuArray, MtlArray, etc., but looking at your pull request I don't think this will be necessary for much longer.

Can you take a look at these latest changes and let me know if everything looks good? I think if possible, it would be preferable to make the MotionModel structs precision-agnostic so they will work with f64 as well and once you are ready to merge your pull request you can simplify a bit further.

The last thing is that for Metal, the SimpleMotion test is still failing. There is no error, but the numbers are off in the final norm comparison. I haven't looked into it deeply yet, but this will also need to be fixed before this PR can be merged.

@rkierulf
Copy link
Collaborator Author

rkierulf commented Jun 14, 2024

After talking with Pablo and Carlos, I think we are good to merge for now. There is still the 1 failing SimpleMotion test for Metal, but this appears to due to phase rotation, since:

KomaMRICore.KomaMRIPlots.plot(abs.(signal))

matches the ground truth, and:

KomaMRICore.KomaMRIPlots.plot(imag.(signal))

does not match.

So perhaps something in the SimpleMotion implementation is interacting weirdly with Metal. Although for now, we can keep the :skipci tags for the SimpleMotion and ArbitraryMotion tests since these features are still experimental, and there's a chance this issue will be fixed soon with #408 .

@rkierulf rkierulf merged commit a7e08c2 into master Jun 14, 2024
17 of 18 checks passed
@rkierulf rkierulf deleted the buildkite branch June 14, 2024 20:14
@rkierulf rkierulf restored the buildkite branch June 14, 2024 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants