Fixes grid equality for GPUs #2030

Merged 5 commits from tc/grid_equality2 into main on Nov 1, 2021
Conversation

@tomchor (Collaborator) commented Oct 28, 2021

For some reason, when I tested the code after merging #2028, it didn't work on my main code for GPUs.

It worked for a MWE when I tested it here, so I guess I must have done something wrong. In any case, I apologize!

I also expanded the tests to cover grids on GPUs (which would have caught this error), so I think this will help.

Given that we just released a new version, I didn't bump this to 0.63.4 here, but let me know if I should do that.

CC: @navidcy

@tomchor tomchor requested a review from navidcy October 28, 2021 03:23
@navidcy (Collaborator) commented Oct 28, 2021

@tomchor:

  • I don't understand what "really" means in "it didn't really work". Did it work or not?

  • In case there was an issue, can you elaborate what that was and why Adapt is the solution?

  • Last, if there is still an issue and this PR fixes it, then it warrants a patch release. There is no quota on patch releases; we can release as many as we need.

Before:
x1, y1, z1 = nodes((Face, Face, Face), grid1)
x2, y2, z2 = nodes((Face, Face, Face), grid2)

After:
x1, y1, z1 = Adapt.adapt(CPU(), nodes((Face, Face, Face), grid1))
x2, y2, z2 = Adapt.adapt(CPU(), nodes((Face, Face, Face), gr2))
Review comment from a Member:

I don't think this does anything. I did a little test and this is what I get:

julia> using Adapt, CUDA, Oceananigans

julia> a = CuArray(rand(3))
3-element CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}:
 0.7359732430925523
 0.19547988326307286
 0.6593710335404803

julia> b = Adapt.adapt(CPU(), a)
3-element CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}:
 0.7359732430925523
 0.19547988326307286
 0.6593710335404803

The reason is that the fallback for Adapt.adapt is the identity (e.g. adapt(to, x) = x). There's no special adapt(to, x) method for to::Oceananigans.AbstractArchitecture --- the variable to is a CUDA-specific object (this function isn't called when we run on the CPU).

What is this code trying to do --- is it converting to Array? That might be sensible, because a == b triggers scalar operations on the GPU if one of the arrays lives on the GPU and the other on the CPU. Otherwise, this should work:

julia> using CUDA

julia> a = 0.0:0.1:0.3
0.0:0.1:0.3

julia> b = Array(a)
4-element Vector{Float64}:
 0.0
 0.1
 0.2
 0.3

julia> c = CuArray(a)
4-element CuArray{Float64, 1, CUDA.Mem.DeviceBuffer}:
 0.0
 0.1
 0.2
 0.3

julia> a == b
true

julia> CUDA.@allowscalar a == c
true

julia> CUDA.@allowscalar b == c
true

julia> b == c
ERROR: Scalar indexing is disallowed.
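
The host-conversion idea above can be sketched as follows (a hypothetical snippet, assuming a CUDA-capable machine; it is not part of the PR itself):

```julia
# Sketch: avoid scalar GPU indexing in an equality check by copying the
# GPU array back to host memory before comparing.
using CUDA

a = 0.0:0.1:0.3      # CPU range
c = CuArray(a)       # GPU copy of the same data

# Array(c) materializes the GPU data on the host, so `==` runs entirely
# on the CPU and never hits the scalar-indexing guard.
Array(a) == Array(c)  # true
```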

Review comment from the Collaborator (Author):

I again don't have GPU access at this exact moment to test this in more detail, but it seemed to work for me. This is what I got (from my logs):

julia> x1, y1, z1 = Adapt.adapt(CPU(), nodes((Face, Face, Face), grid1))
([0.0, 0.5], [0.0, 0.5], [1.0, 2.0, 3.0])

julia> x1, y1, z1 = nodes((Face, Face, Face), grid1)
([0.0, 0.5], [0.0, 0.5], [Error showing value of type Tuple{SubArray{Float64, 1, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, Tuple{UnitRange{Int64}}, true}, SubArray{Float64, 1, OffsetArrays.OffsetVector{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}}}, Tuple{UnitRange{Int64}}, true}, SubArray{Float64, 1, OffsetArrays.OffsetVector{Float64, CUDA.CuArray{Float64, 1}}, Tuple{UnitRange{Int64}}, true}}:
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:

Review comment from the Member:

Well, that sent me on a journey. There is this method:

Adapt.adapt_structure(::CPU, a::OffsetGPUArray) = OffsetArray(Array(a.parent), a.offsets...)

which I think gets called here. (I have myself to blame for that --- it seems misguided now.)

I think I prefer converting to Array here rather than using Adapt (which does basically the same thing for OffsetArray, but may not work in the future or for other array types). @allowscalar is also OK, but Array seems better here.
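
A minimal sketch of that preference, reusing the grid1/grid2 names from this PR (the map(Array, ...) form is an illustration, not the actual committed code):

```julia
# Sketch: materialize the node views as plain host Arrays before comparing,
# instead of relying on an Adapt.adapt_structure method that happens to
# copy OffsetArray-wrapped GPU data to the CPU.
x1, y1, z1 = map(Array, nodes((Face, Face, Face), grid1))
x2, y2, z2 = map(Array, nodes((Face, Face, Face), grid2))

x1 == x2 && y1 == y2 && z1 == z2  # safe: everything now lives on the host
```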

Review comment from the Collaborator (Author):

I see. I'll commit your suggestion and see if tests pass then.

@tomchor (Collaborator, Author) commented Oct 28, 2021

Quoting @navidcy:

  • I don't understand what "really" means in "it didn't really work". Did it work or not?
  • In case there was an issue, can you elaborate what it was and why Adapt is the solution?
  • Last, if there is still an issue and this PR fixes it, then it warrants a patch release. There is no quota on patch releases; we can release as many as we need.

Sorry, let me be more clear:

  • It worked when I tried it with a MWE. I didn't try it with my main research code at the time because I only had limited GPU time and that code takes a while (I've been having trouble getting my hands on GPUs). However, I assume I must have done something wrong with my MWE, because when I was finally able to test this with my main research code, I wasn't able to start the simulation.
  • The issue that popped up was a scalar-indexing one. I think the z1 == z2 equality basically uses scalar indexing, which doesn't work for GPUs. Wrapping the nodes() expression with Adapt.adapt() seemed to solve the issue in every instance I was able to test so far. I guess CUDA.@allowscalar would also be a possibility.
  • I will bump a patch release then. I just wasn't sure if we wanted to release another version with just one PR.

Review thread on src/Grids/Grids.jl (outdated, resolved)
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
@tomchor tomchor requested a review from glwagner October 28, 2021 22:54
@tomchor (Collaborator, Author) commented Oct 29, 2021

@glwagner, for some reason your suggestion using tuples didn't work. I was getting a "scalar indexing disallowed" error on that same line, which was weird, so I changed it to an @allowscalar statement.

It's also weird that the test I added (which supposedly covers both CPUs and GPUs) didn't catch that... I'm not sure why.
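
For reference, a hedged sketch of what the @allowscalar version of the comparison might look like (the actual committed code may differ):

```julia
# Sketch: permit scalar indexing only for the duration of the equality
# check. This is acceptable for a comparison/testing path, but would be
# slow on large arrays, since each element is fetched from the GPU
# one at a time.
using CUDA

x1, y1, z1 = nodes((Face, Face, Face), grid1)
x2, y2, z2 = nodes((Face, Face, Face), grid2)

CUDA.@allowscalar x1 == x2 && y1 == y2 && z1 == z2
```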

@tomchor tomchor requested a review from francispoulin October 29, 2021 21:00
@navidcy (Collaborator) left a review comment:

I admit I lost the thread a bit during the discussion about Adapt, but from what I see now, Adapt isn't involved. The latest changes to the files make a lot of sense to me, so I'm happy.

@navidcy navidcy added bug 🐞 Even a perfect program still has bugs GPU 👾 Where Oceananigans gets its powers from labels Oct 31, 2021
@tomchor tomchor merged commit 8da06a2 into main Nov 1, 2021
@tomchor tomchor deleted the tc/grid_equality2 branch November 1, 2021 14:34