Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GPU support for extruded 1D spaces #2172

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dennisYatunin
Copy link
Member

@dennisYatunin dennisYatunin commented Feb 3, 2025

This PR adds GPU support for extruded 1D spaces, with the following modifications:

  • Remove the assumption that Ni == Nj from mapreduce_cuda_kernel, columnwise_partition, columnwise_universal_index, and multiple_field_solve_universal_index
  • Add methods for operator_shmem operator_fill_shmem!, and operator_evaluate corresponding to the 1D versions of all SEM operators
  • Rename DSSDataTypes to DSSTypes2D, and define a new constant DSSTypes1D
  • Add a method for dss_1d! corresponding to CUDADevice, and extend dss! to support both 1D and 2D data types
  • Adapt the local_geometry and dss_weights of a SpectralElementSpace1D from Arrays to CuArrays

These changes are also tested through CI:

  • Add a CUDA job that runs unit_2d.jl and convergence_2d.jl
  • Add tests for IFH, IHF, VIFH, and VIHF to unit_mapreduce.jl, which already has a CUDA job

To fix several test failues, I had to make a few additional changes:

  • Simplify the CUDA implementation of Gradient by directly differentiating vectors instead of breaking them up into components, as the original design could not distinguish the gradient of a scalar from the gradient of a single-component vector, and one of the tests in unit_2d.jl takes the gradient of a UVector field
  • Extend strip_space to FiniteDifferenceOperator boundary conditions, as one of the tests in unit_2d.jl uses a 1D field for the CurlC2F boundary condition
  • Reduce the number of methods for level(space::ExtrudedFiniteDifferenceSpace, v) from 4 to 1 to avoid a GPU inference failure in reconstruct_placeholder_space when evaluating boundary conditions (we hit some sort of compiler heuristic when we use 2 or 4 specialized methods, so we need to just have one method with union-splitting)
  • Adapt several CuArrays to Arrays in unit_2d.jl and unit_spaces.jl to avoid scalar indexing errors
  • Bump the number of allowed JET failures for SpectralElementSpace1D in opt_spaces.jl from 141 to 825 on GPUs and from 137 to 150 on CPUs

  • Code follows the style guidelines OR N/A.
  • Unit tests are included OR N/A.
  • Code is exercised in an integration test OR N/A.
  • Documentation has been added/updated OR N/A.

I[i, t] *
I[i, s] *
(
I[i, t] * (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug? Why aren't our unit tests catching this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like there are any unit tests for remapping 1D extruded spaces. I discovered this bug through CliMA/ClimaAtmos.jl#3593

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good to know. Can we add a unit test in ClimaCore to exercise this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's easy to add. I only discovered this bug last night so I'll add some tests today.

Copy link
Member

@Sbozzolo Sbozzolo Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are unit tests for 1D extruded spaces, but they were skipped on GPU because some things were broken on GPUs for those spaces. Here they are:

@testset "2D extruded" begin
vertdomain = Domains.IntervalDomain(
Geometry.ZPoint(0.0),
Geometry.ZPoint(1000.0);
boundary_names = (:bottom, :top),
)
vertmesh = Meshes.IntervalMesh(vertdomain, nelems = 30)
verttopo = Topologies.IntervalTopology(
ClimaComms.SingletonCommsContext(device),
vertmesh,
)
vert_center_space = Spaces.CenterFiniteDifferenceSpace(verttopo)
horzdomain = Domains.IntervalDomain(
Geometry.XPoint(-500.0) .. Geometry.XPoint(500.0),
periodic = true,
)
quad = Quadratures.GLL{4}()
horzmesh = Meshes.IntervalMesh(horzdomain, nelems = 10)
horztopology = Topologies.IntervalTopology(
ClimaComms.SingletonCommsContext(device),
horzmesh,
)
horzspace = Spaces.SpectralElementSpace1D(horztopology, quad)
hv_center_space =
Spaces.ExtrudedFiniteDifferenceSpace(horzspace, vert_center_space)
coords = Fields.coordinate_field(hv_center_space)
xpts = range(-500.0, 500.0, length = 21)
zpts = range(0.0, 1000.0, length = 21)
hcoords = [Geometry.XPoint(x) for x in xpts]
zcoords = [Geometry.ZPoint(z) for z in zpts]
remapper = Remapping.Remapper(
hv_center_space,
hcoords,
zcoords,
buffer_length = 2,
)
interp_x = Remapping.interpolate(remapper, coords.x)
interp_x2 = Remapping.interpolate(coords.x, hcoords, zcoords)
if ClimaComms.iamroot(context)
@test Array(interp_x) [x for x in xpts, z in zpts]
@test Array(interp_x2) [x for x in xpts, z in zpts]
end
interp_z = Remapping.interpolate(remapper, coords.z)
expected_z = [z for x in xpts, z in zpts]
if ClimaComms.iamroot(context)
@test Array(interp_z[:, 2:(end - 1)]) expected_z[:, 2:(end - 1)]
@test Array(interp_z[:, 1])
[1000.0 * (0 / 30 + 1 / 30) / 2 for x in xpts]
@test Array(interp_z[:, end])
[1000.0 * (29 / 30 + 30 / 30) / 2 for x in xpts]
end
# Remapping two fields
interp_xx = Remapping.interpolate(remapper, [coords.x, coords.x])
if ClimaComms.iamroot(context)
@test interp_x interp_xx[:, :, 1]
@test interp_x interp_xx[:, :, 2]
end
# Remapping three fields (more than the buffer length)
interp_xxx =
Remapping.interpolate(remapper, [coords.x, coords.x, coords.x])
if ClimaComms.iamroot(context)
@test interp_x interp_xxx[:, :, 1]
@test interp_x interp_xxx[:, :, 2]
@test interp_x interp_xxx[:, :, 3]
end
# Remapping in-place one field
dest = ArrayType(zeros(21, 21))
Remapping.interpolate!(dest, remapper, coords.x)
if ClimaComms.iamroot(context)
@test interp_x dest
end
# Two fields
dest = ArrayType(zeros(21, 21, 2))
Remapping.interpolate!(dest, remapper, [coords.x, coords.x])
if ClimaComms.iamroot(context)
@test interp_x dest[:, :, 1]
@test interp_x dest[:, :, 2]
end
# Three fields (more than buffer length)
dest = ArrayType(zeros(21, 21, 3))
Remapping.interpolate!(dest, remapper, [coords.x, coords.x, coords.x])
if ClimaComms.iamroot(context)
@test interp_x dest[:, :, 1]
@test interp_x dest[:, :, 2]
@test interp_x dest[:, :, 3]
end
end
end
.

@@ -275,9 +273,9 @@ function set_interpolated_values_kernel!(

h = local_horiz_indices[i]
out[i, k] = 0
for t in 1:Nq, s in 1:Nq
for t in 1:Nq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, is this a bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is also an untested bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, cc @Sbozzolo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should been tested here:

@testset "Purely vertical space" begin
vertdomain = Domains.IntervalDomain(
Geometry.ZPoint(0.0),
Geometry.ZPoint(1000.0);
boundary_names = (:bottom, :top),
)
vertmesh = Meshes.IntervalMesh(vertdomain, nelems = 30)
verttopo = Topologies.IntervalTopology(
ClimaComms.SingletonCommsContext(ClimaComms.device()),
vertmesh,
)
cspace = Spaces.CenterFiniteDifferenceSpace(verttopo)
fspace = Spaces.FaceFiniteDifferenceSpace(cspace)
zpts = range(0.0, 1000.0, 21)
zcoords = [Geometry.ZPoint(z) for z in zpts]
# Center space
cremapper = Remapping.Remapper(
cspace;
target_zcoords = zcoords,
buffer_length = 2,
)
ccoords = Fields.coordinate_field(cspace)
cinterp_z = Remapping.interpolate(cremapper, ccoords.z)
cexpected_z = ArrayType(zpts)
ClimaComms.allowscalar(device) do
cexpected_z[1] = 1000.0 * (0 / 30 + 1 / 30) / 2
cexpected_z[end] = 1000.0 * (29 / 30 + 30 / 30) / 2
end
@test cinterp_z cexpected_z
# Face space
fremapper = Remapping.Remapper(
fspace;
target_zcoords = zcoords,
buffer_length = 2,
)
fcoords = Fields.coordinate_field(fspace)
finterp_z = Remapping.interpolate(fremapper, fcoords.z)
fexpected_z = ArrayType(zpts)
finterp_z fexpected_z
# Remapping two fields
cinterp = Remapping.interpolate(cremapper, [ccoords.z, ccoords.z])
@test cexpected_z cinterp[:, 1]
@test cexpected_z cinterp[:, 2]
finterp = Remapping.interpolate(fremapper, [fcoords.z, fcoords.z])
@test fexpected_z finterp[:, 1]
@test fexpected_z finterp[:, 2]
# Remapping three fields (more than the buffer length)
cinterp =
Remapping.interpolate(cremapper, [ccoords.z, ccoords.z, ccoords.z])
@test cexpected_z cinterp[:, 1]
@test cexpected_z cinterp[:, 2]
@test cexpected_z cinterp[:, 3]
finterp =
Remapping.interpolate(fremapper, [fcoords.z, fcoords.z, fcoords.z])
@test fexpected_z finterp[:, 1]
@test fexpected_z finterp[:, 2]
@test fexpected_z finterp[:, 3]
# Remapping in-place one field
dest = ArrayType(zeros(21))
Remapping.interpolate!(dest, cremapper, ccoords.z)
@test cexpected_z dest
Remapping.interpolate!(dest, fremapper, fcoords.z)
@test fexpected_z dest
# Three fields (more than buffer length)
dest = ArrayType(zeros(21, 3))
Remapping.interpolate!(
dest,
cremapper,
[ccoords.z, ccoords.z, ccoords.z],
)
@test cexpected_z dest[:, 1]
@test cexpected_z dest[:, 2]
@test cexpected_z dest[:, 3]
Remapping.interpolate!(
dest,
fremapper,
[fcoords.z, fcoords.z, fcoords.z],
)
@test fexpected_z dest[:, 1]
@test fexpected_z dest[:, 2]
@test fexpected_z dest[:, 3]
end

Copy link
Member

@charleskawczynski charleskawczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump the number of allowed JET failures for SpectralElementSpace1D in opt_spaces.jl from 141 to 825 on GPUs and from 137 to 150 on CPUs

This is alarming, without manifest changes. Would you mind taking a look at what is causing the spike in inferences? (I think that script makes it pretty easy to modify & see)

@dennisYatunin
Copy link
Member Author

This is alarming, without manifest changes. Would you mind taking a look at what is causing the spike in inferences? (I think that script makes it pretty easy to modify & see)

The spike in inferences for SpectralElementSpace1D on GPUs is caused by converting the Arrays in the local geometry and DSS weights to CuArrays. As we would expect, the number of JET failures on GPUs is now more similar to SpectralElementSpace2D, which does the same conversion.

surface::F
end

strip_field(::LinearAdaption) = LinearAdaption(nothing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be strip_space? And it should keep the field values

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to keep the field values in this function because they are passed in separately through ref_z_to_physical_z. I think it is best to avoid passing duplicate field values through here. In general, the design of this section should probably be refactored to split the surface elevations from the adaption specification.

function dss!(data::DSSDataTypes, topology::Topology2D)
function dss!(data::DSSTypes1D, topology::IntervalTopology)
Nij = DataLayouts.get_Nij(data)
length(parent(data)) == 0 && return nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be more cases here that we can short circuit. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like whether we are using GL or GLL quadrature? I think we're just assuming GLL quadrature everywhere outside of a few unit tests. Or did you have something else in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking Nh > 1 should be good (+GLL?)

end
using Test
@testset "Example broadcast expression on GPUs" begin
@test !isnothing(@. ᶜgradᵥ(level_field + ᶠscalar_field))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉, thank you!

@charleskawczynski
Copy link
Member

This is alarming, without manifest changes. Would you mind taking a look at what is causing the spike in inferences? (I think that script makes it pretty easy to modify & see)

It turns out that these are all false warnings, induced by our get! caching. Commenting out get! in spectral element grids and interval meshes shows no inference failures.

Copy link
Member

@charleskawczynski charleskawczynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chatted with @dennisYatunin about this PR. These changes look fine, and I approve once we add some unit tests exercising the new pieces/features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gradient on combination of horizontal and 3d fields does not work
3 participants