Insanely long OceanSeaIceModel compile times on GPU #135

Open
glwagner opened this issue Aug 14, 2024 · 16 comments
Labels
performance It's supposed to be the fastest model ever

Comments

@glwagner
Member

I timed how long it takes to build and then take one time step with OceanSeaIceModel with this script:

using Oceananigans
using ClimaOcean
using OrthogonalSphericalShellGrids

start_time = time_ns()
arch = GPU()
grid = TripolarGrid(arch;
                    size = (50, 50, 10),
                    halo = (7, 7, 7),
                    z = (-6000, 0),
                    first_pole_longitude = 75,
                    north_poles_latitude = 55)

bottom_height = retrieve_bathymetry(grid;
                                    minimum_depth = 10,
                                    dir = "./",
                                    interpolation_passes = 20,
                                    connected_regions_allowed = 0)

grid = ImmersedBoundaryGrid(grid, GridFittedBottom(bottom_height); active_cells_map = true)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Grid / bathymetry construction time: " * prettytime(elapsed)

start_time = time_ns()
free_surface = SplitExplicitFreeSurface(grid; substeps = 20)
ocean = ocean_simulation(grid; free_surface)
model = ocean.model
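# Recompute elapsed for this stage; without this line the message repeats the grid / bathymetry time.
elapsed = 1e-9 * (time_ns() - start_time)
@info "Ocean simulation construction time: " * prettytime(elapsed)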

start_time = time_ns()
backend    = JRA55NetCDFBackend(4)
atmosphere = JRA55_prescribed_atmosphere(arch; backend)
radiation  = Radiation(arch)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Atmosphere construction time: " * prettytime(elapsed)

# Fluxes are computed when the model is constructed, so we just test that this works.
start_time = time_ns()
sea_ice = ClimaOcean.OceanSeaIceModels.MinimumTemperatureSeaIce()
coupled_model = OceanSeaIceModel(ocean, sea_ice; atmosphere, radiation)

elapsed = 1e-9 * (time_ns() - start_time)
@info "Coupled model construction time: " * prettytime(elapsed)

start_time = time_ns()
time_step!(coupled_model, 1)
elapsed = 1e-9 * (time_ns() - start_time)
@info "One time step time: " * prettytime(elapsed)

Running for the first time I get (ignoring the annoying warnings mentioned on #133):

[ Info: Grid / bathymetry construction time: 1.839 minutes
[ Info: Ocean simulation construction time: 1.839 minutes
[ Info: Atmosphere construction time: 11.130 seconds
[ Info: Model construction time: 5.645 minutes
[ Info: One time step time: 17.764 seconds

The 6-minute wait time for model construction isn't alleviated until the 5th or 6th time building a model.
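
The staged timing pattern in the script above could be wrapped in a small helper so that every stage recomputes its own elapsed time before logging. This is only a sketch: `timed_stage` is a hypothetical name, not part of Oceananigans or ClimaOcean, and the workload is a stand-in for the real grid or model construction.

```julia
# Hypothetical helper (not part of Oceananigans or ClimaOcean): run `f`,
# log how long it took under `label`, and return the result and the time.
function timed_stage(f, label)
    start_time = time_ns()
    result = f()
    elapsed = 1e-9 * (time_ns() - start_time)  # nanoseconds -> seconds
    @info "$label: $(round(elapsed, digits=3)) seconds"
    return result, elapsed
end

# Stand-in workload; in the script this would be, e.g., the grid construction.
total, elapsed = timed_stage("Stand-in stage time") do
    sum(abs2, 1:1_000_000)
end
```

Because the start time and the elapsed time live inside the helper, a stage cannot accidentally log the elapsed time of the previous stage.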

After the time-stepping is compiled, one time-step is considerably shorter:

julia> @time time_step!(coupled_model, 1)
  0.036822 seconds (45.26 k allocations: 16.751 MiB)

It's not obvious to me why model construction is so expensive. We do call update_state! within the model constructor, which computes fluxes. But this also has to be called during time_step!, which is cheap. So there's something else going on.

Finally, time_step! seems to allocate:

julia> @time for n = 1:100; time_step!(coupled_model, 1); end
  2.330006 seconds (4.71 M allocations: 1.741 GiB, 2.54% gc time)

which is also problematic.
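
One way to separate steady-state allocations from compilation effects is to discard some warm-up steps and then average `@allocated` over many calls. The sketch below assumes nothing about ClimaOcean: `mean_allocation_per_step`, `allocating_step!`, and `inplace_step!` are hypothetical names, and the two step functions are stand-ins for `time_step!(coupled_model, 1)`.

```julia
# Average bytes allocated per call of `step!(state)`, after `warmup` calls
# that absorb compilation and any first-call setup.
function mean_allocation_per_step(step!, state; warmup=10, samples=100)
    for _ in 1:warmup
        step!(state)
    end
    total_bytes = 0
    for _ in 1:samples
        total_bytes += @allocated step!(state)
    end
    return total_bytes / samples
end

# Stand-in step that allocates a temporary array on every call.
allocating_step!(state) = (state .= state .+ ones(length(state)); nothing)

# Stand-in step that updates the state in place.
inplace_step!(state) = (state .+= 1; nothing)

state = zeros(1024)
bytes_allocating = mean_allocation_per_step(allocating_step!, state)
bytes_inplace = mean_allocation_per_step(inplace_step!, state)
@info "Mean allocation per step" bytes_allocating bytes_inplace
```

If the per-step figure stays constant after the warm-up, as it does in the 100-step benchmark above, the allocation is likely coming from the step itself rather than from compilation.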

@glwagner glwagner added the performance It's supposed to be the fastest model ever label Aug 14, 2024
@simone-silvestri
Collaborator

It looks like @vchuravy had a solution for it, which will hopefully come online in Julia 1.11: JuliaGPU/GPUCompiler.jl#557 (comment)

However, we should really try to understand the problem with our precompilation.

@glwagner
Member Author

Solution for which part?

@simone-silvestri
Collaborator

For the precompilation of ClimaOcean. It looks like the time step does not precompile until the fourth execution, so that might be the allocation. If you exclude the first 10 time steps, does the time step continue allocating?

@glwagner
Member Author

for the precompilation of ClimaOcean.

Interesting. I wasn't even timing that.

If you exclude the first 10 time steps does the time step continue allocating?

Yes for sure, check out the benchmark. I'm running 100 time steps.

The constructor time is dominated by constructing OceanSeaIceSurfaceFluxes. When this is omitted the construction time drops from minutes to less than a second.

@glwagner
Member Author

glwagner commented Aug 14, 2024

Here's a little more information about constructor times for OceanSeaIceSurfaceFluxes:

  1. 0.3 s: comment out the creation of SimilarityTheoryTurbulentFluxes, total_fluxes, and surface_atmosphere_state
  2. 3.5 s: restore SimilarityTheoryTurbulentFluxes
  3. 40.7 s: the above, plus restore the creation of the interpolated atmosphere state (new in PR #126; creates 8 2D fields)
  4. 296.4 s: restore total_ocean_fluxes (which includes creating a few BinaryOperations; creates 2 new 2D fields and extracts the existing fields for velocity/tracer fluxes)
  5. 197.9 s: remove the BinaryOperation (supposed to be a user convenience) from total_ocean_fluxes

It doesn't take 35 s to create 8 2D fields so the cost has something to do with building the struct. I don't completely understand.
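
The incremental cost of each component can be read off the cumulative timings in the list above. This is just the arithmetic on the numbers already reported, with the labels shortened for display:

```julia
# Cumulative constructor times in seconds, copied from steps 1-4 above.
cumulative = [0.3, 3.5, 40.7, 296.4]
added = ["SimilarityTheoryTurbulentFluxes",
         "interpolated atmosphere state",
         "total_ocean_fluxes"]

# Cost attributed to each component as it is restored.
increments = round.(diff(cumulative), digits=1)

for (label, dt) in zip(added, increments)
    println(rpad(label, 34), dt, " s")
end

# Step 5 versus step 4: time saved by removing the BinaryOperation.
binary_operation_saving = round(296.4 - 197.9, digits=1)
println(rpad("BinaryOperation (saving)", 34), binary_operation_saving, " s")
```

By this accounting, total_ocean_fluxes is by far the largest single contributor, and a substantial part of it is attributable to the BinaryOperation.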

@francispoulin

I redid the tests for fun on my laptop (nothing fast) and found the following timings:

[ Info: Grid / bathymetry construction time: 4.783 minutes
[ Info: Ocean simulation construction time: 4.783 minutes
[ Info: Atmosphere construction time: 7.542 seconds
[ Info: Coupled model construction time: 38.721 seconds
[ Info: One time step time: 13.587 seconds

Clearly things are even worse for me, but I also find that the grid and ocean model are the slow parts.

@glwagner
Member Author

Huh, do you mean you used a GPU or the laptop CPU?

@francispoulin

This is my laptop GPU. Not a powerful one for sure.

@glwagner
Member Author

glwagner commented Aug 14, 2024

It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.

It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.

@glwagner
Member Author

Here with julia 1.10.4 with a slightly modified script that also takes 10 time steps:

[ Info: Time for packages to load: 7.094 seconds
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 2.040 minutes
[ Info: Time to build the ocean simulation: 17.114 minutes
[ Info: Time to build the atmosphere and radiation: 11.529 seconds
[ Info: Time to construct the OceanSeaIceModel: 4.772 minutes
 19.544141 seconds (15.39 M allocations: 1.062 GiB, 2.23% gc time, 92.16% compilation time)
154.613907 seconds (26.05 M allocations: 1.262 GiB, 0.28% gc time, 96.44% compilation time)
  0.020572 seconds (44.74 k allocations: 16.715 MiB)
  0.023772 seconds (44.74 k allocations: 16.715 MiB)
  0.023835 seconds (44.74 k allocations: 16.715 MiB)
  0.023758 seconds (44.74 k allocations: 16.715 MiB)
  0.023896 seconds (44.74 k allocations: 16.715 MiB)
  0.081326 seconds (44.74 k allocations: 16.715 MiB, 72.32% gc time)
  0.018894 seconds (44.74 k allocations: 16.715 MiB)
  0.019408 seconds (44.74 k allocations: 16.715 MiB)
[ Info: Time to take 10 time-steps: 2.908 minutes

@francispoulin

It's good that the example even fits on it! How much memory does it have? Still useful for evaluating compile time and parameter space issues, perhaps.

It's interesting that on your machine the model construction is much faster than on the machine I tested on. Still confused why this is happening. I was running on julia 1.10.0, I'll test other julia versions.

I have gone as high as 18GB on my laptop GPU before it gave up and said no!

I should say that I was using 1.10.0.

I am happy to try another version of Julia if that's of interest.

@simone-silvestri
Collaborator

Here with julia 1.10.4 with a slightly modified script that also takes 10 time steps:

[ Info: Time for packages to load: 7.094 seconds
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 2.040 minutes
[ Info: Time to build the ocean simulation: 17.114 minutes
[ Info: Time to build the atmosphere and radiation: 11.529 seconds
[ Info: Time to construct the OceanSeaIceModel: 4.772 minutes
 19.544141 seconds (15.39 M allocations: 1.062 GiB, 2.23% gc time, 92.16% compilation time)
154.613907 seconds (26.05 M allocations: 1.262 GiB, 0.28% gc time, 96.44% compilation time)
  0.020572 seconds (44.74 k allocations: 16.715 MiB)
  0.023772 seconds (44.74 k allocations: 16.715 MiB)
  0.023835 seconds (44.74 k allocations: 16.715 MiB)
  0.023758 seconds (44.74 k allocations: 16.715 MiB)
  0.023896 seconds (44.74 k allocations: 16.715 MiB)
  0.081326 seconds (44.74 k allocations: 16.715 MiB, 72.32% gc time)
  0.018894 seconds (44.74 k allocations: 16.715 MiB)
  0.019408 seconds (44.74 k allocations: 16.715 MiB)
[ Info: Time to take 10 time-steps: 2.908 minutes

So, there is a consistent 16.7 MiB allocation per time step. That is indeed a bit worrying if we have to spend 72% of the time in GC every 10ish time steps
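
To put a number on the amortized cost, the GC share can be averaged over the eight post-compilation steps in the log quoted above. This is a rough estimate from eight samples only; the timings and the 72.32% figure are copied from that log, and steps without a reported GC percentage are assumed to have spent no time in GC.

```julia
# (wall time in seconds, fraction of that time spent in GC) for the eight
# post-compilation steps in the julia 1.10.4 log.
steps = [(0.020572, 0.0),
         (0.023772, 0.0),
         (0.023835, 0.0),
         (0.023758, 0.0),
         (0.023896, 0.0),
         (0.081326, 0.7232),
         (0.018894, 0.0),
         (0.019408, 0.0)]

total_time = sum(first, steps)
gc_time = sum(t * f for (t, f) in steps)
gc_share = gc_time / total_time

println("Amortized GC share: ", round(100 * gc_share, digits=1), " %")
```

On these numbers the single GC pause accounts for about a quarter of the total time over the eight steps, so the per-step allocation is not negligible even when averaged.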

@glwagner
Member Author

On julia 1.11.0-rc2:

greg@tartarus:~/Projects/ClimaOcean.jl/test$ julia +1.11 --project
                  _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-rc2 (2024-07-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> include("test_ocean_sea_ice_model_parameter_space.jl")
Precompiling ClimaOcean...
  3 dependencies successfully precompiled in 20 seconds. 265 already precompiled.
[ Info: Time for packages to load: 29.139 seconds
[ Info: Regridding bathymetry from existing file ./ETOPO_2022_v1_60s_N90W180_surface.nc.
[ Info: Time to construct the ImmersedBoundaryGrid with realistic bathymetry: 1.993 minutes
[ Info: Time to build the ocean simulation: 15.823 minutes
[ Info: Time to build the atmosphere and radiation: 12.646 seconds
[ Info: Time to construct the OceanSeaIceModel: 3.819 minutes
 26.515157 seconds (25.42 M allocations: 1.297 GiB, 1.32% gc time, 93.56% compilation time)
195.530043 seconds (28.53 M allocations: 1.263 GiB, 0.41% gc time, 96.92% compilation time)
  0.027841 seconds (48.63 k allocations: 16.697 MiB)
  0.732518 seconds (48.63 k allocations: 16.697 MiB, 97.40% gc time)
  0.019094 seconds (48.63 k allocations: 16.697 MiB)
  0.018940 seconds (48.63 k allocations: 16.697 MiB)
  0.018603 seconds (48.63 k allocations: 16.697 MiB)
  0.018552 seconds (48.63 k allocations: 16.697 MiB)
  0.033975 seconds (48.63 k allocations: 16.697 MiB, 47.97% gc time)
  0.017218 seconds (48.63 k allocations: 16.697 MiB)
[ Info: Time to take 10 time-steps: 3.718 minutes

@glwagner
Member Author

15.8 minutes to build the ocean simulation is pretty wild.

@ali-ramadhan
Member

I also get ~15+ minute compile times when using Oceananigans.jl with OceanBioME.jl, so I just use julia -O0 all the time for development now. It helps a lot and reduces compile times by ~10x in some cases.
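
Since `-O0` trades runtime speed for compile time, a benchmark script might want to check which optimization level the session was started with. A minimal sketch, assuming the internal (unexported) `Base.JLOptions()` struct is acceptable to rely on; its `opt_level` field reflects the `-O` flag:

```julia
# Optimization level of the running session (the value passed via -O;
# Julia's default is 2). Base.JLOptions is internal and may change.
opt_level = Int(Base.JLOptions().opt_level)

if opt_level < 2
    @warn "Running with -O$opt_level: compile times drop, but run-time measurements are not representative."
else
    @info "Running with -O$opt_level."
end
```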

@glwagner
Member Author

That's a nice tip @ali-ramadhan. I suspect there is also some low-hanging fruit out there to improve this situation.


4 participants