Conversation

@avik-pal
Collaborator

@avik-pal avik-pal commented Feb 28, 2025

Can't seem to figure out why the padded view doesn't support mutation on the outer level.

fixes #820

@avik-pal avik-pal marked this pull request as ready for review February 28, 2025 04:19
@avik-pal
Collaborator Author

Does anyone more familiar with tracing know what I am messing up, such that the mutation does not work correctly? cc @mofeing, @Pangoraw, @jumerckx, @wsmoses

@glwagner
Collaborator

@simone-silvestri might have a comment about why we have data with odd dimension sizes even on an "even-dimensioned grid" (because of staggering), and what the implications are, if any, for parallelization.

@jumerckx
Collaborator

> Can't seem to figure out why the padded view doesn't support mutation on the outer level.

Is there an MWE for this, @avik-pal?

Comment on lines +254 to +262
function fn_set_1(x)
    x[:, 1:2] .= 1
    return x
end

res1 = @jit fn_set_1(x_padded)
res2 = @jit fn_set_1(x_test)
@test Array(res1) ≈ Array(res2)
@test_broken all(Array(x_padded)[:, 1:2] .== 1)
Collaborator Author

@jumerckx this is the broken test.
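
For context, here is a rough sketch of how the two inputs in the snippet above might be constructed. The sharding calls (Reactant.Sharding.Mesh, Reactant.Sharding.NamedSharding, and the sharding keyword of Reactant.to_rarray) are assumed here and may not match the actual test setup:

using Reactant, Test

# 8 virtual CPU devices arranged in a 2×4 mesh (assumed setup; see the
# reproduction notes further down in this thread).
mesh = Reactant.Sharding.Mesh(reshape(Reactant.devices()[1:8], 2, 4), (:x, :y))

x = rand(Float32, 5, 6)  # 5 is not divisible by 2, so the :x axis needs padding

# Sharded (padded) array vs. an unsharded reference copy of the same data.
x_padded = Reactant.to_rarray(x; sharding=Reactant.Sharding.NamedSharding(mesh, (:x, :y)))
x_test = Reactant.to_rarray(x)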

Collaborator

Thanks, and is there a way to reproduce this on a machine where

julia> length(addressable_devices)
1

? 😅

Collaborator Author

XLA_FLAGS="--xla_force_host_platform_device_count=8"

Collaborator Author

set the backend to cpu
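
Putting the two comments above together, a minimal sketch of the reproduction setup; Reactant.set_default_backend and Reactant.addressable_devices are assumed to be the relevant calls and may differ slightly:

# Expose 8 virtual CPU devices; this must be set before Reactant/XLA initializes.
ENV["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

using Reactant

# Force the CPU backend so the virtual host devices above are the ones used.
Reactant.set_default_backend("cpu")

length(Reactant.addressable_devices())  # should now report 8 instead of 1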

Collaborator

I haven't gotten to the bottom of it yet, but in codegen_unflatten! the result does not have a :resargs path like it should.

@simone-silvestri
Contributor

simone-silvestri commented Feb 28, 2025

> @simone-silvestri might have a comment about why we have data with odd dimension sizes even on an "even-dimensioned grid" (because of staggering), and what the implications are, if any, for parallelization.

Variables that are located on Faces in Bounded directions have an extra point compared to variables at Center (in Periodic directions, all variables have the same number of points, so there is no ambiguity).

Therefore, if we want to split a Bounded direction, we need to decide which rank holds the extra point. In principle any rank can hold it, but it is logically simpler if it is either the first or the last one. In our distributed model we have decided that the last rank holds the extra point, which represents the boundary on the right side.

Therefore, if, for example, we have a grid of size (10, 10, 10) where the x direction is Bounded, and we want to split it into 5 ranks, an XFaceField(grid) will have 2, 2, 2, 2, and 3 points in x on the 5 consecutive ranks, and the respective x-topologies of the grids on these 5 consecutive ranks will be:
RightConnected, FullyConnected, FullyConnected, FullyConnected, LeftConnected.

For example, running the following script with mpiexecjl -np 5 julia --project test_mpi.jl

using Oceananigans
using Oceananigans.Grids: topology

arch = Distributed(CPU())
grid = RectilinearGrid(arch, size=10, x=(0, 1), topology=(Bounded, Flat, Flat))
u = XFaceField(grid)
@info arch.local_rank topology(grid) size(u)
@info grid

outputs

(base) simonesilvestri@Simones-MacBook-Pro ~ % mpiexecjl -np 5 julia --color=yes --project test_mpi.jl
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
┌ Info: 2
│   topology(grid) = (Oceananigans.Grids.FullyConnected, Flat, Flat)
└   size(u) = (2, 1, 1)
┌ Info: 0
│   topology(grid) = (Oceananigans.Grids.RightConnected, Flat, Flat)
└   size(u) = (2, 1, 1)
┌ Info: 4
│   topology(grid) = (Oceananigans.Grids.LeftConnected, Flat, Flat)
└   size(u) = (3, 1, 1)
┌ Info: 3
│   topology(grid) = (Oceananigans.Grids.FullyConnected, Flat, Flat)
└   size(u) = (2, 1, 1)
┌ Info: 1
│   topology(grid) = (Oceananigans.Grids.FullyConnected, Flat, Flat)
└   size(u) = (2, 1, 1)
┌ Info: 4
│   grid = 2×1×1 RectilinearGrid{Float64, Oceananigans.Grids.LeftConnected, Flat, Flat} on Distributed{CPU} with 3×0×0 halo
│    ├── LeftConnected  x  [0.8, 1.0] regularly spaced with Δx=0.1
│    ├── Flat y
└    └── Flat z
┌ Info: 0
│   grid = 2×1×1 RectilinearGrid{Float64, Oceananigans.Grids.RightConnected, Flat, Flat} on Distributed{CPU} with 3×0×0 halo
│    ├── RightConnected  x  [-1.58603e-17, 0.2) regularly spaced with Δx=0.1
│    ├── Flat y
└    └── Flat z
┌ Info: 3
│   grid = 2×1×1 RectilinearGrid{Float64, Oceananigans.Grids.FullyConnected, Flat, Flat} on Distributed{CPU} with 3×0×0 halo
│    ├── FullyConnected x  [0.6, 0.8) regularly spaced with Δx=0.1
│    ├── Flat y
└    └── Flat z
┌ Info: 2
│   grid = 2×1×1 RectilinearGrid{Float64, Oceananigans.Grids.FullyConnected, Flat, Flat} on Distributed{CPU} with 3×0×0 halo
│    ├── FullyConnected x  [0.4, 0.6) regularly spaced with Δx=0.1
│    ├── Flat y
└    └── Flat z
┌ Info: 1
│   grid = 2×1×1 RectilinearGrid{Float64, Oceananigans.Grids.FullyConnected, Flat, Flat} on Distributed{CPU} with 3×0×0 halo
│    ├── FullyConnected x  [0.2, 0.4) regularly spaced with Δx=0.1
│    ├── Flat y
└    └── Flat z

Note that the x-limits for the local grids explicitly include the rightmost boundary only for rank 4 (square bracket instead of round bracket).
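
To make the bookkeeping above concrete, here is a small standalone sketch (not Oceananigans internals) of how the nx + 1 face points of a Bounded direction split across ranks when the last rank owns the extra boundary point:

# Number of x-face points owned by `rank` (0-based) when nx cells are split
# evenly over nranks ranks and the last rank also owns the right boundary face.
function local_face_count(nx, nranks, rank)
    nlocal = nx ÷ nranks                 # cells per rank (assumes nranks divides nx)
    return rank == nranks - 1 ? nlocal + 1 : nlocal
end

@assert [local_face_count(10, 5, r) for r in 0:4] == [2, 2, 2, 2, 3]
@assert sum(local_face_count(10, 5, r) for r in 0:4) == 10 + 1   # nx + 1 faces in total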

@avik-pal
Collaborator Author

Closing in favor of #825, which is a more general and less intrusive solution.

@avik-pal avik-pal closed this Feb 28, 2025
@giordano giordano deleted the ap/padded_sharding branch March 16, 2025 16:25
Linked issue: Sharding with arbitrary non-divisible dimensions