
Run icon4py on gpu #579

Open · wants to merge 21 commits into base: main
Conversation

@OngChia (Contributor) commented Oct 30, 2024

Enable GPU runs for icon4py.
Changes:

  1. Remove the backend and xp imports from common.setting.py. The backend is instead selected via one of the click options, and xp is chosen explicitly in icon4py_configuration.py by calling a global function that returns np for CPU backends and cp for GPU backends.
  2. Change the computation in vertical.py to numpy and add a backend argument to get_vct_a_and_vct_b so that the gt4py vct_a and vct_b fields are returned with the correct allocator.
  3. Use a new field_operator _interpolate_to_half_levels_wp in compute_virtual_potential_temperatures_and_pressure_gradient.py to avoid a CUDA illegal memory access whose cause is still unknown.
  4. Add backend when as_field is called to generate gt4py fields from serialized data. However, xp and backend are still read from common.setting.py, so it is still not possible to run the driver with a CPU backend if the environment variable ICON4PY_BACKEND is set to GPU. This remains to be done in a separate PR.
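A minimal sketch of what the selector function from point 1 could look like (the name host_or_device_array and the string-based check are illustrative assumptions, not the actual icon4py implementation):

```python
# Hypothetical sketch of the xp-selection helper described in point 1;
# the function name and the string-based check are assumptions for
# illustration, not the actual icon4py API.
import numpy as np


def host_or_device_array(backend_name: str):
    """Return cupy for GPU backends and numpy for CPU backends."""
    if "gpu" in backend_name.lower():
        import cupy as cp  # only available in a CUDA-capable environment

        return cp
    return np


xp = host_or_device_array("run_gtfn_cpu_cached")
buffer = xp.zeros((3, 2), dtype=float)  # a host array here, since xp is numpy
```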

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

launch jenkins spack

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

launch jenkins spack

@nfarabullini (Contributor) left a comment:

LGTM! I left a few comments, but I'm not sure whether they are needed.

pressure_numpy = xp.zeros((num_cells, num_levels), dtype=float)
theta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
eta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
Contributor:

Suggested change
w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
w_ndarray = np.zeros((num_cells, num_levels + 1), dtype=float)

?

@OngChia (Contributor Author), Nov 11, 2024:

Thanks for the comments.
I obtain xp via xp = driver_config.host_or_device_array(backend) and try to drop the dependency on xp in settings.py, though I am not sure whether that is a good idea. The same function is also used when running the icon4py_driver. Perhaps I should also add a pytest fixture xp that switches between cp and np depending on whether the backend runs on CPU or GPU.
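Following up on the fixture idea, a minimal sketch of what such a pytest fixture could look like (the fixture name xp, the helper is_gpu_backend, and the backend_name parameter are illustrative assumptions, not existing icon4py test infrastructure):

```python
# Hypothetical sketch of an xp fixture that switches between numpy and cupy;
# is_gpu_backend and backend_name are assumed names for illustration only.
import numpy as np
import pytest


def is_gpu_backend(backend_name: str) -> bool:
    """Crude check: treat any backend whose name mentions 'gpu' as a GPU backend."""
    return "gpu" in backend_name.lower()


@pytest.fixture
def xp(backend_name):  # backend_name would come from the pytest backend option
    if is_gpu_backend(backend_name):
        import cupy as cp  # requires a CUDA-capable environment

        return cp
    return np
```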

):
backend: gt4py_backend.Backend,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
xp = driver_config.host_or_device_array(backend)
Contributor:

Suggested change
xp = driver_config.host_or_device_array(backend)
xp = driver_config.host_or_device_array(gt4py_backend)

maybe? not sure

Contributor Author:

I renamed the backend module imported from gt4py.next to gt4py_backend to differentiate it from the input argument backend, so backend is correct here.

@@ -130,6 +134,8 @@ def zonalwind_2_normalwind_numpy(
"""
# TODO (Chia Rui) this function needs a test

xp = driver_config.host_or_device_array(backend)
Contributor:

same here as above

@halungge (Contributor) left a comment:

Mostly: please review the imports in vertical.py, and I suggest putting what host_or_device_array does now into icon4py.common instead of icon4py.driver.

@@ -37,7 +40,7 @@ def _compute_virtual_potential_temperatures_and_pressure_gradient(
wgtfac_c_wp, ddqz_z_half_wp = astype((wgtfac_c, ddqz_z_half), wpfloat)

z_theta_v_pr_ic_vp = _interpolate_to_half_levels_vp(wgtfac_c=wgtfac_c, interpolant=z_rth_pr_2)
theta_v_ic_wp = wgtfac_c_wp * theta_v + (wpfloat("1.0") - wgtfac_c_wp) * theta_v(Koff[-1])
theta_v_ic_wp = _interpolate_to_half_levels_wp(wgtfac_c=wgtfac_c_wp, interpolant=theta_v)
Contributor:

What again was the problem with the inline version? (I have nothing against the stencil; I think it is even better in terms of readability, but I am still wondering why it changes anything.)

Contributor Author:

I always faced a CUDA illegal memory access error with the inline version when running at resolutions greater than R2B6. I still do not understand why, and I am not sure whether it is due to a deeper issue or a bug elsewhere. At least for R2B7 everything is now okay, and the results are verified with the new function call. Since I also like the function call for interpolation, I tend to keep it this way.
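For reference, the half-level interpolation that both the inline expression and the new field_operator compute can be sketched in plain numpy (array names mirror the diff; this is an illustration, not icon4py code):

```python
# NumPy sketch of the half-level interpolation shown in the diff above:
#   field_ic[k] = wgtfac_c[k] * field[k] + (1 - wgtfac_c[k]) * field[k-1]
# Names mirror the diff; this is an illustration, not icon4py code.
import numpy as np


def interpolate_to_half_levels(wgtfac_c: np.ndarray, field: np.ndarray) -> np.ndarray:
    out = np.zeros_like(field)
    # half level k is a weighted average of full levels k and k-1
    out[1:] = wgtfac_c[1:] * field[1:] + (1.0 - wgtfac_c[1:]) * field[:-1]
    return out


theta_v = np.array([300.0, 310.0, 320.0])
wgtfac_c = np.array([0.0, 0.5, 0.25])
theta_v_ic = interpolate_to_half_levels(wgtfac_c, theta_v)  # -> [0.0, 305.0, 312.5]
```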

model/common/src/icon4py/model/common/grid/vertical.py: 3 resolved (outdated) comments
@@ -453,8 +454,8 @@ def construct_icon_grid(self, on_gpu: bool) -> icon.IconGrid:
)
c2e2c = self.c2e2c()
e2c2e = self.e2c2e()
c2e2c0 = xp.column_stack(((range(c2e2c.shape[0])), c2e2c))
e2c2e0 = xp.column_stack(((range(e2c2e.shape[0])), e2c2e))
c2e2c0 = xp.column_stack((xp.asarray(range(c2e2c.shape[0])), c2e2c))
Contributor:

You have this on_gpu flag as an argument in this function. I don't actually know whether it is always set correctly, and we might trade it for a backend argument as well, but you could do the conditional import of the array_ns based on that.

@OngChia (Contributor Author), Nov 17, 2024:

I have decided to use numpy for reading these connectivities from serialized data, and to use array_ns and on_gpu to decide between cp and np in _get_offset_provider of the BaseGrid class. That is enough to make the driver run on GPU or CPU depending on the backend. I also changed icon_grid in datatest_fixtures.py to assign on_gpu correctly according to the backend set in the pytest option.
I think all the remaining fields can also be read in as numpy arrays in serialbox_utils.py. I am not sure about the strategy, so I did not change the remaining part of serialbox_utils.py. Should I change xp to np and add backend to the IconSavepoint class instance variables?
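The asarray fix in the diff above can be demonstrated in isolation: cupy's column_stack cannot consume a plain Python range, so the row-index column is wrapped with xp.asarray first. A small sketch using numpy, where both forms happen to work:

```python
# Demonstration of the fix in the diff above: cupy's column_stack cannot
# consume a plain Python range, so the row-index column is wrapped with
# xp.asarray first. Shown here with numpy for illustration.
import numpy as np

xp = np  # would be cupy when on_gpu is True

c2e2c = xp.asarray([[1, 2, 3], [0, 2, 3]])  # toy cell-to-neighbor table
# prepend each row's own index as column 0, as done for c2e2c0
c2e2c0 = xp.column_stack((xp.asarray(range(c2e2c.shape[0])), c2e2c))
# c2e2c0 -> [[0, 1, 2, 3], [1, 0, 2, 3]]
```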

@@ -25,7 +25,7 @@ def allocate_zero_field(
if is_halfdim:
assert len(shapex) == 2
shapex = (shapex[0], shapex[1] + 1)
return gtx.as_field(dims, xp.zeros(shapex, dtype=dtype), allocator=backend)
return gtx.as_field(dims, np.zeros(shapex, dtype=dtype), allocator=backend)
Contributor:

here you could directly use the gt4py function:

Suggested change:

    return gtx.as_field(dims, np.zeros(shapex, dtype=dtype), allocator=backend)

    def allocate_zero_field(
        *dims: gtx.Dimension,
        grid,
        is_halfdim=False,
        dtype=ta.wpfloat,
        backend: Optional[backend.Backend] = None,
    ):
        def size(dim: gtx.Dimension, is_half_dim: bool) -> int:
            if dim == dims.KDim and is_half_dim:
                return grid.size[dim] + 1
            return grid.size[dim]

        dimensions = {d: range(size(d, is_halfdim)) for d in dims}
        return gtx.zeros(dimensions, dtype=dtype, allocator=backend)

@OngChia (Contributor Author), Nov 15, 2024:

Thanks a lot!
I did not know until now that I could properly create a new dimension with a different size like that...

DriverBackends.GTFN_GPU_CACHED: run_gtfn_gpu_cached,
}


Contributor:

Something like this is what I meant above. But I think it is better to have it in common, and then use the properties of the gtx.Backend DeviceType to determine which import to use (see above), because then we can reuse it in all model packages.

Contributor Author:

Thanks. I have removed this function and used the one suggested in your comment above.

log.debug(
"running stencils 11 12 (calculate_enhanced_diffusion_coefficients_for_grid_point_cold_pools): end"
)
if self.config.apply_to_temperature:
Contributor:

So that stencil should be inside the if, and it was not before? We never catch these kinds of bugs because we always run the same configuration, so we only ever test one branch of the ifs... :-( From this point of view it is very bad that the configuration of EXCLAIM.APE is similar to MCH_CH_R04B09_DSL. We had a discussion on this yesterday: we really should come up with a list of configurations that we want to support, test all of them, and delete what we don't want to test. @anuragdipankar @lxavier @muellch

Contributor Author:

Yes, those stencils are inside the if statement. The temperature diffusion should be turned off in the standard Jablonowski Williamson test.

@OngChia (Contributor Author) commented Nov 17, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 17, 2024

launch jenkins spack

@OngChia (Contributor Author) commented Nov 18, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 18, 2024

launch jenkins spack


Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.
