
Run icon4py on gpu #579

Open · wants to merge 21 commits into base: main
Conversation

@OngChia (Contributor) commented Oct 30, 2024

Enable GPU runs for icon4py.
Changes:

  1. Remove the backend and xp imports from common.setting.py. The backend is instead selected via one of the click options, and xp is chosen explicitly in icon4py_configuration.py by calling a global function that returns np for CPU backends and cp for GPU backends.
  2. Change the computation in vertical.py to numpy and add a backend argument to get_vct_a_and_vct_b so that the gt4py vct_a and vct_b fields are returned with the correct allocator.
  3. Use a new field_operator _interpolate_to_half_levels_wp in compute_virtual_potential_temperatures_and_pressure_gradient.py to avoid a CUDA illegal memory access whose cause is still unknown.
  4. Add backend when as_field is called to generate gt4py fields from serialized data. However, xp and backend are still read from common.setting.py, so it is still not possible to run the driver with a CPU backend if the environment variable ICON4PY_BACKEND is set to GPU. This remains to be done in a separate PR.
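A minimal sketch of what the selector function from point 1 could look like (the name host_or_device_array and the string-based check are illustrative assumptions, not the actual icon4py implementation):

```python
# Hypothetical sketch of the xp-selection helper described in point 1;
# the function name and the string-based check are assumptions for
# illustration, not the actual icon4py API.
import numpy as np


def host_or_device_array(backend_name: str):
    """Return cupy for GPU backends and numpy for CPU backends."""
    if "gpu" in backend_name.lower():
        import cupy as cp  # only available in a CUDA-capable environment

        return cp
    return np


xp = host_or_device_array("run_gtfn_cpu_cached")
buffer = xp.zeros((3, 2), dtype=float)  # a host array here, since xp is numpy
```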

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

launch jenkins spack

@OngChia (Contributor Author) commented Nov 8, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 8, 2024

launch jenkins spack

@nfarabullini (Contributor) left a comment:

LGTM! I left a few comments, but I'm not sure whether they are needed.

pressure_numpy = xp.zeros((num_cells, num_levels), dtype=float)
theta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
eta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
Contributor:

Suggested change
w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
w_ndarray = np.zeros((num_cells, num_levels + 1), dtype=float)

?

@OngChia (Contributor Author), Nov 11, 2024:

Thanks for the comments.
I obtain xp via xp = driver_config.host_or_device_array(backend) and try to drop the dependency on xp in settings.py, though I am not sure whether that is a good idea. The same function is also used when running the icon4py_driver. Perhaps I should also add a pytest fixture xp that switches between cp and np depending on whether the backend runs on CPU or GPU.
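Following up on the fixture idea, a minimal sketch of what such a pytest fixture could look like (the fixture name xp, the helper is_gpu_backend, and the backend_name parameter are illustrative assumptions, not existing icon4py test infrastructure):

```python
# Hypothetical sketch of an xp fixture that switches between numpy and cupy;
# is_gpu_backend and backend_name are assumed names for illustration only.
import numpy as np
import pytest


def is_gpu_backend(backend_name: str) -> bool:
    """Crude check: treat any backend whose name mentions 'gpu' as a GPU backend."""
    return "gpu" in backend_name.lower()


@pytest.fixture
def xp(backend_name):  # backend_name would come from the pytest backend option
    if is_gpu_backend(backend_name):
        import cupy as cp  # requires a CUDA-capable environment

        return cp
    return np
```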

):
backend: gt4py_backend.Backend,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
xp = driver_config.host_or_device_array(backend)
Contributor:

Suggested change
xp = driver_config.host_or_device_array(backend)
xp = driver_config.host_or_device_array(gt4py_backend)

maybe? not sure

Contributor Author:

I renamed the backend module imported from gt4py.next to gt4py_backend to differentiate it from the input argument backend, so backend is correct here.

@@ -130,6 +134,8 @@ def zonalwind_2_normalwind_numpy(
"""
# TODO (Chia Rui) this function needs a test

xp = driver_config.host_or_device_array(backend)
Contributor:

same here as above

@halungge (Contributor) left a comment:

Mostly: please review the imports in vertical.py, and I suggest putting what host_or_device_array does now into icon4py.common instead of icon4py.driver.

@@ -37,7 +40,7 @@ def _compute_virtual_potential_temperatures_and_pressure_gradient(
wgtfac_c_wp, ddqz_z_half_wp = astype((wgtfac_c, ddqz_z_half), wpfloat)

z_theta_v_pr_ic_vp = _interpolate_to_half_levels_vp(wgtfac_c=wgtfac_c, interpolant=z_rth_pr_2)
theta_v_ic_wp = wgtfac_c_wp * theta_v + (wpfloat("1.0") - wgtfac_c_wp) * theta_v(Koff[-1])
theta_v_ic_wp = _interpolate_to_half_levels_wp(wgtfac_c=wgtfac_c_wp, interpolant=theta_v)
Contributor:

What again was the problem with the inline version? (I have nothing against the stencil; I think it is even better in terms of readability, but I am still wondering why it changes anything.)

Contributor Author:

I always faced a CUDA illegal memory access error with the inline version when running at resolutions greater than R2B6. I still do not understand why, and I am not sure whether it is due to a deeper issue or a bug elsewhere. At least for R2B7 everything is now okay, and the results are verified with the new function call. Since I also like the function call for interpolation, I tend to keep it this way.
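For reference, the half-level interpolation that both the inline expression and the new field_operator compute can be sketched in plain numpy (array names mirror the diff; this is an illustration, not icon4py code):

```python
# NumPy sketch of the half-level interpolation shown in the diff above:
#   field_ic[k] = wgtfac_c[k] * field[k] + (1 - wgtfac_c[k]) * field[k-1]
# Names mirror the diff; this is an illustration, not icon4py code.
import numpy as np


def interpolate_to_half_levels(wgtfac_c: np.ndarray, field: np.ndarray) -> np.ndarray:
    out = np.zeros_like(field)
    # half level k is a weighted average of full levels k and k-1
    out[1:] = wgtfac_c[1:] * field[1:] + (1.0 - wgtfac_c[1:]) * field[:-1]
    return out


theta_v = np.array([300.0, 310.0, 320.0])
wgtfac_c = np.array([0.0, 0.5, 0.25])
theta_v_ic = interpolate_to_half_levels(wgtfac_c, theta_v)  # -> [0.0, 305.0, 312.5]
```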

model/common/src/icon4py/model/common/grid/vertical.py: 3 resolved (outdated) comments
@@ -453,8 +454,8 @@ def construct_icon_grid(self, on_gpu: bool) -> icon.IconGrid:
)
c2e2c = self.c2e2c()
e2c2e = self.e2c2e()
c2e2c0 = xp.column_stack(((range(c2e2c.shape[0])), c2e2c))
e2c2e0 = xp.column_stack(((range(e2c2e.shape[0])), e2c2e))
c2e2c0 = xp.column_stack((xp.asarray(range(c2e2c.shape[0])), c2e2c))
Contributor:

You have this on_gpu flag as an argument in this function. I don't actually know whether it is always set correctly, and we might trade it for a backend argument as well, but you could do the conditional import of the array_ns based on that.

@OngChia (Contributor Author), Nov 17, 2024:

I have decided to use numpy for reading these connectivities from serialized data, and to use array_ns and on_gpu to decide between cp and np in _get_offset_provider of the BaseGrid class. That is enough to make the driver run on GPU or CPU depending on the backend. I also changed icon_grid in datatest_fixtures.py to assign on_gpu correctly according to the backend set in the pytest option.
I think all the remaining fields can also be read in as numpy arrays in serialbox_utils.py. I am not sure about the strategy, so I did not change the remaining part of serialbox_utils.py. Should I change xp to np and add backend to the IconSavepoint class instance variables?
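The asarray fix in the diff above can be demonstrated in isolation: cupy's column_stack cannot consume a plain Python range, so the row-index column is wrapped with xp.asarray first. A small sketch using numpy, where both forms happen to work:

```python
# Demonstration of the fix in the diff above: cupy's column_stack cannot
# consume a plain Python range, so the row-index column is wrapped with
# xp.asarray first. Shown here with numpy for illustration.
import numpy as np

xp = np  # would be cupy when on_gpu is True

c2e2c = xp.asarray([[1, 2, 3], [0, 2, 3]])  # toy cell-to-neighbor table
# prepend each row's own index as column 0, as done for c2e2c0
c2e2c0 = xp.column_stack((xp.asarray(range(c2e2c.shape[0])), c2e2c))
# c2e2c0 -> [[0, 1, 2, 3], [1, 0, 2, 3]]
```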

@@ -25,7 +25,7 @@ def allocate_zero_field(
if is_halfdim:
assert len(shapex) == 2
shapex = (shapex[0], shapex[1] + 1)
return gtx.as_field(dims, xp.zeros(shapex, dtype=dtype), allocator=backend)
return gtx.as_field(dims, np.zeros(shapex, dtype=dtype), allocator=backend)
Contributor:

here you could directly use the gt4py function:

Suggested change:

    return gtx.as_field(dims, np.zeros(shapex, dtype=dtype), allocator=backend)

    def allocate_zero_field(
        *dims: gtx.Dimension,
        grid,
        is_halfdim=False,
        dtype=ta.wpfloat,
        backend: Optional[backend.Backend] = None,
    ):
        def size(dim: gtx.Dimension, is_half_dim: bool) -> int:
            if dim == dims.KDim and is_half_dim:
                return grid.size[dim] + 1
            return grid.size[dim]

        dimensions = {d: range(size(d, is_halfdim)) for d in dims}
        return gtx.zeros(dimensions, dtype=dtype, allocator=backend)

@OngChia (Contributor Author), Nov 15, 2024:

Thanks a lot!
I did not know until now that I could properly create a new dimension with a different size like that...

DriverBackends.GTFN_GPU_CACHED: run_gtfn_gpu_cached,
}


Contributor:

Something like this is what I meant above. But I think it is better to have it in common, and then use the properties of the gtx.Backend DeviceType to determine which import to use (see above), because then we can reuse it in all model packages.

Contributor Author:

Thanks. I have removed this function and used the one suggested in your comment above.

log.debug(
"running stencils 11 12 (calculate_enhanced_diffusion_coefficients_for_grid_point_cold_pools): end"
)
if self.config.apply_to_temperature:
Contributor:

So that stencil should be inside the if, and it was not before? We never catch these kinds of bugs because we always run the same configuration, so we only ever test one branch of the ifs... :-( From this point of view it is very bad that the configuration of EXCLAIM.APE is similar to MCH_CH_R04B09_DSL. We had a discussion on this yesterday: we really should come up with a list of configurations that we want to support, test all of them, and delete what we don't want to test. @anuragdipankar @lxavier @muellch

Contributor Author:

Yes, those stencils are inside the if statement. The temperature diffusion should be turned off in the standard Jablonowski Williamson test.

@OngChia (Contributor Author) commented Nov 17, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 17, 2024

launch jenkins spack

@OngChia (Contributor Author) commented Nov 18, 2024

cscs-ci run default

@OngChia (Contributor Author) commented Nov 18, 2024

launch jenkins spack


Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.
