Run icon4py on gpu #579
base: main
Conversation
cscs-ci run default
cscs-ci run default
launch jenkins spack
cscs-ci run default
launch jenkins spack
LGTM! I left a few comments, but I'm not sure whether they are needed.
```python
pressure_numpy = xp.zeros((num_cells, num_levels), dtype=float)
theta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
eta_v_numpy = xp.zeros((num_cells, num_levels), dtype=float)
w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
```
```diff
- w_ndarray = xp.zeros((num_cells, num_levels + 1), dtype=float)
+ w_ndarray = np.zeros((num_cells, num_levels + 1), dtype=float)
```
?
Thanks for the comments.
I declare `xp` via `xp = driver_config.host_or_device_array(backend)` and try to scrap the dependency on `xp` in `settings.py`. I am not sure whether that is a good idea, since this function is also used when running the icon4py_driver. Perhaps I should also have a pytest fixture `xp` that switches between `cp` and `np` depending on whether the backend is on CPU or GPU.
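The fixture idea could be as thin as a wrapper around one selection helper. A minimal sketch, assuming a boolean `on_gpu` flag is derivable from the backend (the helper name and the fixture wiring are hypothetical, not the actual icon4py code):

```python
import numpy as np

def _select_xp(on_gpu: bool):
    """Return cupy for GPU backends, numpy otherwise (hypothetical helper)."""
    if on_gpu:
        import cupy as cp  # imported lazily so CPU-only runs never need cupy
        return cp
    return np

# A pytest fixture could then simply delegate to it, e.g.:
#   @pytest.fixture
#   def xp(backend):
#       return _select_xp(backend_is_on_gpu(backend))
xp = _select_xp(on_gpu=False)
field = xp.zeros((4, 3), dtype=float)
```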
```python
    backend: gt4py_backend.Backend,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    xp = driver_config.host_or_device_array(backend)
```
```diff
- xp = driver_config.host_or_device_array(backend)
+ xp = driver_config.host_or_device_array(gt4py_backend)
```
maybe? not sure
I renamed the `backend` module imported from `gt4py.next` to `gt4py_backend` to differentiate it from the input argument `backend`. So `backend` should be correct here.
```diff
@@ -130,6 +134,8 @@ def zonalwind_2_normalwind_numpy(
     """
     # TODO (Chia Rui) this function needs a test

+    xp = driver_config.host_or_device_array(backend)
```
same here as above
Mostly: review the imports in `vertical.py`. I suggest doing what we do now in `host_or_device_array`, but in `icon4py.common` instead of `icon4py.driver`.
model/atmosphere/dycore/src/icon4py/model/atmosphere/dycore/interpolate_to_half_levels_wp.py (resolved)
```diff
@@ -37,7 +40,7 @@ def _compute_virtual_potential_temperatures_and_pressure_gradient(
     wgtfac_c_wp, ddqz_z_half_wp = astype((wgtfac_c, ddqz_z_half), wpfloat)

     z_theta_v_pr_ic_vp = _interpolate_to_half_levels_vp(wgtfac_c=wgtfac_c, interpolant=z_rth_pr_2)
-    theta_v_ic_wp = wgtfac_c_wp * theta_v + (wpfloat("1.0") - wgtfac_c_wp) * theta_v(Koff[-1])
+    theta_v_ic_wp = _interpolate_to_half_levels_wp(wgtfac_c=wgtfac_c_wp, interpolant=theta_v)
```
what again was the problem with the inline version? (I have nothing against the stencil, I think it is even better in terms of readability, but I am still wondering why it changes anything.)
I always faced a CUDA illegal memory access error with the inline version when running at resolutions greater than R2B6. I still do not understand why; I am not sure whether it is due to some deeper reason or a bug elsewhere. At least for R2B7 everything is okay so far, and the results are verified with the new function call. Since I also like the functional call for interpolation, I tend to keep it this way.
```diff
@@ -453,8 +454,8 @@ def construct_icon_grid(self, on_gpu: bool) -> icon.IconGrid:
     )
     c2e2c = self.c2e2c()
     e2c2e = self.e2c2e()
-    c2e2c0 = xp.column_stack(((range(c2e2c.shape[0])), c2e2c))
-    e2c2e0 = xp.column_stack(((range(e2c2e.shape[0])), e2c2e))
+    c2e2c0 = xp.column_stack((xp.asarray(range(c2e2c.shape[0])), c2e2c))
```
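For context, a NumPy-only sketch of what this line builds: the cell's own index is prepended as column 0 of the connectivity table. The explicit `asarray` presumably matters because cupy's `column_stack`, unlike numpy's, does not accept a plain `range` object (my reading of the diff; the toy connectivity below is invented for illustration):

```python
import numpy as np

# Toy connectivity: 4 cells with 3 neighbours each.
c2e2c = np.array([[1, 2, 3],
                  [0, 2, 3],
                  [0, 1, 3],
                  [0, 1, 2]])

# Prepend each cell's own index as column 0, as construct_icon_grid does.
c2e2c0 = np.column_stack((np.asarray(range(c2e2c.shape[0])), c2e2c))
```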
You have this `on_gpu` flag as an argument in this function. I don't actually know whether it is always set correctly, and we might trade it for a backend argument as well, but you could do the conditional import of the `array_ns` based on that.
I have decided to use numpy for reading these connectivities from serialized data, and used `array_ns` and `on_gpu` to decide between `cp` or `np` in `_get_offset_provider` of the `BaseGrid` class. That is enough to make the driver run on gpu or cpu depending on the backend. I also changed `icon_grid` in `datatest_fixtures.py` to assign `on_gpu` correctly according to the backend set in the pytest option.

I think all the remaining fields can also be read in as numpy arrays in `serialbox_utils.py`. I am not sure about the strategy, so I did not change the remaining part of `serialbox_utils.py`. Should I change `xp` to `np` and add `backend` to the `IconSavepoint` class instance variables?
```diff
@@ -25,7 +25,7 @@ def allocate_zero_field(
     if is_halfdim:
         assert len(shapex) == 2
         shapex = (shapex[0], shapex[1] + 1)
-    return gtx.as_field(dims, xp.zeros(shapex, dtype=dtype), allocator=backend)
+    return gtx.as_field(dims, np.zeros(shapex, dtype=dtype), allocator=backend)
```
here you could directly use the `gt4py` function:

```python
def allocate_zero_field(
    *dims: gtx.Dimension,
    grid,
    is_halfdim=False,
    dtype=ta.wpfloat,
    backend: Optional[backend.Backend] = None,
):
    def size(dim: gtx.Dimension, is_half_dim: bool) -> int:
        # half-level fields get one extra entry along the vertical dimension
        # (checking the dimension kind avoids shadowing by the *dims argument)
        if is_half_dim and dim.kind == gtx.DimensionKind.VERTICAL:
            return grid.size[dim] + 1
        return grid.size[dim]

    dimensions = {d: range(size(d, is_halfdim)) for d in dims}
    return gtx.zeros(dimensions, dtype=dtype, allocator=backend)
```
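The core of the suggestion is just that half-level fields get one extra entry along the vertical dimension. A gt4py-free sketch of that sizing logic, with all names (`Dim`, `DimKind`, the grid sizes) being stand-ins invented for illustration:

```python
from enum import Enum

class DimKind(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"

class Dim:
    """Stand-in for gtx.Dimension: a name plus a kind."""
    def __init__(self, name: str, kind: DimKind):
        self.name, self.kind = name, kind

CellDim = Dim("Cell", DimKind.HORIZONTAL)
KDim = Dim("K", DimKind.VERTICAL)

grid_size = {CellDim: 8, KDim: 10}  # toy grid extents

def size(dim: Dim, is_halfdim: bool) -> int:
    # half-level fields have one extra entry along the vertical dimension
    if is_halfdim and dim.kind is DimKind.VERTICAL:
        return grid_size[dim] + 1
    return grid_size[dim]

shape = tuple(size(d, is_halfdim=True) for d in (CellDim, KDim))
```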
Super, thanks!
I did not know until now that I could properly create a new dimension with a different size like that...
```python
    DriverBackends.GTFN_GPU_CACHED: run_gtfn_gpu_cached,
}
```
Something like this is what I meant above. But I think it is better to have it in `common` and then use the `gtx.Backend`'s `DeviceType` property to determine which import to use (see above), because then we can reuse it in all model packages.
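A sketch of what dispatching on the device type could look like. The `DeviceType` enum below is a stand-in for gt4py's (which has more members), and the idea that a backend exposes its device type is the assumption being discussed, not verified API:

```python
from enum import Enum
import numpy as np

class DeviceType(Enum):
    """Stand-in for the gt4py DeviceType mentioned above."""
    CPU = "cpu"
    CUDA = "cuda"

def import_array_ns(device_type: DeviceType):
    """One shared helper all model packages could reuse."""
    if device_type is DeviceType.CUDA:
        import cupy as cp  # only needed on GPU devices
        return cp
    return np

xp = import_array_ns(DeviceType.CPU)
buf = xp.arange(6).reshape(2, 3)
```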
Thanks. I have removed this function and used the one as suggested in your comment above.
```python
log.debug(
    "running stencils 11 12 (calculate_enhanced_diffusion_coefficients_for_grid_point_cold_pools): end"
)
if self.config.apply_to_temperature:
```
So that stencil should be inside the `if`, and it was not before? We never catch these kinds of bugs because we always run the same configuration, so we only ever test one branch of the `if`s... :-( From this point of view it is very bad that the configuration of EXCLAIM.APE is similar to MCH_CH_R04B09_DSL. We had a discussion on this yesterday: we really should come up with a list of configurations that we want to support, test all of them, and delete what we don't want to test. @anuragdipankar @lxavier @muellch
Yes, those stencils are inside the `if` statement. Temperature diffusion should be turned off in the standard Jablonowski-Williamson test.
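The gating being discussed can be sketched without gt4py. The smoothing below is a toy stand-in for stencils 11/12, and the class/field names are invented for illustration:

```python
import numpy as np

class Config:
    # off in the Jablonowski-Williamson setup, as stated above
    apply_to_temperature = False

def diffusion_step(theta_v: np.ndarray, config: Config) -> np.ndarray:
    """Apply the (toy) temperature diffusion only when the flag is set."""
    if config.apply_to_temperature:
        # toy 1-D Laplacian smoothing as a stand-in stencil
        theta_v = theta_v + 0.1 * (
            np.roll(theta_v, 1) - 2.0 * theta_v + np.roll(theta_v, -1)
        )
    return theta_v

theta = np.linspace(280.0, 300.0, 5)
out = diffusion_step(theta, Config())
```

Testing both branches of such an `if` is exactly what the reviewer's point about only ever running one configuration is about.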
cscs-ci run default
launch jenkins spack
cscs-ci run default
launch jenkins spack
Mandatory Tests

Please make sure you run these tests via comment before you merge!

Optional Tests

To run benchmarks you can use:

To run tests and benchmarks with the DaCe backend you can use:

In case your change might affect downstream icon-exclaim, please consider running

For more detailed information please look at CI in the EXCLAIM universe.
Enable GPU run for icon4py.

Changes:
- Removed `xp` imports from `common.setting.py`. The backend is instead added as one of the click options.
- `xp` is decided explicitly by calling a global function in `icon4py_configuration.py` that returns `np` if cpu backends are used or `cp` if gpu backends are used.
- Changed `get_vct_a_and_vct_b` to return the gt4py `vct_a` and `vct_b` fields with the correct allocator.
- `_interpolate_to_half_levels_wp` is used in `compute_virtual_potential_temperatures_and_pressure_gradient.py` to avoid a CUDA illegal memory access, for an as yet unknown reason.
- With the current `common.setting.py`, it is still not possible to run the driver with a cpu backend if you set the environment variable `ICON4PY_BACKEND` to GPU. This remains to be done in a separate PR.