
Gpu datatest #553

Open · wants to merge 23 commits into main

Conversation

@OngChia (Contributor) commented Sep 19, 2024

Enable single-node GPU runs for the Jablonowski-Williamson test on the R2B4 grid and all datatests for the Timeloop.

  1. Change np to xp in serialbox_utils.py (see the sketch after this list).
  2. Change asnumpy() to ndarray in test_timeloop.py, jablownski_williamson.py, and driver/utils.py.
  3. Split predictor_stencils_7_8_9 into two stencils to avoid illegal memory access when running on GPU.
  4. Add a new field exner_dyn_incr_lastsubstep to the class DiagnosticStateNonHydro as the output argument of the stencil update_dynamical_exner_time_increment, because this stencil originally updates its input argument exner_dyn_incr.
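
A minimal sketch of the np → xp pattern referred to in item 1 (the try/except import and the helper below are illustrative only, not the exact code in serialbox_utils.py):

# Illustrative: use cupy when available so arrays stay on the GPU,
# otherwise fall back to numpy on the CPU.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

def to_device_array(raw_buffer):
    # Hypothetical helper: with cupy this keeps the data on the device,
    # with numpy it is a plain host array.
    return xp.asarray(raw_buffer)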

@OngChia commented Sep 19, 2024

cscs-ci run default

@OngChia commented Sep 19, 2024

cscs-ci run default

@halungge (Contributor) left a comment


Some renaming; for the numpy.ma parts you might pick up the changes from this PR.

@@ -33,6 +33,7 @@
log = logging.getLogger(__name__)


# TODO (Chia Rui): Convert all numpy computations to cupy
Contributor


haven't you done this now?

Contributor Author


Ah. Thanks. I forgot to delete it.

@@ -49,6 +49,38 @@ def _compute_virtual_potential_temperatures_and_pressure_gradient(
return z_theta_v_pr_ic_vp, theta_v_ic_wp, astype(z_th_ddz_exner_c_wp, vpfloat)


@field_operator
def _compute_only_virtual_potential_temperatures(
Contributor


Suggested change
- def _compute_only_virtual_potential_temperatures(
+ def _compute_virtual_potential_temperatures(

the "only" is kind of implicit...



@field_operator
def _compute_only_pressure_gradient(
Contributor


Suggested change
- def _compute_only_pressure_gradient(
+ def _compute_pressure_gradient(

Contributor


In order to remove the duplication I would have the field_operator call the two new ones, like this:

@field_operator
def _compute_virtual_potential_temperatures_and_pressure_gradient(...):
    z_theta_v_pr_ic_vp, theta_v_ic_wp = _compute_only_virtual_potential_temperatures(...)
    z_th_ddz_exner_c_wp = _compute_only_pressure_gradient(...)
    return z_theta_v_pr_ic_vp, theta_v_ic_wp, astype(z_th_ddz_exner_c_wp, vpfloat)

This will also not harm the blueline as it is purely internal.

Contributor Author


Ahh. Good suggestion. Thanks!



@gtx.program(grid_type=gtx.GridType.UNSTRUCTURED, backend=backend)
def predictor_stencils_7_8_9_secondstep(
Contributor


you could even call this one compute_pressure_gradient

@@ -69,6 +70,7 @@ class EntryType(IntEnum):
    @builder.builder
    def with_dimension(self, dim: Dimension, global_index: np.ndarray, owner_mask: np.ndarray):
        masked_global_index = ma.array(global_index, mask=owner_mask)
        masked_global_index = xp.asarray(masked_global_index)
        self._global_index[dim] = masked_global_index
Contributor


I guess that won't work. I don't know what happens with the mask if you do that, and we need that mask later. But indeed cupy does not implement numpy's masked_array interface. We would have to do a simple hand-made implementation, which should be enough for what we need. I'll open a fix for that.
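
A possible shape for such a hand-made replacement (purely illustrative; the class and method names are hypothetical, not the interface of the eventual fix):

try:
    import cupy as xp
except ImportError:
    import numpy as xp

class SimpleMaskedArray:
    # Poor man's stand-in for numpy.ma on device arrays.
    def __init__(self, data, mask):
        # Follows the numpy.ma convention: True in `mask` means "masked out".
        self.data = xp.asarray(data)
        self.mask = xp.asarray(mask)

    def compressed(self):
        # Return only the unmasked entries, like numpy.ma.MaskedArray.compressed().
        return self.data[~self.mask]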

Contributor Author


Okay. Thanks.

@OngChia commented Sep 19, 2024

cscs-ci run default

@OngChia commented Sep 24, 2024

cscs-ci run default

@OngChia commented Sep 24, 2024

launch jenkins spack

@OngChia commented Sep 24, 2024

cscs-ci run default

@OngChia commented Sep 24, 2024

launch jenkins spack

@OngChia commented Sep 24, 2024

cscs-ci run default

@OngChia commented Sep 24, 2024

launch jenkins spack

OngChia and others added 4 commits September 25, 2024 09:17
* implement a simple poor man's masked array as the numpy.ma interface is not implemented in cupy.

* fix imports: use xp
fix typo in dunder function

* fix imports

* fix missing device copy in serialbox_utils.py
* keep start and end indices on host,
remove some functions

* fix typo
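
The "keep start and end indices on host" commit presumably reflects that start/end indices are consumed as plain Python integers (for slicing and as bounds), so putting them on the device only forces transfers back; a hypothetical illustration:

try:
    import cupy as xp
except ImportError:
    import numpy as xp

field = xp.zeros((10, 65))       # bulk field data may live on the GPU
start_index, end_index = 2, 7    # scalar bounds stay as host ints

# Slicing with host integers needs no device round-trip, whereas a device
# scalar would first have to be copied back with int(...) before use.
view = field[start_index:end_index, :]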
@OngChia commented Sep 25, 2024

cscs-ci run default

@OngChia commented Sep 25, 2024

launch jenkins spack


Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • launch jenkins spack

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

In case your change might affect downstream icon-exclaim, please consider running

  • launch jenkins icon

For more detailed information please look at CI in the EXCLAIM universe.

Comment on lines +1647 to +1659
log.info(
f" MAXPRE VN: {prognostic_state[nnew].vn.ndarray.max():.15e} , MAXPRE W: {prognostic_state[nnew].w.ndarray.max():.15e}"
)
log.info(
f" MAXPRE RHO: {prognostic_state[nnew].rho.ndarray.max():.15e} , MAXPRE THETA_V: {prognostic_state[nnew].theta_v.ndarray.max():.15e}"
)
log.info(
f" AVEPRE VN: {prognostic_state[nnew].vn.ndarray.mean(axis=(0,1)):.15e} , AVEPRE W: {prognostic_state[nnew].w.ndarray.mean(axis=(0,1)):.15e}"
)
log.info(
f" AVEPRE RHO: {prognostic_state[nnew].rho.ndarray.mean(axis=(0,1)):.15e} , AVEPRE THETA_V: {prognostic_state[nnew].theta_v.ndarray.mean(axis=(0,1)):.15e}"
)

Contributor


These are reductions and hurt performance even if the logging level is set higher than INFO. If you want to keep them, see
https://stackoverflow.com/questions/58592292/does-python-logging-incur-a-performance-hit-if-you-log-below-the-set-level

if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Message with %s, %s', expensive_func1(),
                                        expensive_func2())

But then I would still demote them to DEBUG (not INFO).
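
Applied to the quoted lines, the guarded and demoted version might look roughly like this (a sketch only; prognostic_state, nnew, and log are taken from the surrounding code in the diff):

# Guard the reductions so .max()/.mean() only run when DEBUG logging is active.
if log.isEnabledFor(logging.DEBUG):
    vn = prognostic_state[nnew].vn.ndarray
    w = prognostic_state[nnew].w.ndarray
    log.debug(f" MAXPRE VN: {vn.max():.15e} , MAXPRE W: {w.max():.15e}")
    log.debug(f" AVEPRE VN: {vn.mean(axis=(0, 1)):.15e} , AVEPRE W: {w.mean(axis=(0, 1)):.15e}")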
