@SeanBryan51 (Collaborator) commented Mar 13, 2025

Currently, running the gswp3 configuration (see MPI and serial configurations) in serial and with MPI, with CASA-CNP enabled (see footnote 1), produces bitwise differences between the serial and MPI runs in the CASA restart file and the CASA NetCDF output file (all other outputs, e.g. standard CABLE outputs and restarts, are bitwise identical between serial and MPI). This change fixes several bugs in the MPI master driver and the CASA-CNP code so that the CASA output and restart files are bitwise reproducible between serial and MPI runs for this configuration.

Type of change

  • Bug fix

Checklist

  • I have checked my code/text and corrected any misspellings

Testing

  • Are the changes bitwise-compatible with the main branch? If working on an optional feature, are the results bitwise-compatible when this feature is off? If yes, copy benchcab output showing successful completion of the bitwise compatibility tests or equivalent results below this line.
2025-03-13 17:24:21,872 - INFO - benchcab.benchcab.py:380 - Running comparison tasks...
2025-03-13 17:24:21,917 - INFO - benchcab.benchcab.py:381 - tasks: 168 (models: 2, sites: 42, science configurations: 4)
2025-03-13 17:27:12,385 - INFO - benchcab.benchcab.py:391 - 0 failed, 168 passed


📚 Documentation preview 📚: https://cable--567.org.readthedocs.build/en/567/

Footnotes

  1. Note: CASA-CNP was enabled without a CASA restart file (i.e. cable_user%CASA_fromZero = .TRUE.).

@SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch 2 times, most recently from a0c4c5d to 877e0d9, March 13, 2025 05:16
@SeanBryan51 changed the title from "Fix serial-MPI non-reproducibility for CASA-CNP" to "Fix serial-MPI non-reproducibility for gswp3 CASA-CNP configuration", Mar 13, 2025
@SeanBryan51 marked this pull request as ready for review March 14, 2025 02:57
@SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch from 877e0d9 to cf11792, March 14, 2025 03:07

This is done to avoid exceptions caused by accessing uninitialised memory in debug builds.

Currently, the phen type is not properly initialised in the MPI implementation, which results in uninitialised values being written to the restart file. This change initialises the phen type on allocation so that it is initialised in both serial and MPI runs.
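
Why initialising on allocation restores agreement can be sketched in a few lines. This is a minimal Python illustration, not the Fortran change itself: `allocate_phase`, `serial_restart`, `mpi_master_restart` and the `phase` component are hypothetical names, random values stand in for uninitialised memory, and the idea that the serial path initialises the component in a separate step after allocation is an assumption made only for this sketch.

```python
import random
from typing import List


def allocate_phase(mp: int, init_on_alloc: bool) -> List[float]:
    """Allocate a hypothetical phen 'phase' component for mp land points.

    Without init_on_alloc the contents are arbitrary, mimicking an
    uninitialised Fortran ALLOCATE; with it, every element is set to 0.0.
    """
    if init_on_alloc:
        return [0.0] * mp
    # Random values stand in for whatever was left in memory.
    return [random.random() for _ in range(mp)]


def serial_restart(mp: int, init_on_alloc: bool) -> List[float]:
    """Values the serial driver would write to the restart file."""
    phase = allocate_phase(mp, init_on_alloc)
    # Assumption for this sketch: the serial path runs a separate
    # initialisation step after allocation.
    for i in range(mp):
        phase[i] = 0.0
    return phase


def mpi_master_restart(mp: int, init_on_alloc: bool) -> List[float]:
    """Values the MPI master would write to the restart file."""
    # Assumption for this sketch: the master never runs that separate
    # step, so only the allocation decides what gets written.
    return allocate_phase(mp, init_on_alloc)


if __name__ == "__main__":
    random.seed(0)
    print(serial_restart(3, False) == mpi_master_restart(3, False))  # False
    print(serial_restart(3, True) == mpi_master_restart(3, True))    # True
```

Putting the initialisation in the allocation routine means every code path that allocates the type gets defined values, rather than relying on each driver to remember a separate initialisation call.
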

Some CASA variables required in the output file are not communicated from the worker processes back to the master process. This change sends the required variables from the workers to the master, which is needed to restore bitwise reproducibility of the CASA NetCDF output file between serial and MPI runs.
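
The effect of a variable that is never sent from the workers can be sketched with a simulated decomposition. This Python sketch does not use MPI: "messages" are plain dictionaries, the contiguous decomposition and the variable names `cplant` and `clitter` are hypothetical, and `compute` is an arbitrary stand-in for the CASA calculation that gives the same per-point value on a worker as in the serial driver.

```python
from typing import Dict, List, Set, Tuple


def decompose(mp: int, nworkers: int) -> List[Tuple[int, int]]:
    """Split mp land points into nworkers contiguous, near-equal slices."""
    base, extra = divmod(mp, nworkers)
    slices, start = [], 0
    for rank in range(nworkers):
        size = base + (1 if rank < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


def compute(var: str, start: int, end: int) -> List[float]:
    """Arbitrary stand-in for computing var on land points [start, end)."""
    return [0.5 * i + len(var) for i in range(start, end)]


def worker_message(start: int, end: int, sent_vars: Set[str],
                   all_vars: List[str]) -> Dict[str, List[float]]:
    """A worker computes every variable but only sends those in sent_vars."""
    results = {v: compute(v, start, end) for v in all_vars}
    return {v: results[v] for v in all_vars if v in sent_vars}


def master_output(var: str, mp: int, nworkers: int, sent_vars: Set[str],
                  all_vars: List[str]) -> List[float]:
    """Master's copy of var after receiving one message per worker.

    Points whose values are never sent keep the master's own value,
    which in this sketch is 0.0.
    """
    out = [0.0] * mp
    for start, end in decompose(mp, nworkers):
        msg = worker_message(start, end, sent_vars, all_vars)
        if var in msg:
            out[start:end] = msg[var]
    return out
```

Calling `master_output("clitter", 10, 3, {"cplant"}, ["cplant", "clitter"])` leaves the master with zeros, while adding `"clitter"` to the sent set reproduces `compute("clitter", 0, 10)`, the serial result, exactly.
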

Currently, the MPI implementation does not output time-averaged pools and fluxes (#566). This change ports the time-averaging functionality from the serial driver to the MPI master driver, which is required to restore bitwise reproducibility of the CASA NetCDF output file between serial and MPI runs.
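
The accumulate-then-average pattern can be sketched as follows. `TimeAverager` and its methods are hypothetical names for illustration, not code from the serial or MPI drivers, and the sketch assumes a simple arithmetic mean over each output period.

```python
from typing import List


class TimeAverager:
    """Accumulate per-land-point values and return their mean over a period.

    Hypothetical helper: names are illustrative, not from the CABLE source.
    """

    def __init__(self, mp: int) -> None:
        self.total = [0.0] * mp
        self.count = 0

    def accumulate(self, values: List[float]) -> None:
        """Add one time step's values to the running sums."""
        for i, v in enumerate(values):
            self.total[i] += v
        self.count += 1

    def average_and_reset(self) -> List[float]:
        """Return the period mean and clear the sums for the next period."""
        if self.count == 0:
            raise ValueError("no samples accumulated in this period")
        mean = [t / self.count for t in self.total]
        self.total = [0.0] * len(self.total)
        self.count = 0
        return mean


if __name__ == "__main__":
    serial, master = TimeAverager(2), TimeAverager(2)
    for step in ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]):
        serial.accumulate(step)
        master.accumulate(step)
    print(serial.average_and_reset() == master.average_and_reset())  # True
```

Summing over the period and dividing once is only one way to form the mean; a running-mean update such as `m += (x - m) / n` is mathematically equivalent but can differ in the last bits. For bitwise reproducibility, the master driver therefore needs to apply the same sequence of floating-point operations as the serial driver.
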

@SeanBryan51 force-pushed the fix-serial-mpi-non-reproducibility-for-casa-cnp branch from cf11792 to 6e609f4, March 18, 2025 02:49

@SeanBryan51 (Collaborator, Author)

Initialisation of CASA types was added in #590. Need to rebase.
