
addresses Carsten review #51

Conversation

Thomas-Ulrich
Collaborator

This should address Carsten's review on #35.
It compiles, but still needs to be tested (what is the most lightweight setup I could use for testing?).
(The next step would be to apply clang-format and merge master. I prefer to propose this incremental PR without those changes, to keep the diff clean.)

@Thomas-Ulrich Thomas-Ulrich force-pushed the thomas/dmay/seas-checkpoint-greenfunc_address_review_only branch 2 times, most recently from fcfc95f to 75157e8 Compare September 14, 2023 14:01
@Thomas-Ulrich Thomas-Ulrich force-pushed the thomas/dmay/seas-checkpoint-greenfunc_address_review_only branch from 75157e8 to 49d7a9f Compare September 14, 2023 14:03
@hpc4geo
Collaborator

hpc4geo commented Sep 14, 2023

Feature-wise, the most important capabilities that PR #35 supports, and which must continue to work, are the following modes:

  1. Generating the GFs on N MPI ranks and immediately running the time integrator in a single tandem execution. This is the default behavior users want when running 2D problems on a local machine without a queuing system / fixed walltime.
  2. Incrementally generating the GFs on a fixed number of MPI ranks over multiple tandem executions. This is the use case required for large 3D jobs and/or machines with a queue / fixed walltime.
  3. Loading a complete set of GFs on the same number of MPI ranks as was used to write them.
  4. Generating the GFs on one MPI communicator size (N) and loading the GFs on a different communicator size (M). This is the use case required for large 3D jobs and/or machines with a queue / fixed walltime. The code should function independently of whether M = N, M < N, or M > N.

Proposed test

[mode 0]

  • Generate all GFs on 1 rank, followed by time integration. Suggest using a small 2D problem (BP1). Check consistency of the solution by comparing the time-slip history with that obtained by NOT using the GF approach.
  • Generate all GFs on 48 ranks, followed by time integration. Same problem and consistency check as above.
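The consistency check in these bullets can be sketched as follows. This is an illustrative snippet, not the actual comparison script used later in this thread: the function name, the tolerance, and the synthetic data are assumptions; in practice the two time-slip series would be read from the fault receiver output files of the GF and non-GF runs.

```python
import numpy as np

def compare_slip_histories(t_ref, slip_ref, t_gf, slip_gf):
    """Return the maximum relative difference between two time-slip
    histories. The two runs use adaptive time stepping, so the GF
    series is first interpolated onto the reference time axis."""
    slip_interp = np.interp(t_ref, t_gf, slip_gf)
    scale = max(np.max(np.abs(slip_ref)), 1e-30)  # guard against zero slip
    return np.max(np.abs(slip_interp - slip_ref)) / scale

# Synthetic example: identical physics sampled at different times.
t_ref = np.linspace(0.0, 10.0, 1001)
t_gf = np.linspace(0.0, 10.0, 757)
slip = lambda t: 0.5 * (1.0 + np.tanh(t - 5.0))
err = compare_slip_histories(t_ref, slip(t_ref), t_gf, slip(t_gf))
assert err < 1e-4  # agreement up to interpolation error
```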

[mode 1]

  • Make a higher-resolution 2D mesh for BP1, such that the time required to compute each GF is roughly 0.1 s.
  • Compute the GFs on 48 MPI ranks. After, say, 10 GFs have been computed, kill the job with Ctrl-C (i.e., force an abort). Restart the tandem executable on 48 MPI ranks and check that the GF calculation resumes with the correct GF. Interrupt the calculation (brutally) a few more times, repeating until all GFs are computed.
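The resume behavior being tested here can be modeled roughly as follows. This is a hedged sketch of the idea only, not tandem's actual checkpoint code; the JSON progress file and function names are invented for illustration. The key property is that progress is persisted after every completed GF, so a hard kill at any point loses at most the GF in flight.

```python
import json
import os
import tempfile

def compute_gfs(n_gfs, state_path, compute_one, abort_after=None):
    """Compute GFs one by one, recording progress so an interrupted
    job resumes at the first unfinished GF. abort_after simulates a
    hard kill (Ctrl-C) after that many total GFs are done."""
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)["completed"]
    for i in range(done, n_gfs):
        if abort_after is not None and i >= abort_after:
            return done  # simulate the job being killed mid-run
        compute_one(i)
        done = i + 1
        with open(state_path, "w") as f:  # persist after each GF
            json.dump({"completed": done}, f)
    return done

computed = []
with tempfile.TemporaryDirectory() as d:
    state = os.path.join(d, "progress.json")
    compute_gfs(100, state, computed.append, abort_after=10)  # killed early
    compute_gfs(100, state, computed.append, abort_after=37)  # killed again
    compute_gfs(100, state, computed.append)                  # runs to the end
assert computed == list(range(100))  # each GF computed exactly once
```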

[mode 2]

  • Check that you can load the checkpointed GFs from the [mode 1] procedure above and perform time integration on 48 MPI ranks. Check consistency of the solution by comparing the time-slip history with that obtained by NOT using the GF approach.

[mode 3]

  • Perform time integration using the checkpointed GFs from the [mode 1] procedure above on 96 MPI ranks. Check consistency of the solution by comparing the time-slip history with that obtained by NOT using the GF approach.
  • Repeat on 48 MPI ranks.
  • Repeat on 12 MPI ranks.
  • Repeat on 1 MPI rank.
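For these runs (M readers, N writers, M ≠ N), each of the M reader ranks has to determine which of the stored GFs it owns. One common scheme is a contiguous block partition, sketched here purely for illustration; tandem's actual data distribution may differ.

```python
def block_partition(n_items, n_ranks, rank):
    """Contiguous block partition of n_items over n_ranks; the first
    n_items % n_ranks ranks each receive one extra item."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)

n_gfs = 1000
for m in (1, 12, 48, 96):  # reader communicator sizes from the test plan
    owned = [i for r in range(m) for i in block_partition(n_gfs, m, r)]
    assert owned == list(range(n_gfs))  # every GF owned exactly once
```

Because the partition is a pure function of (n_items, n_ranks, rank), any communicator size can recompute which slice of the on-disk GF set belongs to it, independent of the size used to write the files.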

@hpc4geo
Collaborator

hpc4geo commented Sep 14, 2023

Additionally, before merging any of these GF checkpoint branches, we should resolve issue #42.

@Thomas-Ulrich Thomas-Ulrich marked this pull request as ready for review September 15, 2023 12:48
@Thomas-Ulrich
Collaborator Author

Thomas-Ulrich commented Sep 15, 2023

So I've run all these tests on bp1_sum (final_time = 946080000 = 30 years) with tandem 2D, POLYNOMIAL_DEGREE=2, and they pass.
I compare fault receivers using the Python script compare_receivers_tandem.py.txt
(inspired by a script we use for SeisSol).
test_green_function.sh.txt is the bash script used to run the tests, and out.log is the log of the tests.

@Thomas-Ulrich Thomas-Ulrich merged commit 7edced9 into TEAR-ERC:dmay/seas-checkpoint-greenfunc Sep 15, 2023
@Thomas-Ulrich Thomas-Ulrich deleted the thomas/dmay/seas-checkpoint-greenfunc_address_review_only branch September 15, 2023 13:06
@Thomas-Ulrich Thomas-Ulrich mentioned this pull request Nov 21, 2024