fixes for coupled model P7 restart tests and Update build system and regression testing on Acorn (PR#809) #819

Merged
junwang-noaa merged 30 commits into ufs-community:develop on Sep 24, 2021

junwang-noaa
Collaborator

@junwang-noaa junwang-noaa commented Sep 21, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR contains the NoahMP and CA fixes for coupled tests restart reproducibility. It also adds a standalone FV3 P7 CCPP suite file. It does not change the current regression test results.

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

  • fixes issue in discussion #797

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • CI

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

@junwang-noaa changed the title from "update FV3 and stochy physics branch" to "fixes for coupled model P7 restart tests" Sep 21, 2021
@junwang-noaa added the "No Baseline Change" and "Waiting for Reviews" (the PR is waiting for reviews from associated component PRs) labels Sep 21, 2021
@github-actions

@junwang-noaa please bring these up to date with respective authoritative repositories

  • ufs-weather-model NOT up to date
  • mom6 NOT up to date

@junwang-noaa
Collaborator Author

Automated RT Failure Notification
Machine: herav
Compiler: intel
Job: RT
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/739232976/20210923180008/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_32526
Test cpld_control_wave 015 failed failed
Test cpld_control_wave 015 failed in run_test failed
Please make changes and add the following label back:
hera-intel-RT

The error message from cpld_control_wave shows that the job did not start; I will rerun this job:

  • srun --label -n 400 ./fv3.exe
    srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
    srun: error: Task launch for StepId=23466032.0 failed on node h17c43: Socket timed out on send/recv operation
    srun: error: Application launch failed: Socket timed out on send/recv operation
    srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
    0: slurmstepd: error: *** STEP 23466032.0 ON h3c54 CANCELLED AT 2021-09-23T18:17:23 ***
    srun: error: h3c54: tasks 0-39: Killed
    srun: launch/slurm: _step_signal: Terminating StepId=23466032.0
    srun: error: h4c50: tasks 40-79: Killed
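
As a rough illustration of the distinction drawn above (the job never started, as opposed to the model failing), a check along these lines could flag such logs for a rerun. This is only a sketch: the helper name `is_launch_failure` is hypothetical, the patterns are copied from the srun output quoted above, and treating them as "safe to rerun" is an assumption rather than anything the RT scripts are known to do.

```python
import re

# Patterns taken from the srun output quoted above. Treating a match as
# "the job step never launched, rerun it" is an assumption for illustration.
LAUNCH_FAILURE_PATTERNS = (
    re.compile(r"srun: error: Application launch failed"),
    re.compile(r"srun: error: Task launch for StepId=\S+ failed"),
)


def is_launch_failure(log_text):
    """Return True if the log shows srun failing to start the job step."""
    return any(p.search(log_text) for p in LAUNCH_FAILURE_PATTERNS)


sample = """srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
srun: error: Task launch for StepId=23466032.0 failed on node h17c43: Socket timed out on send/recv operation
srun: error: Application launch failed: Socket timed out on send/recv operation
"""

print(is_launch_failure(sample))
```

A log that contains neither pattern would be left for manual inspection, since it may reflect a genuine test failure.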

@junwang-noaa changed the title from "fixes for coupled model P7 restart tests" to "fixes for coupled model P7 restart tests and Update build system and regression testing on Acorn (PR#809)" Sep 23, 2021
@climbfuji
Collaborator

Yay, gaea is back like in the good old days. The runtimes are reasonable again.

from __future__ import print_function
import ecflow
import ecflow as ecflow
Collaborator

Why is this required?

Collaborator

I was getting some errors on Acorn, maybe something was wrong with ecflow installation. They keep changing things, I'll test again and remove it if it's not necessary anymore.

Collaborator

Looks like they installed the latest version of ecflow the same day, or day before I made this change:

$ ls -dl /apps/ops/prod/nco/core/ecflow.v5.6.0.*
drwxr-xr-x 8 ops.prod prod 87 Apr 14 00:08 /apps/ops/prod/nco/core/ecflow.v5.6.0.3
drwxr-xr-x 8 ops.prod prod 87 Aug  5 16:15 /apps/ops/prod/nco/core/ecflow.v5.6.0.4
drwxr-xr-x 8 ops.prod prod 87 Aug 17 15:39 /apps/ops/prod/nco/core/ecflow.v5.6.0.5
drwxr-xr-x 8 ops.prod prod 87 Sep  2 19:58 /apps/ops/prod/nco/core/ecflow.v5.6.0.6

Who knows what the differences are between 5.6.0.5 and 5.6.0.6.
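
For what it's worth, the newest install in a listing like the one above can be picked programmatically instead of by eye. The sketch below assumes the `ecflow.v<major>.<minor>.<patch>.<build>` directory naming seen in the listing; `version_key` is a hypothetical helper, and nothing here says which version Acorn actually loads by default.

```python
def version_key(path):
    """Turn '.../ecflow.v5.6.0.6' into the tuple (5, 6, 0, 6) for sorting."""
    name = path.rstrip("/").rsplit("/", 1)[-1]   # e.g. "ecflow.v5.6.0.6"
    version = name.split(".v", 1)[1]             # e.g. "5.6.0.6"
    return tuple(int(part) for part in version.split("."))


# Directory names copied from the listing above.
paths = [
    "/apps/ops/prod/nco/core/ecflow.v5.6.0.3",
    "/apps/ops/prod/nco/core/ecflow.v5.6.0.4",
    "/apps/ops/prod/nco/core/ecflow.v5.6.0.5",
    "/apps/ops/prod/nco/core/ecflow.v5.6.0.6",
]

# Compare numerically so that, for example, build 10 would sort after build 9.
latest = max(paths, key=version_key)
print(latest)
```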

@junwang-noaa junwang-noaa merged commit f4da764 into ufs-community:develop Sep 24, 2021
@junwang-noaa junwang-noaa deleted the rstfix branch June 7, 2022 01:30
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
…or stochastic physics. (#819)

* Add path_to_defns argument to set_FV3nml_ens_stoch_seeds.

* Keep hour in directory name of ensemble members.

* Fix SPP/T flag

* Fix indentation of error/info message containing settings

* Bugfix for DO_SHUM/SKEB which came from shell workflow generation

* Fix nam_spp/p/erts confusion.

* Make create_diag_table info/error message more informative.

* Minor fix in info message.

* Add a WE2E test case for testing stochastic physics.

* Modify stoch phys test case.