fix the decomposition issue associated with cimin in radiation surface physics #828

Merged
merged 13 commits into ufs-community:develop from decomp_fix
Sep 27, 2021

Conversation

junwang-noaa
Collaborator

@junwang-noaa junwang-noaa commented Sep 24, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

This PR fixes the decomposition issue associated with cimin in radiation surface physics. Additional PRs on CA and ugwpv1 are required to fix the coupled test decomposition issue.

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • CI

Dependencies

fv3atm PR NOAA-EMC/fv3atm#397
ccpp PR NCAR/ccpp-physics#742

@climbfuji climbfuji changed the title from "point to fv3 branch" to "fix the decomposition issue associated with cimin in radiation surface physics" Sep 24, 2021
@junwang-noaa junwang-noaa added the "Baseline Updates" (Current baselines will be updated.) and "Waiting for Reviews" (The PR is waiting for reviews from associated component PRs.) labels Sep 25, 2021
@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: BL
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/742200664/20210925031514/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_30408
Test hafs_regional_atm 072 failed failed
Test hafs_regional_atm 072 failed in run_test failed
Test hafs_regional_docn_oisst 075 failed failed
Test hafs_regional_docn_oisst 075 failed in run_test failed
Please make changes and add the following label back:
hera-intel-BL

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/742200664/20210925213017/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_7554
Test control_ugwpv1_debug 070 failed in check_result failed
Test control_ugwpv1_debug 070 failed in run_test failed
Test control_ras_debug 071 failed in check_result failed
Test control_ras_debug 071 failed in run_test failed
Test control_wrtGauss_netcdf_parallel_debug 061 failed in check_result failed
Test control_wrtGauss_netcdf_parallel_debug 061 failed in run_test failed
Test control_ca_debug 063 failed in check_result failed
Test control_ca_debug 063 failed in run_test failed
Test control_noahmp_debug 072 failed in check_result failed
Test control_noahmp_debug 072 failed in run_test failed
Test control_debug 058 failed in check_result failed
Test control_debug 058 failed in run_test failed
Test control_lndp_debug 064 failed in check_result failed
Test control_lndp_debug 064 failed in run_test failed
Test control_CubedSphereGrid_debug 060 failed in check_result failed
Test control_CubedSphereGrid_debug 060 failed in run_test failed
Test datm_control_cfsr 090 failed in check_result failed
Test datm_control_cfsr 090 failed in run_test failed
Test control_diag_debug 073 failed in check_result failed
Test control_diag_debug 073 failed in run_test failed
Test control_lheatstrg_debug 065 failed in check_result failed
Test control_lheatstrg_debug 065 failed in run_test failed
Test datm_debug_cfsr 098 failed in check_result failed
Test datm_debug_cfsr 098 failed in run_test failed
Test control_stochy_debug 062 failed in check_result failed
Test control_stochy_debug 062 failed in run_test failed
Test control_rrtmgp_debug 067 failed in check_result failed
Test control_rrtmgp_debug 067 failed in run_test failed
Test control_thompson_extdiag_debug 076 failed in check_result failed
Test control_thompson_extdiag_debug 076 failed in run_test failed
Test control_thompson_no_aero_debug 075 failed in check_result failed
Test control_thompson_no_aero_debug 075 failed in run_test failed
Test control_thompson_debug 074 failed in check_result failed
Test control_thompson_debug 074 failed in run_test failed
Test fv3_rrfs_v1alpha_debug 082 failed in check_result failed
Test fv3_rrfs_v1alpha_debug 082 failed in run_test failed
Test fv3_rrfs_v1beta_debug 081 failed in check_result failed
Test fv3_rrfs_v1beta_debug 081 failed in run_test failed
Test fv3_HAFS_v0_hwrf_thompson_debug 083 failed in check_result failed
Test fv3_HAFS_v0_hwrf_thompson_debug 083 failed in run_test failed
Test regional_control_debug 077 failed in check_result failed
Test regional_control_debug 077 failed in run_test failed
Test regional_quilt_debug 078 failed in check_result failed
Test regional_quilt_debug 078 failed in run_test failed
Test control_merra2_debug 066 failed in check_result failed
Test control_merra2_debug 066 failed in run_test failed
Test control_2threads_debug 059 failed in check_result failed
Test control_2threads_debug 059 failed in run_test failed
Test fv3_gsd_debug 079 failed in check_result failed
Test fv3_gsd_debug 079 failed in run_test failed
Test datm_bulk_cfsr 094 failed in check_result failed
Test datm_bulk_cfsr 094 failed in run_test failed
Test datm_control_gefs 092 failed in check_result failed
Test datm_control_gefs 092 failed in run_test failed
Test control_csawmgt_debug 069 failed in check_result failed
Test control_csawmgt_debug 069 failed in run_test failed
Test control_csawmg_debug 068 failed in check_result failed
Test control_csawmg_debug 068 failed in run_test failed
Test cpld_debug 016 failed in check_result failed
Test cpld_debug 016 failed in run_test failed
Test fv3_esg_HAFS_v0_hwrf_thompson_debug 084 failed in check_result failed
Test fv3_esg_HAFS_v0_hwrf_thompson_debug 084 failed in run_test failed
Test datm_control_iau_gefs 093 failed in check_result failed
Test datm_control_iau_gefs 093 failed in run_test failed
Test datm_bulk_gefs 095 failed in check_result failed
Test datm_bulk_gefs 095 failed in run_test failed
Test datm_cdeps_control_gefs 101 failed in check_result failed
Test datm_cdeps_control_gefs 101 failed in run_test failed
Test datm_cdeps_control_cfsr 099 failed in check_result failed
Test datm_cdeps_bulk_gefs 103 failed in check_result failed
Test datm_cdeps_control_cfsr 099 failed in run_test failed
Test datm_cdeps_bulk_gefs 103 failed in run_test failed
Test datm_cdeps_bulk_cfsr 102 failed in check_result failed
Test datm_cdeps_bulk_cfsr 102 failed in run_test failed
Test fv3_gsd_diag_debug 080 failed in check_result failed
Test fv3_gsd_diag_debug 080 failed in run_test failed
Test datm_cdeps_multiple_files_cfsr 106 failed in check_result failed
Test datm_cdeps_multiple_files_cfsr 106 failed in run_test failed
Test control_stochy 028 failed in check_result failed
Test control_stochy 028 failed in run_test failed
Test control_wrtGauss_netcdf_parallel 023 failed in check_result failed
Test control_wrtGauss_netcdf_parallel 023 failed in run_test failed
Test control_2threads 019 failed in check_result failed
Test control_2threads 019 failed in run_test failed
Test control_CubedSphereGrid 022 failed in check_result failed
Test control_CubedSphereGrid 022 failed in run_test failed
Test control_decomp 018 failed in check_result failed
Test control_decomp 018 failed in run_test failed
Test control 017 failed in check_result failed
Test control 017 failed in run_test failed
Test control_fhzero 021 failed in check_result failed
Test control_fhzero 021 failed in run_test failed
Test control_ca 030 failed in check_result failed
Test control_ca 030 failed in run_test failed
Test control_lndp 031 failed in check_result failed
Test control_lndp 031 failed in run_test failed
Test control_lseaspray 033 failed in check_result failed
Test control_lseaspray 033 failed in run_test failed
Test control_lheatstrg 032 failed in check_result failed
Test control_lheatstrg 032 failed in run_test failed
Test datm_mx025_gefs 097 failed in check_result failed
Test datm_mx025_gefs 097 failed in run_test failed
Test datm_mx025_cfsr 096 failed in check_result failed
Test datm_mx025_cfsr 096 failed in run_test failed
Test cpld_control 001 failed in check_result failed
Test cpld_control 001 failed in run_test failed
Test cpld_2threads 003 failed in check_result failed
Test cpld_2threads 003 failed in run_test failed
Test cpld_decomp 004 failed in check_result failed
Test cpld_decomp 004 failed in run_test failed
Test cpld_ca 005 failed in check_result failed
Test cpld_ca 005 failed in run_test failed
Test datm_cdeps_debug_cfsr 107 failed in check_result failed
Test datm_cdeps_debug_cfsr 107 failed in run_test failed
Test datm_cdeps_mx025_cfsr 104 failed in check_result failed
Test datm_cdeps_mx025_cfsr 104 failed in run_test failed
Test datm_cdeps_mx025_gefs 105 failed in check_result failed
Test datm_cdeps_mx025_gefs 105 failed in run_test failed
Test control_c48 024 failed in check_result failed
Test control_c48 024 failed in run_test failed
Test fv3_rrfs_v1alpha 043 failed in check_result failed
Test fv3_rrfs_v1alpha 043 failed in run_test failed
Test fv3_rrfs_v1beta 046 failed in check_result failed
Test fv3_rrfs_v1beta 046 failed in run_test failed
Test fv3_hrrr 045 failed in check_result failed
Test fv3_hrrr 045 failed in run_test failed
Test cpld_control_wave 015 failed in check_result failed
Test cpld_control_wave 015 failed in run_test failed
Test control_merra2 034 failed in check_result failed
Test control_merra2 034 failed in run_test failed
Test fv3_rap 044 failed in check_result failed
Test fv3_rap 044 failed in run_test failed
Test control_c192 025 failed in check_result failed
Test control_c192 025 failed in run_test failed
Test regional_quilt_2threads 038 failed in check_result failed
Test regional_quilt_2threads 038 failed in run_test failed
Test regional_control 035 failed in check_result failed
Test regional_control 035 failed in run_test failed
Test regional_quilt 037 failed in check_result failed
Test regional_quilt 037 failed in run_test failed
Test fv3_gsd 042 failed in check_result failed
Test fv3_gsd 042 failed in run_test failed
Test regional_quilt_netcdf_parallel 040 failed in check_result failed
Test regional_quilt_netcdf_parallel 040 failed in run_test failed
Test control_rrtmgp 047 failed in check_result failed
Test control_rrtmgp 047 failed in run_test failed
Test control_flake 050 failed in check_result failed
Test control_flake 050 failed in run_test failed
Test regional_quilt_hafs 039 failed in check_result failed
Test regional_quilt_hafs 039 failed in run_test failed
Test cpld_control_c192 006 failed in check_result failed
Test cpld_control_c192 006 failed in run_test failed
Test control_ugwpv1 051 failed in check_result failed
Test control_ugwpv1 051 failed in run_test failed
Test control_csawmg 048 failed in check_result failed
Test control_csawmg 048 failed in run_test failed
Test control_ras 052 failed in check_result failed
Test control_ras 052 failed in run_test failed
Test control_thompson 053 failed in check_result failed
Test control_thompson 053 failed in run_test failed
Test control_noahmp 055 failed in check_result failed
Test control_noahmp 055 failed in run_test failed
Test fv3_HAFS_v0_hwrf_thompson 056 failed in check_result failed
Test fv3_HAFS_v0_hwrf_thompson 056 failed in run_test failed
Test control_thompson_no_aero 054 failed in check_result failed
Test control_thompson_no_aero 054 failed in run_test failed
Test regional_quilt_RRTMGP 041 failed in check_result failed
Test regional_quilt_RRTMGP 041 failed in run_test failed
Test control_c384gdas 027 failed in check_result failed
Test control_c384gdas 027 failed in run_test failed
Test fv3_esg_HAFS_v0_hwrf_thompson 057 failed in check_result failed
Test fv3_esg_HAFS_v0_hwrf_thompson 057 failed in run_test failed
Test control_c384 026 failed in check_result failed
Test control_c384 026 failed in run_test failed
Test hafs_regional_atm 085 failed in check_result failed
Test hafs_regional_atm 085 failed in run_test failed
Test control_atm_aerosols 110 failed in check_result failed
Test control_atm_aerosols 110 failed in run_test failed
Test cpld_bmark_v16 010 failed in check_result failed
Test cpld_bmark_v16 010 failed in run_test failed
Test cpld_bmark_v16_nsst 012 failed in check_result failed
Test cpld_bmark_v16_nsst 012 failed in run_test failed
Test hafs_regional_atm_ocn 086 failed in check_result failed
Test hafs_regional_atm_ocn 086 failed in run_test failed
Test cpld_control_c384 008 failed in check_result failed
Test cpld_control_c384 008 failed in run_test failed
Test cpld_bmark_wave_v16 013 failed in check_result failed
Test cpld_bmark_wave_v16 013 failed in run_test failed
Test cpld_bmark_wave_v16_p7b 014 failed in check_result failed
Test cpld_bmark_wave_v16_p7b 014 failed in run_test failed
Test hafs_regional_docn 087 failed in check_result failed
Test hafs_regional_docn 087 failed in run_test failed
Test hafs_regional_docn_oisst 088 failed in check_result failed
Test hafs_regional_docn_oisst 088 failed in run_test failed
Test control_atmwav 108 failed in check_result failed
Test control_atmwav 108 failed in run_test failed
Test hafs_regional_datm_cdeps 089 failed in check_result failed
Test hafs_regional_datm_cdeps 089 failed in run_test failed
Test control_csawmgt 049 failed failed
Test control_csawmgt 049 failed in run_test failed
Test control_c384gdas_wav 109 failed in check_result failed
Test control_c384gdas_wav 109 failed in run_test failed
Please make changes and add the following label back:
hera-intel-RT
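
A notification this long is easier to read once it is grouped by test name. The following is a minimal sketch, not part of the ufs-weather-model tooling: the function name `summarize_failures` and the regular expression are assumptions based only on the line shapes visible in the notifications above ("Test NAME NNN failed in STAGE failed" and "Test NAME NNN failed failed").

```python
import re
from collections import defaultdict

# Assumed line shapes, taken from the notifications in this thread:
#   "Test control_debug 058 failed in check_result failed"
#   "Test control_csawmgt 049 failed failed"   (no stage given)
LINE_RE = re.compile(r"^Test (\S+) (\d+) failed(?: in (\w+))? failed$")


def summarize_failures(log_text):
    """Return {test_name: set of failing stages} for one notification body."""
    failures = defaultdict(set)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, _number, stage = match.groups()
            # Lines without an explicit stage are recorded as "unspecified".
            failures[name].add(stage or "unspecified")
    return dict(failures)


# A short excerpt in the same shape as the hera.intel RT notification.
sample = """\
Machine: hera
Compiler: intel
Job: RT
Test control_debug 058 failed in check_result failed
Test control_debug 058 failed in run_test failed
Test control_csawmgt 049 failed failed
Test control_csawmgt 049 failed in run_test failed
"""

summary = summarize_failures(sample)
for name in sorted(summary):
    print(name, sorted(summary[name]))
```

Header lines such as "Machine:" and "Repo location:" do not match the pattern and are skipped, so the full notification text can be passed in unchanged.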

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: orion
Compiler: intel
Job: BL
Repo location: /work/noaa/nems/emc.nemspara/autort/pr/742200664/20210925163013/ufs-weather-model
Please manually delete: /work/noaa/stmp/bcurtis/stmp/bcurtis/FV3_RT/rt_175200
Please make changes and add the following label back:
orion-intel-BL

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/742200664/20210925213009/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_73559
Test hafs_regional_datm_cdeps 076 failed failed
Test hafs_regional_datm_cdeps 076 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: hera
Compiler: intel
Job: RT
Repo location: /scratch1/NCEPDEV/nems/emc.nemspara/autort/pr/742200664/20210926011508/ufs-weather-model
Please manually delete: /scratch1/NCEPDEV/stmp2/emc.nemspara/FV3_RT/rt_8363
Test cpld_bmark_wave_v16_p7b 014 failed failed
Test cpld_bmark_wave_v16_p7b 014 failed in run_test failed
Test cpld_bmark_wave_v16 013 failed failed
Test cpld_bmark_wave_v16 013 failed in run_test failed
Please make changes and add the following label back:
hera-intel-RT

@junwang-noaa
Collaborator Author

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/742200664/20210925213009/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_73559
Test hafs_regional_datm_cdeps 076 failed failed
Test hafs_regional_datm_cdeps 076 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

error message:

  • srun --label -n 120 ./fv3.exe
    srun: Job 59119359 step creation temporarily disabled, retrying (Socket timed out on send/recv operation)
    srun: Job 59119359 step creation still disabled, retrying (Requested nodes are busy)
    srun: Job 59119359 step creation still disabled, retrying (Requested nodes are busy)
    srun: Job 59119359 step creation still disabled, retrying (Requested nodes are busy)
    srun: Job 59119359 step creation still disabled, retrying (Requested nodes are busy)

Will resubmit this test

@junwang-noaa
Collaborator Author

On hera intel, the baseline name is not correct; the RT needs to be rerun after correcting it.
On orion, the baseline was created but could not be copied to the baseline directory under NEMSfv3gfs.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/742200664/20210926184507/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_109721
Test cpld_control_c192 005 failed in check_result failed
Test cpld_control_c192 005 failed in run_test failed
Test cpld_control 001 failed in check_result failed
Test cpld_control 001 failed in run_test failed
Test cpld_ca 004 failed in check_result failed
Test cpld_ca 004 failed in run_test failed
Test cpld_control_c384 007 failed in check_result failed
Test cpld_control_c384 007 failed in run_test failed
Test cpld_2threads 003 failed in check_result failed
Test cpld_2threads 003 failed in run_test failed
Test cpld_bmark_v16 009 failed in check_result failed
Test cpld_bmark_v16 009 failed in run_test failed
Test cpld_bmark_v16_nsst 011 failed in check_result failed
Test cpld_bmark_v16_nsst 011 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@climbfuji
Collaborator

@junwang-noaa we are still waiting for the WCOSS logs, correct?

@junwang-noaa
Collaborator Author

junwang-noaa commented Sep 27, 2021 via email

Collaborator

@climbfuji climbfuji left a comment

Changes look good, and fv3atm hash is correct.

@junwang-noaa junwang-noaa merged commit 5530c28 into ufs-community:develop Sep 27, 2021
@junwang-noaa junwang-noaa deleted the decomp_fix branch June 7, 2022 01:30
4 participants