[develop] Fixes for PW Jenkins Nightly Builds #1091
These changes will deactivate the conda environment on GCP and use noaacloud rather than az/g/pclusternoaa for the machine yaml file (which is important, since there are no az/g/pclusternoaa machine yaml files, but there is a noaacloud machine yaml file). The WE2E coverage tests were run on Orion and all tests successfully passed:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240603144816 COMPLETE 569.98
deactivate_tasks_20240603144817 COMPLETE 1.11
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me COMPLETE 1895.96
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_ COMPLETE 1045.73
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240 COMPLETE 369.00
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202406031 COMPLETE 17.16
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240603144 COMPLETE 906.47
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_ COMPLETE 87.03
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2 COMPLETE 732.47
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202406 COMPLETE 59.98
2020_CAD_20240603144826 COMPLETE 68.13
----------------------------------------------------------------------------------------------------
Total COMPLETE 5753.02
Including the SRW metric test:
Skill Score: 0.99807
+ [[ 0.99807 < 0.700 ]]
+ echo 'Congrats! You pass check!'
Congrats! You pass check!
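The pass/fail step in the trace above can be sketched as a numeric comparison. This is a minimal sketch, not the script the Jenkins job runs: the function name `passes_skill_check` is hypothetical, and it assumes that 0.700 is a minimum and that the check passes whenever the score is not below it, which is what the trace suggests. The conversion to `float` is deliberate, since `<` inside bash `[[ ]]` compares strings lexicographically rather than numerically.

```python
def passes_skill_check(skill_score: float, threshold: float = 0.700) -> bool:
    """Return True when the score is not below the threshold.

    Hypothetical helper: the threshold value comes from the trace, but the
    pass rule (score >= threshold) is an assumption.
    """
    return not (skill_score < threshold)


# Value reported in the trace, parsed as a number before comparing.
score = float("0.99807")

if passes_skill_check(score):
    print("Congrats! You pass check!")
else:
    print("Skill score is below the threshold.")
```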
Approving PR now.
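The machine-name change described in the review comment above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the function names are hypothetical, "az/g/pclusternoaa" is read here as the three names `azclusternoaa`, `gclusternoaa`, and `pclusternoaa`, and the `ush/machine` directory is assumed for illustration.

```python
from pathlib import Path

# Assumed expansion of "az/g/pclusternoaa" from the review comment.
PW_CLUSTER_NAMES = {"azclusternoaa", "gclusternoaa", "pclusternoaa"}


def machine_file_stem(platform: str) -> str:
    """Map a platform name to the stem of its machine yaml file.

    Parallel Works cluster names have no machine yaml file of their own,
    so they all resolve to the shared "noaacloud" file.
    """
    name = platform.strip().lower()
    return "noaacloud" if name in PW_CLUSTER_NAMES else name


def machine_yaml_path(platform: str, machines_dir: str = "ush/machine") -> Path:
    """Build the path to the machine yaml file (directory is an assumption)."""
    return Path(machines_dir) / f"{machine_file_stem(platform)}.yaml"


for platform in ("gclusternoaa", "azclusternoaa", "Orion"):
    print(platform, "->", machine_yaml_path(platform).name)
```

Keeping the mapping in one place means any script that needs the machine file asks for the resolved name instead of repeating the cluster-name check.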
This PR passed on AWS using the Jenkins nightly job.
When I ran comprehensive tests on Hera, I got one failing test:
I tried to rerun the test several times, and it always failed in the forecast between hours 05 and 06. But when I ran that single test:
@MichaelLueken @EdwardSnyder-NOAA do you have an idea what is going on here?
I have gone ahead and added the
With the merging of @RatkoVasic-NOAA's PR #1093, the SRW App is now compiling and running without issue on Hera GNU. Are there additional changes that are still required for this PR, or is it safe to remove the
Finally got a stable connection to run the nightly build on Azure using this PR, which resulted in a successful run. Removing the do not merge tag, as it is ready for review.
@EdwardSnyder-NOAA - These changes still look good to me! I have also successfully run the coverage WE2E tests on Hera Intel:
and the
I will go ahead and kick off the automated Jenkins tests for this work now. @RatkoVasic-NOAA - Since you have also approved this PR previously, if you see anything that you would like to have changed, please note these in the PR so that the work isn't merged while you have concerns. Thanks!
No concerns, you can go ahead with merge.
All Jenkins tests passed successfully, with the exception of the
The
Merging this PR now.
* Adds logic to handle GCP's default conda env, which conflicts with the SRW App's conda env. Fixes a Parallel Works naming convention bug in the script.
* It also addresses a known issue with a Ruby warning on PW instances that prevents the run_WE2E_tests.py from exiting gracefully. The solution we use in our bootstrap for /contrib doesn't seem to work for the /lustre directory, which is why the warning is hardcoded into the monitor_jobs.py script.
* The new spack-stack build on Azure is missing a gnu library, so added the path to this missing library to the proper run scripts and cleaned up the wflow noaacloud lua file.
* Removed log and error files from the qsub wrapper script so that qsub can generate these files with the job id in the files name. Also, fixed typo in the wrapper script.
DESCRIPTION OF CHANGES:
This PR adds logic to handle GCP's default conda env, which conflicts with the SRW App's conda env. It also fixes a Parallel Works naming convention bug in the script.
It also addresses a known issue with a Ruby warning on PW instances that prevents run_WE2E_tests.py from exiting gracefully. The solution we use in our bootstrap for /contrib doesn't seem to work for the /lustre directory, which is why the warning is hardcoded into the monitor_jobs.py script.
The new spack-stack build on Azure is missing a GNU library, so this PR adds the path to the missing library to the proper run scripts and cleans up the wflow noaacloud lua file.
Log and error files were removed from the qsub wrapper script so that qsub can generate these files with the job ID in the file names. A typo in the wrapper script was also fixed.
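As an illustration of the hardcoded-warning approach described above, the sketch below separates a known, harmless warning from the rest of a command's stderr so that only unexpected output is treated as a failure. It is not the code in monitor_jobs.py: the function name `split_known_warnings` is hypothetical, and the warning text is a placeholder because the actual Ruby warning is not quoted in this PR.

```python
# Placeholder text: the real Ruby warning string is not given in the PR.
KNOWN_WARNING_SNIPPETS = ("warning: <placeholder for the Ruby warning text>",)


def split_known_warnings(stderr_text, known=KNOWN_WARNING_SNIPPETS):
    """Split stderr lines into (ignored, remaining).

    Lines containing a known warning snippet are ignored; blank lines are
    dropped; everything else is returned as remaining, unexpected output.
    """
    ignored, remaining = [], []
    for line in stderr_text.splitlines():
        if any(snippet in line for snippet in known):
            ignored.append(line)
        elif line.strip():
            remaining.append(line)
    return ignored, remaining


sample_stderr = (
    "/path/to/tool: warning: <placeholder for the Ruby warning text>\n"
    "\n"
    "some other error\n"
)
ignored, remaining = split_known_warnings(sample_stderr)
print("ignored:", ignored)
print("remaining:", remaining)
```

A caller would then treat the command as failed only when `remaining` is non-empty, which lets the monitoring loop exit cleanly when the known warning is the only stderr output.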
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
DOCUMENTATION:
None.
ISSUE:
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@kbooker79, @BruceKropp-Raytheon