-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Common infrastructure for CIs and manual runs #378
[develop] Common infrastructure for CIs and manual runs #378
Conversation
3965b56
to
67e53f0
Compare
67e53f0
to
deb1181
Compare
Machine: hera |
Machine: jet |
deb1181
to
7ce325e
Compare
Machine: hera |
Machine: jet |
70d5476
to
c2683f7
Compare
c2683f7
to
43fa68e
Compare
* Update tar command to create cleaner artifact * Update path of build log for s3 upload
4ece074
to
057126e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielabdi-noaa One quick question with the modification in ush/launch_FV3LAM_wflow.sh
- the comment states that this change is a hack for Gaea, but the logic is for Cheyenne, with a Cheyenne directory. Should the comment read:
Hack for Cheyenne
instead?
@MichaelLueken I think this PR is now ready to go in. All the fundamental tests completed successfully on Cheyenne with both gnu and intel. All other systems (except Hera) have green too on fundamental tests. Hera is unusually slow today and I've just checked the jobs are queued. It will probably finish with green by tomorrow. I will push the commit fixing some comments so here is the Jenkins pipeline for future reference. I've put it under the description of the PR as well Edit: All machines have finished successfully with fundamental tests now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for working through the issues on the tier-1 machines, @danielabdi-noaa and @jessemcfarland! I have submitted the fundamental WE2E tests on Hera and they all passed successfully. Before submitting my approval, I see this morning's Jenkins run has failed - https://jenkins-epic.woc.noaa.gov/blue/organizations/jenkins/ufs-srweather-app%2Fpipeline/detail/PR-378/28/pipeline. Will this need to be addressed before approval and merging, or is this an issue with either GitHub or Jenkins?
@MichaelLueken Yes, it was a github issue. There is an ongoing re-run that should finish in an hour or so. |
Only remove the data directories to allow we2e cron jobs to complete and clean up themselves correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice addition, @danielabdi-noaa. Thanks!
Just one possible correction below.
# Array of all optional rrfs_utl executables built | ||
#----------------------------------------------------------------------- | ||
executables_created=( adjust_soiltq.exe \ | ||
check_imssnow_fv3lam.exe \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this one also be a +=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just fixed the bug!
@@ -1040,7 +1040,7 @@ def generate_FV3LAM_wflow(): | |||
following line can be added to the user's crontab (use \"crontab -e\" to | |||
edit the cron table): | |||
|
|||
*/3 * * * * cd {EXPTDIR} && ./launch_FV3LAM_wflow.sh called_from_cron=\"TRUE\" | |||
*/{CRON_RELAUNCH_INTVL_MNTS} * * * * cd {EXPTDIR} && ./launch_FV3LAM_wflow.sh called_from_cron=\"TRUE\" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NICE!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rerun has of the Jenkins pipeline has successfully completed. I will now give my approval to these changes.
DESCRIPTION OF CHANGES:
This PR addresses the issue described in detail in issue #377 . Basically, it makes both CIs ( Jenkins and Github actions ) use the same build and test infrastructure, especially simplifying Jenkins a lot.
Fundamental test cases ( 9 cases used in Github actions CI) are all green[1] on all platform. This includes test besides build. Here is the jenkins pipeline for future reference. I think this PR is high priority because of this.
[1] Although Cheyenne tests are red, all tests completed successfully on the system. There is some weird issue with tarring the results that does not happen on other systems.
Update: Cheyenne issues are fixed and here is the latest jenkins pipeline result that succeeded on all platforms including Cheyenne.
Detailed description changes:
test/build.sh
tests/WE2E/setup_WE2E_tests.sh
fundamental
,comprehensive
or any other name such ascustom
. To create a configuration for a specific machine, for example in case of a test case that can not be successfully run on that machine, create config files of the formfundamental.jet
etctests/WE2E/get_expts_status.sh
With these changes, Jenkins source code is simplified and more robust, because code is not duplicated and uses existing
mature scripts for build and test.
Moreover, bug fixes for Jenkins tests include:
machine/gaea.sh
).build_cheyenne_intel/gnu
and thenwflow_cheyenne
did not unload python.set +u
needed for SLURM_JOB_ID checklibpng
version not found (disabled for now, will come back after issues are resolved)I have left the selection of comprehensive test cases to get a green on all machines for a future PR. Jet is especially difficult since a test case may succeed or fail randomly on it for unknown reasons.
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
None
DOCUMENTATION:
None
ISSUE:
#377
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@jessemcfarland @christinaholtNOAA