all cpld baselines are failing on derecho #2501

Closed
DeniseWorthen opened this issue Nov 18, 2024 · 15 comments
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

Description

UFS cpld tests run from top-of-develop on Derecho are all failing. A baseline directory is in place (develop-20241112), but Derecho was skipped for the last PR (the WW3 PIO). A note was left here that baselines were created OK, but apparently they were not.

To Reproduce:

Run top-develop on Derecho

DeniseWorthen added the bug label Nov 18, 2024
@DeniseWorthen
Collaborator Author

@jkbk2004 This is still an issue as of today, running top-of-develop against develop-20241121. How were the baselines generated on Derecho?

@jkbk2004
Collaborator

jkbk2004 commented Dec 3, 2024

@DeniseWorthen Rocoto is at least functional on Derecho. The current baseline for the develop branch is /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs/develop-20241127. Can you test with develop-20241127?

@DeniseWorthen
Collaborator Author

I'm running ecflow fine. And the baselines are not comparing.

@DeniseWorthen
Collaborator Author

The point of maintaining 2 months' worth of baselines is that a developer can check out an older hash and run against it.

I'm testing 144ccb0. The baseline date for that commit is 20241121, and the baseline exists on Derecho:

ls -lrt  /glade/derecho/scratch/epicufsrt/ufs-weather-model/RT//NEMSfv3gfs

....
drwxr-sr-x 142 fandrade  ncar 16384 Oct 21 14:44 develop-20241011
drwxr-sr-x  49 epicufsrt ncar  4096 Nov  8 11:38 input-data-20240501
drwxrwxr-x 145 epicufsrt ncar 16384 Nov 11 12:44 develop-20241031
drwxr-sr-x 144 epicufsrt ncar 16384 Nov 16 17:46 develop-20241112
drwxr-sr-x   3 epicufsrt ncar  4096 Nov 16 22:21 BM_IC-20220207
drwxr-sr-x 144 epicufsrt ncar 16384 Nov 21 10:05 develop-20241119
drwxr-sr-x 146 epicufsrt ncar 16384 Nov 26 14:11 develop-20241121
drwxr-sr-x 144 epicufsrt ncar 16384 Dec  1 07:44 develop-20241127
drwxr-sr-x 144 epicufsrt ncar 16384 Dec  4 06:51 develop-20241203

However, no logs were posted for that commit at UWM:
[Screenshot, 2024-12-04 12:41:48 PM]

Why does a baseline exist if it was not created by, or run against, that commit?
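The check described above (look up the baseline date for a checked-out hash, then confirm the matching directory is on disk) can be sketched as follows. This is only an illustration: it assumes the date is stored as a `BL_DATE=YYYYMMDD` line in a config file in the checkout, and the helper names are hypothetical. The `develop-<date>` directory naming is taken from the listing above. Note that a passing existence check says nothing about whether the baseline was actually generated by that hash, which is the question raised here.

```python
import re
from pathlib import Path


def read_bl_date(conf_text: str) -> str:
    """Extract the 8-digit date from a line such as 'export BL_DATE=20241121'.

    The 'BL_DATE=' line format is an assumption for this sketch.
    """
    match = re.search(r"^\s*(?:export\s+)?BL_DATE=(\d{8})\s*$", conf_text, re.MULTILINE)
    if match is None:
        raise ValueError("no BL_DATE=YYYYMMDD line found")
    return match.group(1)


def baseline_dir(root: Path, bl_date: str) -> Path:
    """Build the expected baseline path, e.g. <root>/develop-20241121."""
    return root / f"develop-{bl_date}"


def baseline_exists(root, bl_date: str) -> bool:
    """Report whether the baseline directory is present under the given root."""
    return baseline_dir(Path(root), bl_date).is_dir()
```

For example, `baseline_exists("/glade/derecho/scratch/epicufsrt/ufs-weather-model/RT/NEMSfv3gfs", read_bl_date(text))` would report whether a directory for that date is present, where `text` holds the contents of the config file from the checked-out hash.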

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

The workflow managers on Derecho weren't stable for a while. We tried to recover the baselines sporadically. We started maintaining the RT log again from the last commit.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

I have no issues testing the develop branch against 20241127.

@DeniseWorthen
Collaborator Author

To reiterate, a developer should expect that checking out a hash and running against a baseline will pass. Why is the baseline present if it was not generated by or tested against that hash?

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

Let us know if you have any issues with the develop branch.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Dec 4, 2024

No! Why else maintain baselines if a developer cannot run against them? This is a fundamental principle.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

The system wasn't stable for a while

@DeniseWorthen
Collaborator Author

That makes no sense. Was the hash used to generate the associated baseline? Yes or no.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2024

I reported that the Derecho baseline is fully recovered, with a full test log, as of 20241127. There were system issues before then. I am removing the baselines created during the period with workflow issues.

@gspetro-NOAA
Collaborator

I am sorting through GH Issues, and @jkbk2004 has said this issue can be closed since the develop branch is ok with the current baseline. @DeniseWorthen Was there anything else you needed from us on this one?

@DeniseWorthen
Collaborator Author

I have not tested on Derecho after the baselines were sorted out. Neither have I checked that the ones that shouldn't be there are not there.

@gspetro-NOAA
Collaborator

@jkbk2004 deleted the problematic baselines approximately 3 weeks ago, and I have verified that only develop-20241127 remains from the November baselines on Derecho. In the future, we will address baseline-related issues more quickly.
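A verification like the one described above can be scripted. The sketch below is hypothetical (the function name is made up for illustration); it filters a list of directory names down to `develop-YYYYMMDD` baselines from a given month, so the remaining November baselines can be compared against the expected single entry.

```python
import re


def baselines_in_month(names, year_month):
    """Return the develop-YYYYMMDD names whose date starts with year_month (YYYYMM)."""
    pattern = re.compile(r"^develop-(\d{8})$")
    found = []
    for name in names:
        match = pattern.match(name)
        if match and match.group(1).startswith(year_month):
            found.append(name)
    return sorted(found)


# Directory names copied from the `ls` listing earlier in this thread
# (that is, the state before the cleanup).
names_before_cleanup = [
    "develop-20241011", "input-data-20240501", "develop-20241031",
    "develop-20241112", "BM_IC-20220207", "develop-20241119",
    "develop-20241121", "develop-20241127", "develop-20241203",
]
november = baselines_in_month(names_before_cleanup, "202411")
```

Against the earlier listing this yields four November baselines. After the cleanup, the same call on a fresh listing should return only `develop-20241127`.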
