Figure out why PRE.*.ADESP tests run forever on desktops #1112
Comments
I'm not sure I have an appropriate machine on which to debug this as it doesn't get stuck for me (run time on Hobart is 48 seconds).
There was a tar file posted for you on Slack.
@gold2718 what rob said. I ran for an hour on melvin with 64 cores. Two case directories were produced in my acme scratch area which I tarred up for you to look at.
Is Slack really part of our CIME development system? GitHub is the (public) site for posting issues, while Slack (in addition to taking up too much time for me to monitor) is private, leading to an incomplete and scattered record of any issue.
In this particular case, Slack provides an easy way to share a binary (tar) file. Can't do that here or in a gist. I guess Dropbox could be used instead.
If I had to choose between Slack and Dropbox, I suppose Slack is better (Dropbox is so full of security holes, I refuse to use it out of self defense).
Now can anyone tell me how to find this tar file?
Okay, I eventually found the file by scrolling back through a large number of messages. Is this really the best we can do? I got no notification and there is no organization. Even an FTP to Yellowstone (or any machine where we both have accounts) with a follow-up email would be better.
The second run of the PRE test exercises the pause/resume functionality using the coupler. The run on Melvin quickly writes the first restart file (at model time 3600s) but then seems to hang. I would expect the DESP component to run but do not see any indication of this. Before any ESP log output is expected, the component finds and reads the rpointer.drv file and checks that the restart file named there exists. Is there anything weird about these simple filesystem statements in
BTW, I tried this test on Yellowstone with CIME_MODEL=acme and it crashes as soon as the first run starts. Is this a known issue? The test was:
|
When you test using CIME_MODEL=acme on caldera (yellowstone), it's set up to run without a batch system, so you need to do
|
@jgfouca, I can't seem to reproduce that sort of behavior around here. Is there any way you could add print (or logging) statements to the routines described above?
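For illustration only, here is a minimal Python sketch of the kind of instrumented check being asked for. It is not the actual DESP code, which does not appear in this thread. The function name `check_restart_pointer` is hypothetical, and the sketch assumes that the first line of rpointer.drv names the restart file relative to the run directory.

```python
import logging
import os

logger = logging.getLogger("pre_test_debug")


def check_restart_pointer(rundir, pointer_name="rpointer.drv"):
    """Return the path of the restart file named in the pointer file, or None.

    Hypothetical helper: each filesystem step is logged so that a hang can be
    localized to the last message that was written.
    """
    pointer_path = os.path.join(rundir, pointer_name)
    logger.info("Looking for pointer file %s", pointer_path)
    if not os.path.isfile(pointer_path):
        logger.warning("Pointer file not found: %s", pointer_path)
        return None

    # Assumption: the first line of the pointer file is the restart file name.
    with open(pointer_path, "r") as fd:
        restart_name = fd.readline().strip()
    logger.info("Pointer file names restart file %r", restart_name)
    if not restart_name:
        logger.warning("Pointer file is empty: %s", pointer_path)
        return None

    restart_path = os.path.join(rundir, restart_name)
    if not os.path.isfile(restart_path):
        logger.warning("Restart file does not exist: %s", restart_path)
        return None

    logger.info("Restart file found: %s", restart_path)
    return restart_path
```

Calling `logging.basicConfig(level=logging.INFO)` before the check sends each step to the console, so the last line printed shows how far the routine got before stalling.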
@jgfouca how does this test behave on compute001? That's a machine you could both be on.
@rljacob trying it now
Great, @jgfouca what is this platform and how do I get on it?
It's a workstation at Argonne. I sent Steve an email.
@gold2718 any luck getting this problem to reproduce on compute001?
I haven't had a chance to try it out yet (did get an account though).
I'm having trouble reproducing this on compute001 with the current ESMCI/master. If I try:
@gold2718, let me try on melvin. It's possible someone inadvertently fixed this problem.
To see if it used to hang, I tried:
Got a run time of 276.
@gold2718 it worked for me on melvin. Go ahead and re-add this test to cime_developer.
@jgfouca, thanks, I will do that with my next round of pause/resume upgrades (and will run tests on compute001 as part of my suite).
In the cam5_4_91 tag, a bug was fixed in mo_strato_rates.F90 regarding gamma terms. In the current model, the gamma terms are multiplied together but they needed to be added. This change should not affect current F compsets. [BFB] - Bit-For-Bit
Once the issue is resolved, re-enable this test in the cime_developer suite.