what ocean and sea ice initialization files should we be using? #1580
@jayeshkrishna is PIO somehow mis-identifying the file? I didn't think "pnetcdf" was a file format. There are cdf-2, cdf-5, etc. |
Repeating relevant comments from issue #1547: In the PIO function open_nf (in file ionf_mod.F90) there is logic that tries to open a file using a given PIO type. If this fails, the PIO type is changed. For oRRS18to6v3.170111.nc, the PIO file type starts as '5' (== PIO_iotype_pnetcdf). However, the error return from ierr = nfmpi_open(File%iosystem%IO_comm,fname,amode,File%iosystem%info,File%fh) is -51, and the PIO type is then changed to '8' (== pio_iotype_netcdf4p). ... PARAMETER (NF_ENOTNC = -51) ! Not a netcdf file (but the file can then be opened and read with the pio_iotype_netcdf4p option). |
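For readers decoding log output like the above, here is a minimal Python sketch of the fallback control flow that open_nf implements (a conceptual paraphrase of the Fortran, not the actual PIO source; the open_file callback is a hypothetical stand-in, while the constants are the ones quoted in this thread):

```python
# Conceptual paraphrase of PIO's open_nf fallback; not the actual Fortran code.
PIO_IOTYPE_PNETCDF = 5    # parallel-netcdf path (classic cdf-1/2/5 files)
PIO_IOTYPE_NETCDF4P = 8   # parallel netCDF-4/HDF5 path
NF_ENOTNC = -51           # parallel-netcdf error: "Not a netcdf file"

def open_with_fallback(fname, open_file):
    """open_file(fname, iotype) -> error code, 0 on success (hypothetical hook)."""
    iotype = PIO_IOTYPE_PNETCDF
    ierr = open_file(fname, iotype)
    if ierr == NF_ENOTNC:
        # parallel-netcdf cannot read this file, so PIO retries with the
        # parallel netCDF-4 iotype, which is far slower for these init files.
        iotype = PIO_IOTYPE_NETCDF4P
        ierr = open_file(fname, iotype)
    return iotype, ierr
```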
Thanks Pat. Just adding a note:
which seem to allow faster init times (at least for OCN), then there is another filename entry in one of those files that must be changed for consistency and to potentially avoid errors. |
Thanks for the update. The coupled group should also weigh in, and hopefully change the compset definitions. Faster initialization (to this degree) is also important for production runs. |
I 100% agree that we should be using pnetcdf files rather than netcdf because the performance impact of this choice is huge. I also agree that we need to make sure the new files we have are producing credible answers. Thanks for bringing this issue up, @worleyph and @ndkeen! I don't know anything about which files would be correct vs wrong. @jonbob, @vanroekel, @mark-petersen - could you talk amongst yourselves and assign someone to sort this out (as a "critical path" task)? |
I might be wrong here... but I think that the ocean/ice team found that the choice of pnetcdf/netcdf depended on file size. @jonbob or @mark-petersen will be able to provide details. |
@toddringler is correct -- we've found that having large arrays in the IO requires us to use "pnetcdf,cdf5" for several of the ocn/ice streams. On some really small files, however, pnetcdf can be slower -- so we've moved to assigning pio types to individual files/streams. @PeterCaldwell, we have an issue to track this (#1451) that's been assigned to me -- so I'll also take it on as a "critical path" task. |
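For context, in the MPAS streams XML files this per-stream assignment is expressed with the io_type attribute; a hedged sketch (the stream names, filename templates, and interval strings below are placeholders, and the exact attribute values should be checked against the MPAS/ACME streams documentation) looks roughly like:

```xml
<!-- illustrative only: a large restart/init stream forced to pnetcdf,cdf5 -->
<immutable_stream name="restart"
                  type="input;output"
                  io_type="pnetcdf,cdf5"
                  filename_template="mpaso.rst.$Y-$M-$D_$S.nc"
                  input_interval="initial_only"
                  output_interval="00-00-05_00:00:00"/>

<!-- illustrative only: a small diagnostics stream left on plain pnetcdf -->
<stream name="globalStatsOutput"
        type="output"
        io_type="pnetcdf"
        filename_template="analysis_members/globalStats.$Y-$M-$D.nc"
        output_interval="00-00-01_00:00:00"/>
```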
@jonbob and @toddringler, just to be clear, the problem is that the default ocean 18to6 initialization file for the G and A_WCYCL compsets cannot be read as a pnetcdf file (even using pnetcdf,cdf5), as described in the first comment. An alternative file provided by @mark-petersen does work, but also generates many warnings when the model starts. As such, neither is appropriate for use in production runs, and something else is needed. Based on the above comment by @ndkeen, the alternative file can be made to work with some other changes in the streams.ocean file. Once this is confirmed, the coupled group should consider changing the definition of the relevant compsets. |
Correction - I think that @vanroekel provided the alternative ocean (and sea ice) files - sorry for the misattribution, Luke. |
So, a quick note of clarity here. There is no such thing as a pnetcdf file. The issue is how PIO interfaces with the netcdf file. The main issue here is whether the file is cdf-2 or cdf-5. What was happening in @worleyph's and @ndkeen's runs was due to the ocean init file being a cdf-2 type, but with some variables that violate the cdf-2 variable size constraint. This was not an issue when using parallel-netcdf/1.5.0, as there was a soft bug that did not check variable sizes. When running with newer versions of parallel-netcdf, the netcdf interface would be used to read the init files, as the pnetcdf variable size check returned an error on the original files. The newer files I have sent do not have this issue, as they are cdf-5, which is not subject to the variable size constraints. @ndkeen, I have placed new ocean and sea ice restart files you can use for your tests on KNL. I also have included a file with the necessary streams.cice changes for you. They are at
@worleyph these files are also on Titan if you want to try as well, at
|
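To check which case a given initialization or restart file falls into, a quick script with the netCDF4 Python package can report the format kind and flag oversized variables. This is an illustrative sketch: the 4 GiB figure is the approximate CDF-2 per-variable limit, and the size computation ignores some corner cases (e.g., the special treatment of the last variable in the file).

```python
from netCDF4 import Dataset  # the netCDF4 python package

CDF2_VAR_LIMIT = 2**32 - 4   # approximate cdf-2 per-variable size limit, in bytes

def report_format(path):
    """Print the file's format kind and flag variables too large for cdf-2."""
    ds = Dataset(path, "r")
    # NETCDF3_64BIT_OFFSET corresponds to cdf-2, NETCDF3_64BIT_DATA to cdf-5,
    # NETCDF4 to the HDF5-based format that parallel-netcdf cannot read.
    print(path, "->", ds.data_model)
    for name, var in ds.variables.items():
        # bytes per record: element size times the fixed (non-unlimited) dims
        nbytes = var.dtype.itemsize
        for dim in var.dimensions:
            if not ds.dimensions[dim].isunlimited():
                nbytes *= len(ds.dimensions[dim])
        if nbytes > CDF2_VAR_LIMIT:
            print("  %s: ~%.1f GiB per record, over the cdf-2 limit"
                  % (name, nbytes / 2.0**30))
    ds.close()

report_format("oRRS18to6v3.170111.nc")  # file name taken from this thread
```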
@vanroekel, thank you for clarifying and cleaning up some (my) sloppiness in the discussion. Again, I believe that it is important for the coupled group to be aware of this issue and to make sure that they are using ocean and sea ice initialization and restart files that are cdf-5 files for high resolution runs. It is also important that the default initialization files for the G and A_WCYCL compsets on high resolution meshes be cdf-5 files. I do not know who should be responsible for this latter task, but it probably is not @ndkeen or myself. |
@vanroekel , with regard to (thanks for providing this)
there is also a pointer to seaice.RRS18to6v3.restartFrom_titan0228.170321.nc in
Should this also be replaced by mpascice.rst.0053-01-01_00000_noxtime.nc? |
I also had a question about the editing of the streams file. I made a copy of the files and created the .rst53 streams file below:
Can you verify I've made the correct edits? For the ocean, I simply used the mpaso rst53 file in 2 places. For the cice, I added the two entries you suggested, but a) one of them was already there and b) there was another that looks like it should also be changed to use mpascice rst53 files as well. |
@vanroekel, FYI - I am still getting the warnings ("Warning: abs(sum(h)-bottomDepth)>2m. ..."). Wasn't sure if the new restart files and changes to the streams files were supposed to eliminate these or not. |
@vanroekel: Are mpaso.rst.0039-01-01_00000.nc and mpascice.rst.0039-01-01_00000.nc available on Titan? |
@worleyph - The new files won't get rid of the warnings, but we should change the logging so it doesn't look like a fatal error. |
Luke gave me 0053 (not 0039) most recently. See the above comment from Luke as to the location of these files and example streams files on Titan. |
Does a run with the *0053* files complete successfully (I am currently only looking at I/O performance, so warnings are ok for me as long as runs complete as expected)? |
@jonbob, do the warnings indicate that the simulation will not be realistic (performance-wise)? Mark Petersen was going to check on this at one point, but I never heard back what his conclusion was. |
@jayeshkrishna, yes the jobs run successfully using 0053 (and 0039). You'll have to ask Luke why he had us update to 0053. |
@mark-petersen will know more, but I don't believe it means the simulation is not realistic -- just has some inconsistencies in its initial conditions. I see those warnings in virtually every run lately. I'll open an MPAS issue so we don't forget about tracking this down. |
@jonbob and @worleyph I have seen these warnings as well and I have seen no visible issues with the simulation. I don't recall seeing it frequently. The last time I recall seeing this was when I updated to CIME53, which was a bit of a harder restart than usual. I had those warnings for 1 submission and then no issues since. @worleyph could you try a restart after your short test? i.e., do a 5-day simulation with warnings and then another 5 days with a restart, and see if the warnings remain? Regarding year 53 vs. 39: there is no performance reason for this. I thought I'd pull the latest restart as the ocean will have a more spun-up circulation. But I wouldn't expect any performance change. |
I'm still traveling - home next week. Can't guarantee to do this until then. Perhaps someone else wants to volunteer. My concern is based on @mark-petersen 's comment
and Mark was going to verify that the results from my run were reasonable. |
@ndkeen - I checked your directory and there is no ocean restart file? But I think you followed the correct steps. Can you redo the original run (meaning set CONTINUE_RUN back to false) and let's make sure all the restart files get output correctly? |
OK, will do. Not sure what happened, as I set REST_OPTION/REST_N to be the same as STOP_OPTION/STOP_N:
But I admit I'm not sure I verified the restart files were there as I had tested it previously. |
I think you did everything correctly. I'm trying to see if something got messed up, but the ocn and acme logs both closed successfully. So nothing is jumping out -- other than running it again and making sure the restart file shows up. And I don't check for them either, so who knows? IO at high-res is just problematic... |
As Cori scratch will start enforcing purging soon, I've been trying to back up to tape. And before doing that, I have been very busy trying to remove large files from directories, so it's possible I accidentally deleted the restart files. I can try in another directory that does have the restart files, but I also restarted a fresh experiment just now. It would also be nice if the code could let us know a little sooner that the restart files are missing. |
That would make some sense -- everything looked set up to correctly output the files. And no sign in any log of any problems. So yes, please try again and let me know |
And I'll see if it's possible for restart files to get included in the check_input_data. I'm not sure, because the codes have to parse the rpointer files to fully extract the filenames. But it's always hard to hit issues with missing files after getting through the queues. |
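In the meantime, a small script can do the same sanity check by hand before resubmitting. This sketch assumes the usual convention that each rpointer.* file in the run directory lists one restart file name per line; the run-directory path in the example call is a placeholder.

```python
import glob
import os

def check_restart_pointers(run_dir):
    """Report restart files named in rpointer.* that are missing on disk."""
    missing = []
    for rp in sorted(glob.glob(os.path.join(run_dir, "rpointer.*"))):
        with open(rp) as f:
            for line in f:
                fname = line.strip()
                if not fname:
                    continue
                found = os.path.exists(os.path.join(run_dir, fname))
                print("%s -> %s [%s]" % (os.path.basename(rp), fname,
                                         "ok" if found else "MISSING"))
                if not found:
                    missing.append(fname)
    return missing

# e.g., run this before flipping CONTINUE_RUN to TRUE:
# check_restart_pointers("/scratch/.../mycase/run")
```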
I ran it again and here are the netcdf files I see:
So it does appear that the larger one somehow got deleted - certainly by accident while I was cleaning up. Trying to restart from this.
|
That looks correct, though you should also have some restart files for the data models. We'll cross our fingers |
OK, it worked. Read in restarts, ran another 5 days, and exited. And wrote another set of restarts.
Sorry about the scare. |
Did ocean warning messages go away for the continuation run? |
No they are still being written |
So, @vanroekel, @ndkeen's restart test is a failure? Anything else we should try? Should we modify the code to output more details on the failure, e.g., what the values being read in that trigger the warning look like? |
I did NOT remove them from the first run, but looking at the timestamps, most of them are written during the 2nd job. There are 32 log.ocean.*err files that have a timestamp of "5:30" and 572 files with a stamp of "14:00". So I think they are still being written, but may not be exactly the same ranks. |
@worleyph - I don't think that means @ndkeen's test is a failure at all. I filed an issue in mpas to make these warnings appear as warnings instead of errors. We looked more at what spawns the errors/warnings, and it's minor differences between the sum of level thicknesses and the sea surface height. So as soon as we have some extra cycles, we'll:
Do you want me to point you at the corresponding mpas issue? |
Note: I missed an important "NOT" in my last comment. I did not remove the log files between the runs. |
Thanks.
No thanks. I still do want someone to update the compset definitions so that we can use @vanroekel's files as the defaults (and in production runs, @golaz?) so that we don't run into the slow ocean initialization problems again. Unless @jayeshkrishna can work out why the fallback PIO method is so slow (and make it faster), which would make this less important. |
OK, that's a simple ACME thing. I'll try to do that today. |
@jonbob I have a year 56 restart for an IC that you could use for acme if you'd like
|
@jonbob, won't be BFB :-). Will anyone else care about this? |
no - it's not being tested yet. Or ever... |
@vanroekel - thanks. Are those files already on the svn server? If so, I'll update to just point at them... |
nope, I don't have push access to svn.
|
@vanroekel - but xtime has been yanked out? |
just renamed to xtime.orig
|
perfect... |
@worleyph and @ndkeen - I've pushed @vanroekel's updated restart files to the svn data server, but renamed them to be consistent with our previous convention. And I'm pulling together a PR to point at them by default, at least for the A_WCYCLXXXXS compsets that use the _ICG grids. However, it won't help with the corresponding G-cases. Luke and I were discussing the possibility of creating something like GMPAS-XXXS compsets that would also start from spun-up initial condition sets. Do you think that would be useful? |
OK, the PR with the new IC files has been merged. That should solve issues for the A_WCYCL cases, but not the G-cases. |
@jonbob and/or @vanroekel , I assume that this issue can be closed now? Can you confirm and do so? Thanks. |
@worleyph - I do believe the original reason for this issue has been resolved. Of course, new initial conditions files are now required because of all the recent changes to ocn/ice -- but that's a different matter and does not need to be tracked. |
This question first came up in issue #1547 (posed by @ndkeen), but I wanted to give it its own home. This is a question for the science and coupled groups, but it impacts the performance group (among others).
The current ocean (and sea ice) initialization files for the water cycle compsets are not recognized as pnetcdf files within PIO, and the netcdf file reads used instead take a long time, making initialization very expensive. The ocean group provided new files that do not have this problem, but these files generate warnings (both for G and water cycle compsets) of the form
when used with current master.
Is there a timeline for updating the water cycle compset definitions to use "true" pnetcdf files? And do the warnings about the ocean initial data impact the performance of the model? I would hate to be using data for these runs to optimize the code performance when the data are not representative of production runs.