Restart problem with MPASSI prescribed ice mode #3936
What machine are you trying this on?
Thanks for tracking this down, @wlin7. Let me try this as well.
Thanks, @rljacob. That reminds me of some additional info: the tests were done on compy and cori-knl, with the same behavior on both. The one difference is that cori-knl printed a backtrace when reporting "FATAL ERROR: NetCDF: Operation not allowed in define mode".
Thanks for looking into this, @jonbob. I completed a 6-year simulation that does not involve a restart, and the results look reasonable. The longer AMIP simulation will require several continuation runs, so I am putting it on hold for now.
It works fine with RUN_STARTDATE 0001-01-01, so I'm guessing it has something to do with the seaice model not picking up the change correctly.
@jonbob, do you mean your test with RUN_STARTDATE 0001-01-01 does not have some of these issues? My test with the F2010 compset started from 0001-01-01 but still had the problems.
My test with RUN_STARTDATE 0001-01-01 ran fine; I didn't check whether the results were BFB. My test with RUN_STARTDATE 2010-01-01 also ran fine, and I'm now checking whether it will restart. Can you point me to your case so I can compare?
Update: the restart from 2010-01-01 also completed. I'll try an ERS test.
I just saw a BFB restart issue in the coupled RRM case. It may or may not be related to this restart problem with the MPASSI prescribed ice mode. The coupled RRM restart runs are non-BFB, likely because of coupling between different components. See the "non-BFB" section on this page for details.
Thanks for the update, @jonbob. My rundir on cori is Multiple runs were done using that case, so it may not be straightforward to see the problem from the current state of its contents. But you can see the error in e3sm.log.36009612.201108-083603 when PIO_ICE_TYPENAME=netcdf. Are your tests using PIO_TYPENAME=pnetcdf for all components? The model runs OK with pnetcdf. I used a template script from @xuezhengllnl for running the F2010 case on compy, and that script explicitly sets PIO_TYPENAME="netcdf"; that is how these problems were initially exposed. Can you also try setting PIO_ICE_TYPENAME="netcdf" to see if you can reproduce the problem? Also, in your test with RUN_STARTDATE=2010-01-01, what are the time stamps for mpassi.rst.am.timeSeriesStatsMonthly (and mpassi.rst.am.timeSeriesStatsDaily)? BTW, did you set anything in user_nl_mpassi? That may affect how ice fields are saved. Mine is empty.
@wlin7: my ERS tests passed, both ERS.ne30pg2_r05_oECv3.F2010SC5-CMIP6-MPASSI.compy_intel and ERS.ne30_oECv3.F2010SC5-CMIP6-MPASSI.compy_intel. I'll check about using netcdf instead of pnetcdf, but is there a compelling reason you want to do this? I'm also leaving user_nl_mpassi empty, just running straight out of the box.
@wlin7: I am seeing the issue you're having, but only when we force seaice to use netcdf. Let me see if I can track down the problem. In the meantime, is there a reason not to use pnetcdf?
@jonbob, I'm fine with using pnetcdf, but it would be helpful if the model could also run with the netcdf typename. I don't know whether netcdf would be recommended over pnetcdf in certain circumstances; for example, there could be personal computers that do not have pnetcdf but are powerful enough for running SCM. When running with pnetcdf, did you see error type #2 in e3sm.log? I wonder whether, in my case, it happened because the file was created during the first (failed) attempt with netcdf and was then not recognizable when running with pnetcdf. I will clean up the rundir and do a fresh run with pnetcdf.
I saw the "define mode" problem, so I'll see if building with debug on gives any more information.
@jonbob, I learned from @xuezhengllnl that there was a time when pnetcdf was not working on compy, which is why the run_e3sm script reset it to netcdf. pnetcdf is OK on compy now. At least on production machines, there is no reason not to use pnetcdf. That said, thanks for continuing to look into it.
Hi @jonbob, just an update: with a clean run using pnetcdf, problems #2 and #3 described above do not appear. Although the timestamps of the mpassi.rst.am.timeSeriesStats files are not consistent with those of the other files, this affects neither the simulation nor BFB reproducibility. The remaining issue is really just #1. Sorry for the misleading information caused by the testing sequence I used.
The MPASSI prescribed ice mode is being tested in F2010 with compset F2010SC5-CMIP6-MPASSI and grid ne30_r05_oECv3. There are several problems related to restart.
FATAL ERROR: NetCDF: Operation not allowed in define mode (/qfs/people/linw288/E3SM/integration/E3SM.testing/externals/scorpio/src/clib/pio_file.c: 349)
I am not sure which mpassi restart file write is causing the problem.
The time stamp format for mpassi.rst.am.timeSeriesStatsMonthly (and mpassi.rst.am.timeSeriesStatsDaily) is not consistent with model time. For example, for a test run starting from 1980-01-01, the time stamp for both of these mpassi.rst files is 0001-01-01.
The run is OK when using PIO_ICE_TYPENAME=pnetcdf, but it still produces error messages like:
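To spot this kind of mismatch across a run directory, a small check along the following lines could help. This is a hypothetical helper, not part of E3SM or CIME; it assumes restart file names end in a `YYYY-MM-DD_SSSSS.nc` stamp, as in the file names quoted in this issue, and the case names in the usage example are made up.

```python
import re
from pathlib import Path

# Assumed naming convention: "<case>.mpassi.rst[.stream].YYYY-MM-DD_SSSSS.nc"
STAMP = re.compile(r"\.(\d{4}-\d{2}-\d{2}_\d{5})\.nc$")


def mpassi_restarts(rundir):
    """Hypothetical helper: list mpassi restart file names found in a run directory."""
    return sorted(p.name for p in Path(rundir).glob("*.mpassi.rst*.nc"))


def mismatched_stamps(filenames, expected):
    """Return the files whose time stamp differs from `expected` ("YYYY-MM-DD_SSSSS").

    Names without a recognizable stamp are skipped rather than reported.
    """
    bad = []
    for name in filenames:
        m = STAMP.search(name)
        if m and m.group(1) != expected:
            bad.append(name)
    return bad
```

For example, with the hypothetical names `case.mpassi.rst.1980-02-01_00000.nc` and `case.mpassi.rst.am.timeSeriesStatsMonthly.0001-01-01_00000.nc`, calling `mismatched_stamps(names, "1980-02-01_00000")` would flag only the timeSeriesStatsMonthly file, which is the symptom described above.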
ERROR: Opening file (20201105.alpha5_v1p-1.F2010-mpassi.ne30pg2_r05_oECv3.compy.mpassi.rst.am.timeSeriesStatsMonthly.0001-01-01_00000.nc) with iotype 1 (PIO_IOTYPE_PNETCDF) failed. The low level I/O library call failed. NetCDF: Unknown file format (error num=-51), (/qfs/people/linw288/E3SM/integration/E3SM.testing/externals/scorpio/src/clib/pioc_support.c:2956)
When restarting the run using restart files generated in '2' above, the results are non-BFB.