Add new ocean grid, RRS.30-10.km #308

Merged
merged 2 commits into from
Nov 19, 2015

Conversation

mark-petersen
Contributor

This merge adds a new ocean grid:
oRRS.30-10.km is an MPAS ocean grid with a mesh density function that is roughly proportional to the Rossby radius of deformation, with 30 km gridcells at low and 10 km gridcells at high latitudes.

This is the lower-resolution of our two high-resolution grids, and can be used for both performance testing and initializing a model for scientific simulations. The associated initial, domain, and mapping files have been added to
https://acme-svn2.ornl.gov/acme-repo
in revision 135.

OG-426

[BFB]

@@ -14,13 +14,14 @@
<!-- &io -->
<config_stats_interval>'0001_00:00:00'</config_stats_interval>
<config_write_stats_on_startup>.true.</config_write_stats_on_startup>
-<config_write_output_on_startup>.true.</config_write_output_on_startup>
+<config_write_output_on_startup>.false.</config_write_output_on_startup>
Contributor Author

@douglasjacobsen I turned the default to off so that the performance testers don't wait for a 3.4 GB file to write for a short test.

@mark-petersen
Contributor Author

@douglasjacobsen FYI, I tested this branch on mustang, running the RRS.30-10.km for 4 hours using the default 8 min timestep. If you want to look, it is at:
/lustre/scratch1/turquoise/mpeterse/ACME/cases/a05o/run
A quick look at the statistics shows it runs normally. Here is global KE, with a cross at every timestep:
[Figure: a05o_ke]

@douglasjacobsen
Member

@mark-petersen There are mapping files missing from the input data repo. If you run with a clean input_data repo, you get the following:

File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_aave.150722.nc 
File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_blin.150722.nc 
File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_patc.150722.nc 
File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_patc.150722.nc 
File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_aave.150722.nc 
File is missing: /lustre/scratch1/turquoise/douglasj/ACME/input_data/cpl/gridmaps/T62/map_T62_TO_oRRS30to10_blin.150722.nc

@mark-petersen
Contributor Author

@douglasjacobsen I realized that a minute ago, and added it with svn revision 136. Try again.

@mark-petersen
Contributor Author

@douglasjacobsen I tried to put in a different processor count, but without luck. I tried:


<pes GRID="a%T62_o%oRRS30to10" >
    <NTASKS_ATM>256</NTASKS_ATM> <NTHRDS_ATM>1</NTHRDS_ATM> <ROOTPE_ATM>0</ROOTPE_ATM>
    <NTASKS_LND>256</NTASKS_LND> <NTHRDS_LND>1</NTHRDS_LND> <ROOTPE_LND>0</ROOTPE_LND>
    <NTASKS_ICE>256</NTASKS_ICE> <NTHRDS_ICE>1</NTHRDS_ICE> <ROOTPE_ICE>0</ROOTPE_ICE>
    <NTASKS_CPL>256</NTASKS_CPL> <NTHRDS_CPL>1</NTHRDS_CPL> <ROOTPE_CPL>0</ROOTPE_CPL>
    <NTASKS_OCN>256</NTASKS_OCN> <NTHRDS_OCN>1</NTHRDS_OCN> <ROOTPE_OCN>0</ROOTPE_OCN>
    <NTASKS_GLC>256</NTASKS_GLC> <NTHRDS_GLC>1</NTHRDS_GLC> <ROOTPE_GLC>0</ROOTPE_GLC>
    <NTASKS_ROF>256</NTASKS_ROF> <NTHRDS_ROF>1</NTHRDS_ROF> <ROOTPE_ROF>0</ROOTPE_ROF>
    <NTASKS_WAV>256</NTASKS_WAV> <NTHRDS_WAV>1</NTHRDS_WAV> <ROOTPE_WAV>0</ROOTPE_WAV>
    <PES_LEVEL>1r</PES_LEVEL>
</pes>

and also tried these for the first line:

<pes GRID="o%oRRS30to10" >
<pes GRID="a%T62_oRRS30to10" >
<pes GRID="o%oRRS30to10" CCSM_LCOMPSET="DATM.+DLND.+DICE.+MPASO">

but it always catches only the T62 setup, which is the 64 processor count:

create_newcase -case $CASE_ROOT/$ACME_CASE -compset CMPASO-IAF -mach mustang -res T62_oRRS30to10
...
-----------------------The PE layout for this case match these options:
GRID =  a%T62

That happens regardless of what I add to the scripts/ccsm_utils/Machines/config_pes.xml file.

If you have any advice, that would be great.

@mark-petersen
Contributor Author

@douglasjacobsen To be safe, I pushed the file
mpas-o.graph.info.150722.part.64
to the svn repo. If you are happy with that solution, we can leave the config_pes.xml file.

@worleyph
Contributor

I exported the files in

https://acme-svn2.ornl.gov/acme-repo/acme/inputdata/ocn/mpas-o/oRRS30to10

to inputdata/ on Edison/Hopper, Mira, and Titan.

@mark-petersen, do you know how the graph partition files were generated? I'd like to document this along with the partition files.

@mark-petersen
Contributor Author

@worleyph mpas-o.graph.info.150722 was made by our MPAS mesh initialization process. The graph partition files, mpas-o.graph.info.150722.part.*, were made with a simple gpmetis command.

There are also mapping, domain, and run-off files in these places:
input_data/cpl/cpl6/oRRS30
input_data/share/domains/oRRS30
input_data/cpl/gridmaps/T62/oRRS30

@worleyph
Contributor

@mark-petersen

> There are also mapping, domain, and run-off files in these places:

Done, on Edison/Hopper, Mira, and Titan.

> The graph partition files, mpas-o.graph.info.150722.part.*, were made with a simple gpmetis command.

I've been saving the output of the metis command for the partitions, because there are so many options and because the defaults might change with the version of (gp)metis. For example:

 > more mpas-o.graph.info.150107.2048_prov.txt
 ******************************************************************************
 METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
  (HEAD: , Built on: Sep 23 2014, 12:48:06)
  size of idx_t: 32bits, real_t: 32bits, idx_t *: 64bits

Graph Information -----------------------------------------------------------
 Name: mpas-o.graph.info.150107, #Vertices: 234095, #Edges: 693402, #Parts: 2048

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=1.030, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
 seed=-1, niter=10, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
 - Edgecut: 77776, communication volume: 83587.

 - Balance:
     constraint #0:  1.024 out of 0.009

 - Most overweight partition:
     pid: 1, actual: 117, desired: 114, ratio: 1.02.

 - Subdomain connectivity: max: 8, min: 1, avg: 5.55

 - There are 9 non-contiguous partitions.
   Total components after removing the cut edges: 2058,
   max components: 3 for pid: 845.

Timing Information ----------------------------------------------------------
  I/O:             0.228 sec
  Partitioning:             3.048 sec   (METIS time)
  Reporting:                         0.304 sec

Memory Information ----------------------------------------------------------
  Max memory used:    27.470 MB
******************************************************************************
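
The balance figure in a provenance file like this can be cross-checked by hand. The sketch below is not part of the original comment: it assumes unit vertex weights and reads `ufactor=1.030` as the maximum allowed ratio of the largest partition to the ideal (mean) partition size. The function name `partition_balance` is made up for illustration.

```python
def partition_balance(n_vertices: int, n_parts: int, max_part_size: int) -> float:
    """Ratio of the largest partition to the ideal (mean) partition size.

    Assumes every vertex (ocean cell) has unit weight.
    """
    ideal = n_vertices / n_parts
    return max_part_size / ideal


# Numbers taken from the provenance output above.
n_vertices, n_parts = 234095, 2048
ideal = n_vertices / n_parts                  # mean cells per partition
imbalance = partition_balance(n_vertices, n_parts, max_part_size=117)
ufactor = 1.030                               # assumed to be the allowed imbalance

print(f"ideal cells per partition: {ideal:.1f}")
print(f"imbalance: {imbalance:.3f}")
print("within ufactor:", imbalance <= ufactor)
```

Under these assumptions the computed ratio rounds to the 1.024 reported on the "constraint #0" line, and the "desired: 114" entry is the ideal size truncated to an integer.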

@mark-petersen
Contributor Author

@worleyph FYI, @douglasjacobsen and I have both run this high-rez ocean grid successfully on mustang (LANL). If you or anyone else on the performance team could test on another ACME machine, that would be great. We are using the command:

create_newcase -case $CASE_ROOT/$ACME_CASE -compset CMPASO-IAF -mach mustang -res T62_oRRS30to10

and have tested on 1200 cores on mustang. It should run on 120 to 60k cores on edison (see https://acme-climate.atlassian.net/wiki/display/PERF/2015/08/12/MPAS-Ocean+Stand-alone+Performance+Tests?preview=/31752763/31752752/performance_MPAS-O_RRS30-10_loglog.pdf for MPAS-Ocean stand-alone equivalent).

More information about the RRS 30-10 km spin-up within ACME may be found at
https://acme-climate.atlassian.net/wiki/display/OCNICE/2015/08/28/High-resolution+Ocean-only+spin-up+status

@worleyph
Contributor

@mark-petersen ,

> If you or anyone else on the performance team could test on another ACME machine, that would be great.

On my todo list for today. Hopefully have something for you by tomorrow (if not sooner).

Thanks again.

@worleyph
Contributor

Note, this requires downloading the (large) inputdata/ocn/iaf directory to each target system. I've now taken care of Titan. This is not in the ACME repository yet - I got it from NCAR.

@worleyph
Contributor

@mark-petersen and @douglasjacobsen , this core dumped the first time I tried (Titan, pgi, 1024 processes)

(ocn.log.xxx)
...
Doing timestep 0001-01-01_05:04:00
MPAS ymd= 10101 MPAS tod= 18240
sync ymd= 10101 sync tod= 18000
Internal mpas clock not in sync with sync clock
Doing timestep 0001-01-01_05:12:00

(cesm.log.xxx)
....
_pmiu_daemon(SIGCHLD): [NID 01033] [c2-5c1s4n1] [Mon Aug 31 13:57:56 2015] PE RANK 625 exit signal Aborted
[NID 01033] 2015-08-31 13:57:56 Apid 9094657: initiated application termination

I'll try with the Intel compiler next. Anything that I might want to look for? Note that I modified this to add GPTL support, so perhaps I did something myself. (Sure would be nice to have GPTL support in there - hint, hint.)

@worleyph
Contributor

core dump with the intel compiler as well, "almost" the same location?

(ocn.log.XXX)

 ...
  Doing timestep 0001-01-01_05:04:00
  MPAS ymd=       10101  MPAS tod=       18240
  sync ymd=       10101  sync tod=       18000
  Internal mpas clock not in sync with sync clock
  Doing timestep 0001-01-01_05:12:00
  Doing timestep 0001-01-01_05:20:00

(cesm.log.XXX)

 _pmiu_daemon(SIGCHLD): [NID 02399] [c6-1c2s0n1] [Mon Aug 31 14:36:15 2015] PE RANK 625 exit signal Aborted
 [NID 02399] 2015-08-31 14:36:15 Apid 9094841: initiated application termination

Note that this is the same process. I'll try again with more memory per process.

@mark-petersen
Contributor Author

@worleyph If you add
config_dt = '00:01:00'
to the end of $CASE_ROOT/$ACME_CASE/user_nl_mpaso
Then it will run indefinitely. I was able to get a successful spin-up as follows:

  • dt=1 minute for 4 days
  • dt=8 minutes for the remainder.

This has run for over a year now.

The default configuration in ACME is dt=8 minutes, rather than the one minute used for spin-up, because I thought the performance tests would be short (10 or 20 time steps). Would you prefer the default to be dt=1 minute so that it runs without error? If so, the performance team needs to know that the standard timestep is 8 minutes when computing throughput in SYPD.

@worleyph
Contributor

For my experiments previously I ran for 5 days. How would I run for 4 days with dt=1 minute and then 5 more days with dt=8 minutes? Can you generate a restart file that I can use (and can this same restart file be used for all systems and all decompositions, or does it need to be generated separately for each experiment)?

I'll also need my hand held to start from a restart file - something I've never learned how to do myself.

@mark-petersen
Contributor Author

@worleyph, I think you have a few options for the performance test:

  1. run straight 'out of the box' (dt=8 min) for a short period (5 hours or less)
  2. run with dt=1 min for 1 day, multiply simulated time by 8 to get correct SYPD.
  3. as you suggest, spin up with dt=1, then dt=8 min. Instructions below.

Considering this is high resolution, I think (1) is sufficient for most performance characterizations (i.e. 40 time steps). And (3) is a more involved test, but you get to see ocean time versus coupler and other components.
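
The factor of 8 in option (2) follows from counting timesteps. The sketch below is illustrative only: it assumes the wall-clock cost of one ocean timestep does not depend on dt, that the ocean dominates the run time, and a 365-day year. The 0.5 s per step is a made-up number, not a measurement.

```python
SECONDS_PER_DAY = 86400.0
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def projected_sypd(wallclock_seconds_per_step: float, dt_minutes: float) -> float:
    """Simulated years per wall-clock day for a given timestep length."""
    steps_per_sim_year = MINUTES_PER_YEAR / dt_minutes
    wallclock_per_sim_year = steps_per_sim_year * wallclock_seconds_per_step
    return SECONDS_PER_DAY / wallclock_per_sim_year


cost = 0.5  # hypothetical wall-clock seconds per ocean timestep
sypd_spinup = projected_sypd(cost, dt_minutes=1.0)
sypd_production = projected_sypd(cost, dt_minutes=8.0)

print(f"dt=1 min: {sypd_spinup:.3f} SYPD")
print(f"dt=8 min: {sypd_production:.3f} SYPD")
print(f"ratio: {sypd_production / sypd_spinup:.1f}")
```

Because the ratio depends only on the two timestep lengths, timing a handful of steps as in option (1) gives the same per-step cost. Coupler and data-component time is not covered by this scaling, which is one reason option (3) is the more realistic test.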

Of course, a better solution is for me to upload 'initial conditions' that are already spun up, so you can run with dt=8 min directly. I tried that and couldn't get it to work, so I uploaded these as the next best thing.

> How would I run for 4 days with dt=1 minute and then 5 more days with dt=8 minutes?

In env_run.xml, run with

<entry id="STOP_OPTION"   value="ndays"  />
<entry id="STOP_N"   value="4"  />

and add
config_dt = '00:01:00'
to the end of $CASE_ROOT/$ACME_CASE/user_nl_mpaso

After the run is done, delete config_dt = '00:01:00' from $CASE_ROOT/$ACME_CASE/user_nl_mpaso, change env_run.xml to:

<entry id="STOP_N"   value="5"  />
<entry id="CONTINUE_RUN"   value="TRUE"  />
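
For anyone who wants to script the second step rather than edit the file by hand, here is a minimal sketch using Python's standard XML parser. It assumes env_run.xml is a flat list of `<entry id=... value=...>` elements as in the fragments above. The `<config_definition>` root element and the `set_entries` helper are assumptions made for illustration, not something taken from this thread.

```python
import xml.etree.ElementTree as ET


def set_entries(xml_text: str, updates: dict) -> str:
    """Return xml_text with the value attribute of the named entries replaced."""
    root = ET.fromstring(xml_text)
    found = set()
    for entry in root.iter("entry"):
        key = entry.get("id")
        if key in updates:
            entry.set("value", updates[key])
            found.add(key)
    missing = set(updates) - found
    if missing:
        # Fail loudly instead of silently skipping a misspelled id.
        raise KeyError(f"entries not found: {sorted(missing)}")
    return ET.tostring(root, encoding="unicode")


# Stand-in for env_run.xml after the 4-day, dt=1 min spin-up phase.
SAMPLE = """<config_definition>
<entry id="STOP_OPTION"   value="ndays"  />
<entry id="STOP_N"   value="4"  />
<entry id="CONTINUE_RUN"   value="FALSE"  />
</config_definition>"""

phase2 = set_entries(SAMPLE, {"STOP_N": "5", "CONTINUE_RUN": "TRUE"})
print(phase2)
```

Note that `xml.etree.ElementTree` drops XML comments when parsing, so a real env_run.xml rewritten this way would lose its inline documentation. Removing `config_dt` from user_nl_mpaso is still a separate manual step.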

@worleyph
Contributor

worleyph commented Sep 1, 2015

@mark-petersen - Thanks! I'll give option 3 a try. You've already been doing performance studies. I want to focus on more ACME-production-like scenarios.

@worleyph
Contributor

worleyph commented Sep 2, 2015

@mark-petersen , I successfully (?) generated a restart file from a 4 day run using dt=1 min. My attempt to continue this run using dt=8 min failed (see below). Is it obvious from this what I did wrong? Have you tried to run a continuation run for a C case using MPAS-Ocean?

Thanks.

(from cpl.log.xxx)

 ...
 (prep_ice_init) : Initializing mapper_SFo2i
 (seq_map_init_rearrolap)  mapper counter, strategy, mapfile =     13 rearrange undefined

 (seq_mct_drv) : Performing domain checking
 (seq_domain_check)  --- checking ocean maskfrac ---
 (seq_domain_check)  --- checking ice maskfrac ---
 (seq_domain_check)  --- checking ocn/ice domains ---
 (seq_domain_check_grid)  the domain size is =          177
 (seq_domain_check_grid)  maximum           difference for mask   0.00000000000000
 (seq_domain_check_grid)  maximum allowable difference for mask  0.100000000000000E-01
 (seq_domain_check_grid)  the domain size is =          177
 (seq_domain_check_grid)  maximum           difference for lat   89.9999999999963
 (seq_domain_check_grid)  maximum allowable difference for lat  0.100000000000000E-01
  (seq_domain_check_grid) ERROR: incompatible domain grid coordinates
  ERROR: (seq_domain_check_grid)  incompatible domain grid coordinates

(from cesm.log.xxx)

 ...
 ----- done parsing run-time I/O from streams.ocean -----

  Setting mpi info: striping_factor=16
  Setting mpi info: striping_unit=1048576
  ACME/models/utils/pio/pionfput_mod.F90.in         128           1           1               64           1 
   0001-01-05_00:00:00
  ERROR: (seq_domain_check_grid)  incompatible domain grid coordinates
 Image              PC                Routine            Line        Source
 cesm.exe           00000000008FA229  shr_sys_mod_mp_sh         282  shr_sys_mod.F90
 cesm.exe           000000000045FACD  seq_domain_mct_mp         685  seq_domain_mct.F90
 cesm.exe           000000000045E4B2  seq_domain_mct_mp         357  seq_domain_mct.F90
 cesm.exe           0000000000414481  ccsm_comp_mod_mp_        1557  ccsm_comp_mod.F90
 ...

@worleyph
Contributor

worleyph commented Sep 2, 2015

Also, from ocn.log.xxx

 ...
 ----- done assigning dimensions from Registry.xml -----


  Pressure type is: pressure_and_zmid
 Vertical coordinate movement is: uniform_stretching
  Error: arc_bisect: A and B are diametrically opposite
  ... (1708 repeats; running on 8192 processes) ...
  Error: arc_bisect: A and B are diametrically opposite
  Initial time 0001-01-05_00:00:00

@mark-petersen
Contributor Author

@worleyph sorry for the delay. I was able to restart successfully on LANL mustang by just changing CONTINUE_RUN to true. This last error (A and B are diametrically opposite) looks like the restart file was not written to disk properly, so that some mesh variables are wrong. I suspect it is an i/o problem.

I recommend changing these MPAS-Ocean flags:

    config_pio_num_iotasks = 100
    config_pio_stride = 12

Add these to the end of $CASE_ROOT/$ACME_CASE/user_nl_mpaso, and start your run from the beginning (not from the restart). Set the stride to the number of processors per node, and the iotasks to the node count for this run.
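
The rule of thumb above (one I/O task per node, stride equal to the tasks on each node) can be written as a small helper. This is a sketch only: `pio_layout` is a hypothetical name, and the 1200 tasks with 12 tasks per node are chosen simply because they reproduce the two values quoted above. Later comments in this thread indicate that these two flags are not actually used when MPAS runs inside ACME.

```python
def pio_layout(total_tasks: int, tasks_per_node: int) -> dict:
    """One I/O task per node: stride = tasks per node, iotasks = node count."""
    if total_tasks <= 0 or tasks_per_node <= 0:
        raise ValueError("task counts must be positive")
    if total_tasks % tasks_per_node != 0:
        raise ValueError("total_tasks must be a multiple of tasks_per_node")
    return {
        "config_pio_num_iotasks": total_tasks // tasks_per_node,
        "config_pio_stride": tasks_per_node,
    }


# Illustrative numbers only (assumed, not a statement about any machine).
layout = pio_layout(total_tasks=1200, tasks_per_node=12)
for name, value in layout.items():
    print(f"{name} = {value}")
```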

You may be able to diagnose the problem. In your run directory, you should see a rst.ocn.00*nc file. Here is one of mine:

mu1595.localdomain> ls -l rst*
-rw-rw-r-- 1 mpeterse mpeterse 12323491660 Aug 31 23:08 rst.ocn.0002-04-04_00.00.00.nc

Yours should be exactly the same size. You should also be able to do an ncdump -h and see reasonable stuff. For example.

ncdump -v xCell rst.ocn.*.nc | m
...
data:
 xCell = 3745734.51595984, -989866.522842592, -4842476.73722036, 

these are x locations of cell, in meters, on the earth.

@worleyph
Contributor

worleyph commented Sep 4, 2015

Thanks. I had to change the PIO settings in env_run.xml in my run (got
one of the random "PIO could not figure out a valid mapping" errors).
Sounds like this needs to be coordinated with separate MPAS-O PIO settings?

What is the relationship between the env_run.xml PIO settings and
config_pio_num_iotasks and config_pio_stride in user_nl_mpaso ?

Pat


@jayeshkrishna
Contributor

The user_nl_mpaso settings should be generated from the env_run.xml settings (so AFAIK the ideal place to modify the PIO settings for a case is env_run.xml).

@mark-petersen
Contributor Author

Hmmm... It sounds like the MPAS-O flags

    config_pio_num_iotasks = 100
    config_pio_stride = 12

should be overwritten by ACME, but currently they are not. @worleyph for this test, please set these flags in user_nl_mpaso just to see if the restart will work.

@worleyph
Contributor

worleyph commented Sep 4, 2015

I made some other changes as well (including using pnetcdf instead of netcdf, and the Intel compiler instead of PGI). As indicated in private e-mails, CONTINUE runs worked fine using master and with the other grids. Looking at mpaso_in, I see

config_pio_num_iotasks = 0
config_pio_stride = 0

in the successful runs.

In any case, I'll be more conservative, making sure that this was not user error, and determine exactly where things break.

@douglasjacobsen
Member

@worleyph: Yesterday when you asked me about this stuff I forgot a small detail... as far as I remember, the flags you mentioned:

config_pio_num_iotasks
config_pio_stride

Are not actually used within MPAS+ACME. MPAS within ACME should obey whatever you set in env_run.xml, as ACME sets up the PIO subsystem, and hands it off to MPAS (i.e. MPAS doesn't configure PIO at all in ACME).

So, it shouldn't matter what the value is for those two things.

However, @mark-petersen and I have been discussing how to update this grid so that it runs using a default time step without having to change everything to make it a spin-up run. That would help get an accurate performance estimate for this case.

@worleyph
Contributor

worleyph commented Sep 9, 2015

I have been spending a lot of time on this, without any success (though my recent runs set config_pio_num_iotasks and config_pio_stride, which I now learn do nothing).

Continuation runs with master using the mpas120 and oEC60to30 (both compset CMPASO-NYF) work fine for me.

Continuation runs using @mark-petersen 's branch and the oRRS30to10 grid (for compset CMPASO-IAF) always fail, both pgi and intel, and both pnetcdf and netcdf, with the same error at the beginning of the continuation run.

@mark-petersen has stated that continuation runs work for him. Mark, please verify that you have done exactly what I am doing, preferably on a system that I can duplicate this on, and then send me a pointer. I will try to duplicate. Continuation runs need to work (regardless of the performance studies), and I would like to determine whether there is a bug or not.

Thanks.

@mark-petersen
Contributor Author

@worleyph sorry about the confusion on the MPAS-O pio flags.

Considering the issues we've had with restarts, I'd like to put this pull request on hold until the following items are complete. This is all caused by the fact that spin-ups at high resolution require a different configuration at the very beginning.

  1. Partial bottom cell alteration moved from forward to init mode. This will make restarts much simpler. (MPAS-O pull request, job for me).
  2. Spun-up state used for ACME initial condition file. Then we can use the full time step (8 min) rather than the current short time step from a cold start with zero velocity. This requires altering init files and uploading them to the Oak Ridge repo. (job for me)
  3. Point ACME to latest MPAS-Dev/MPAS:ocean/develop commit. This is a separate ACME pull request, as the default mpas-o namelist will need updating. (job for me and/or Doug)

I suspect those items will take three weeks, but Doug and I are both out of town for a week, so I would target Oct 7.

@worleyph
Contributor

worleyph commented Sep 9, 2015

I'm pushing back on

> This is all caused by the fact that spin-ups at high resolution require a different configuration at the very beginning.

The need to do spin-ups is causing us to try to do restarts, but I do not see that it is causing the restarts to fail. The failure is something separate, correct? I would also like someone else to verify that they see failures. Perhaps this is all (my) user error. If there is a bug, we need to add this to the ocean development todo list, correct? Or do you believe that these other 3 pull requests will address the issue that I have been dealing with?

@mark-petersen
Contributor Author

@worleyph We are back on this. Looking back at your comments, the impediment to this pull request is that you have not been able to do a successful restart run at this resolution. Were all those issues on titan?

@maltrud is trying an initial run and restart on edison. Hopefully that will give us some insight into the issue.

@douglasjacobsen
Member

I think the impediment (as far as I'm concerned) is that we are unable to even perform an initial run using the default configuration (i.e. namelist options).

@worleyph
Contributor

I'm not sure that my issues are related to this pull request specifically. Just trying to use this new grid highlighted some issues (both mine and Doug's). Perhaps it would be easier to work on these issues after it has been merged into master? I am not aware that it breaks anything that currently works, but then my experiments have been very focused and limited.

@mark-petersen mark-petersen force-pushed the mark-petersen/mpas-o/add_mesh_RRS.30-10.km branch from 91f6947 to b3eab7c Compare October 15, 2015 13:10
@mark-petersen
Contributor Author

I just updated this branch by cherry-picking the single old commit onto the head of master. I tested on wolf with 1024 cores, using
create_newcase -case $CASE_ROOT/$ACME_CASE -compset CMPASO-IAF -mach wolf -res T62_oRRS30to10 -compiler gnu
only changing
<entry id="DOUT_S" value="FALSE" />
in env_run.xml, and it works.

I'm testing a restart now. @jonbob @vanroekel @douglasjacobsen @worleyph you are welcome to try it.

I changed the default time step to 1 minute, which works for spin-up. After 4 days you can change it to 8 minutes, or 6 minutes for hourly coupling intervals.

This may be run with a C-comp set with all default settings.
For initial spin-up and testing, add
 config_rayleigh_damping_coeff = 1.0e-3
 config_rayleigh_friction = .true.
to user_nl_mpas-o
@mark-petersen mark-petersen force-pushed the mark-petersen/mpas-o/add_mesh_RRS.30-10.km branch from b3eab7c to 6ae8e85 Compare October 15, 2015 13:59
@mark-petersen
Contributor Author

Tested on mustang with 1200 cores, wolf with 1024 cores.
Tested restart after one day on wolf, and it works.
I changed the default time step to 6 minutes. Now the spin-up and testing requires:
config_rayleigh_damping_coeff = 1.0e-3
config_rayleigh_friction = .true.
in user_nl_mpas-o. This has the advantage that time step for performance testing and actual run will not change.

Model can run with 8 minute ocean timestep, but only with 2-hourly coupling. I reduced it to 6 minutes for hourly coupling, which is the default.
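
One way to read the 8-minute versus 6-minute choice is that the coupling interval has to be a whole number of ocean timesteps. That reading is an assumption here, not something stated in the thread, though the earlier "clock not in sync" log lines (MPAS tod= 18240 against sync tod= 18000) appear consistent with it. A quick check, with `steps_per_coupling` as a hypothetical helper name:

```python
def steps_per_coupling(coupling_minutes: int, dt_minutes: int):
    """Ocean steps per coupling interval, or None if dt does not divide it evenly."""
    steps, remainder = divmod(coupling_minutes, dt_minutes)
    return steps if remainder == 0 else None


# (coupling interval, ocean timestep), both in minutes
for coupling, dt in [(60, 8), (120, 8), (60, 6)]:
    steps = steps_per_coupling(coupling, dt)
    if steps is None:
        print(f"dt={dt} min does not divide a {coupling} min coupling interval")
    else:
        print(f"dt={dt} min: {steps} steps per {coupling} min coupling interval")
```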

@rljacob
Member

rljacob commented Oct 28, 2015

What is the status of this PR?

@mark-petersen
Contributor Author

I plan to upload a pre-spun up initial condition for this mesh by tomorrow and update the date stamp referencing the initial condition file in this PR. Then no flag alterations will be needed to start up at this resolution.

This initial condition runs with all default ocean settings, including dt and Rayleigh damping.
@mark-petersen
Contributor Author

@douglasjacobsen this is now ready to merge. Files are uploaded to svn repo.
New initial conditions run with all default ocean settings, including dt and Rayleigh damping.

@mark-petersen
Contributor Author

Tested successfully with:

create_newcase -case $CASE_ROOT/$ACME_CASE -compset CMPASO-IAF -mach wolf -res T62_oRRS30to10 -compiler gnu

Ran at 1 simulated day per 40 wall-clock minutes on 256 cores on wolf. Restart was successful. Here is the kinetic energy. You can see that the KE starts at a nonzero value, and adjusts slightly from there.
[Figures: a06m_avgke_1, a06m_maxke_1]
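
For comparison with numbers quoted in SYPD, the rate of 1 simulated day per 40 wall-clock minutes can be converted as follows. The 365-day year is an assumption, and `sypd_from_rate` is a hypothetical helper name.

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365  # assumption: 365-day year


def sypd_from_rate(sim_days: float, wallclock_minutes: float) -> float:
    """Convert 'sim_days per wallclock_minutes' into simulated years per day."""
    sim_days_per_wallclock_day = sim_days * MINUTES_PER_DAY / wallclock_minutes
    return sim_days_per_wallclock_day / DAYS_PER_YEAR


rate = sypd_from_rate(sim_days=1, wallclock_minutes=40)
print(f"{rate:.3f} SYPD")
```

That is 36 simulated days per wall-clock day, or a little under 0.1 SYPD on 256 cores.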

@rljacob rljacob added this to the v1.0 Alpha milestone Nov 3, 2015
@mark-petersen
Contributor Author

@jayeshkrishna This will work as an i/o stress test. See previous comment for create case instructions. Restart files are 12 GB.

douglasjacobsen added a commit that referenced this pull request Nov 18, 2015
#308)

This merge adds a new ocean grid:
oRRS.30-10.km is an MPAS ocean grid with a mesh density function that is
roughly proportional to the Rossby radius of deformation, with 30 km
gridcells at low and 10 km gridcells at high latitudes.

This is the lower-resolution of our two high-resolution grids, and can
be used for both performance testing and initializing a model for
scientific simulations. The associated initial, domain, and mapping
files have been added to
https://acme-svn2.ornl.gov/acme-repo
in revision 135.

* mark-petersen/mpas-o/add_mesh_RRS.30-10.km:
  Update file date stamp to spun-up version of oRRS30to10
  Add new ocean grid, RRS.30-10.km

OG-426

[BFB]
@douglasjacobsen
Member

Merged to next

@douglasjacobsen douglasjacobsen merged commit c7b038c into master Nov 19, 2015
douglasjacobsen added a commit that referenced this pull request Nov 19, 2015
@douglasjacobsen douglasjacobsen deleted the mark-petersen/mpas-o/add_mesh_RRS.30-10.km branch November 19, 2015 16:00
rljacob added a commit that referenced this pull request Aug 11, 2016
e20f807 Merge pull request #317 from billsacks/restore_lii
1967bb3 Restore LII test
c404dfa Merge remote-tracking branch 'origin/master'
0406b7d updates for cime5.0.4
7930cd7 Merge pull request #316 from ESMCI/jgfouca/update_code_checker
009f7a1 code_checker: Leverage .gitignore by using git ls-files instead of find to get list of ifles to check
c5555c7 Merge pull request #315 from Katetc/master
c4817d7 fix issue 314
7a493a0 remove multiple run lines from test file
c269ab9 Merge pull request #313 from ESMCI/jgfouca/correctly_report_problem_in_test
734e4b4 Rolling the Intel compiler back to v15.0.2
9a46a99 On batch systems, be sure to report that the problem is with wait_for_tests, not create_test
0e95d19 add some more info to README file
e9586c1 Changes required to support the new Hobart cluster configuration
cdb7805 update documentation for --xml options
98a0380 Merge pull request #307 from ESMCI/rljacob/tests/add-readme
46ff3c6 work on test rerunability
c5f2ae9 Merge pull request #308 from ESMCI/jgfouca/fix_check_input_bug
bddea72 document ERR test
8639ba1 create ability to run tests in same case more than once
b18580b Fix minor erroneous output bug in check_input_data
6f3e448 Add README back to Testcases
fd001a7 Add README back to SystemTests
929f05a Merge pull request #305 from ESMCI/jgfouca/advanced_profiling_tool
20c2e22 Make prof tools a bit more user friendly
1f9b49b Merge pull request #304 from jedwards4b/dynamic_system_test_dirs
0ba7f50 handle so that we dont have a list of test names to maintain
92a266e handle so that we dont have a list of test names to maintain
99f3d35 New tool 'advanced-py-prof'
4e6f7f5 initialize contents
3703d94 machine specific fixes for edison/cori/slurm systems
f6dc40a fixes git issue 303
7bacf48 repeat change for acme
503d5ad load system test directories dynamically based on paths in config_files.xml
b28ddff Merge pull request #302 from ESMCI/jgfouca/profiling_tool_etc
0e1553f Add a new tool for very simple python profiling
1eb2a56 Merge pull request #301 from jedwards4b/shell_commands_delete
2f78ec7 remove any existing shell_commands files from case before writing new ones
5f85c05 update changelog for tag
4ef656d pylint fixes
8624ece fix needed for scripts_regression_tests following PR298
6cc9110 These were supposed to be in PR296
d19afa9 Merge remote-tracking branch 'jedwards/testing_fixes' (PR #296)
b302243 Merge pull request #298 from ESMCI/jgfouca/restore_verbose
0a95b04 component_compare_test should fail if one of the components to be compared is not found"
9aa8683 Reintroduce verbose option into the refactored logging system
a796384 fix for lii test and response to review
d8c9b2e Merge pull request #286 from jedwards4b/buildnml_output_fix
1894e28 improve documetation of debug option, remove incorrect documentation of verbose option
21ef9ae remove whitespace surrounding test names
f6fdbb7 fix erp test update ChangeLog
b8b4723 Fix LII test
daa0c63 rename bisect unit test from acme to cime
4f1a079 move pecount code from create_test to create_newcase
24eb48f move clm include directory to prevent build confusion
1ee4449 add support for ascii testfile, allow multiple compilers in tests
d6c28b4 fix memleak test giving error if baseline not found
8ff5f19 Merge pull request #287 from ESMCI/nag_mismatch
a57e410 Remove `-mismatch_all` from NAG options in CESM.
5040500 output from buildnml scripts now prints

git-subtree-dir: cime
git-subtree-split: 12d2135
jgfouca pushed a commit that referenced this pull request Feb 27, 2018
This merge adds a new ocean grid:
oRRS.30-10.km is an MPAS ocean grid with a mesh density function that is
roughly proportional to the Rossby radius of deformation, with 30 km
gridcells at low and 10 km gridcells at high latitudes.

This is the lower-resolution of our two high-resolution grids, and can
be used for both performance testing and initializing a model for
scientific simulations. The associated initial, domain, and mapping
files have been added to https://acme-svn2.ornl.gov/acme-repo
in revision 135.

* mark-petersen/mpas-o/add_mesh_RRS.30-10.km:
  Update file date stamp to spun-up version of oRRS30to10
  Add new ocean grid, RRS.30-10.km

OG-426

[BFB]
rljacob pushed a commit that referenced this pull request Apr 16, 2021
Update lt_archive and st_archive scripts to fix submission issue.
Update to lt_archive to allow for submission to HPSS queue. st_archive update
for DART and minor changes to XML settings.

Test suite: ERR.f19_g16.B1850W
Test baseline: N/A
Test namelist changes: N/A
Test status: bit-for-bit

Fixes: #308

Code review: Jim
rljacob pushed a commit that referenced this pull request May 6, 2021
Update lt_archive and st_archive scripts to fix submission issue.
Update to lt_archive to allow for submission to HPSS queue. st_archive update
for DART and minor changes to XML settings.

Test suite: ERR.f19_g16.B1850W
Test baseline: N/A
Test namelist changes: N/A
Test status: bit-for-bit

Fixes: #308

Code review: Jim
yunpengshan2014 pushed a commit that referenced this pull request Apr 2, 2024
Add two ERA5 post processing scripts