[0.25deg] update PE layout #214

Closed
minghangli-uni opened this issue Aug 28, 2024 · 13 comments

@minghangli-uni
Contributor

minghangli-uni commented Aug 28, 2024

To update the PE layout for the 0.25 deg configuration, nuopc.runconfig, ice_in and config.yaml require corresponding modifications.
Note: These changes will be updated when the configuration is revised.

nuopc.runconfig

PELAYOUT_attributes::
  atm_ntasks = 48
  atm_nthreads = 1
  atm_pestride = 1
  atm_rootpe = 0
  cpl_ntasks = 96
  cpl_nthreads = 1
  cpl_pestride = 1
  cpl_rootpe = 0
  esmf_logging = ESMF_LOGKIND_NONE
  esp_ntasks = 1
  esp_nthreads = 1
  esp_pestride = 1
  esp_rootpe = 0
  glc_ntasks = 1
  glc_nthreads = 1
  glc_pestride = 1
  glc_rootpe = 0
  ice_ntasks = 96
  ice_nthreads = 1
  ice_pestride = 1
  ice_rootpe = 0
  lnd_ntasks = 1
  lnd_nthreads = 1
  lnd_pestride = 1
  lnd_rootpe = 0
  ninst = 1
  ocn_ntasks = 1344
  ocn_nthreads = 1
  ocn_pestride = 1
  ocn_rootpe = 96
  pio_asyncio_ntasks = 0
  pio_asyncio_rootpe = 1
  pio_asyncio_stride = 0
  rof_ntasks = 48
  rof_nthreads = 1
  wav_ntasks = 1
  wav_nthreads = 1
  wav_pestride = 1
  wav_rootpe = 0
::

ice_in

&domain_nml
  block_size_x = 30
  block_size_y = 27
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 20
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
  debug_blocks = .true.
/

config.yaml

queue: normal
ncpus: 1440
jobfs: 10GB
mem: 5760GB

walltime: 24:00:00
jobname: 025deg_jra55do_ryf

model: access-om3
@minghangli-uni minghangli-uni self-assigned this Aug 28, 2024
@dougiesquire
Collaborator

You'll also need to update the config.yaml for the new ncpus and mem
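
As a rough sketch of how `ncpus` and `mem` follow from the PE layout: the snippet below takes the highest PE used by any component and scales memory by a per-CPU figure. `GB_PER_CPU = 4` is an assumption inferred from the `ncpus: 1440` and `mem: 5760GB` values in this issue, and `required_ncpus` is a hypothetical helper, not part of payu or access-om3. Single-task components and `rof` (whose rootpe and pestride are not listed above) are omitted.

```python
# Sketch only: derive ncpus and mem for config.yaml from the PE layout.
# GB_PER_CPU is an assumption inferred from ncpus: 1440 and mem: 5760GB.
GB_PER_CPU = 4

# (ntasks, rootpe, pestride) copied from PELAYOUT_attributes in this issue.
LAYOUT = {
    "atm": (48, 0, 1),
    "cpl": (96, 0, 1),
    "ice": (96, 0, 1),
    "ocn": (1344, 96, 1),
}


def required_ncpus(layout):
    """Return the number of PEs needed to hold every listed component."""
    # The last PE used by a component is rootpe + (ntasks - 1) * pestride,
    # so the PE count is that index plus one.
    return max(
        root + (ntasks - 1) * stride + 1
        for ntasks, root, stride in layout.values()
    )


ncpus = required_ncpus(LAYOUT)
mem_gb = ncpus * GB_PER_CPU
print(f"ncpus: {ncpus}")
print(f"mem: {mem_gb}GB")
```

Here the ocean (rootpe 96, 1344 tasks) sets the total, since the atmosphere, coupler and ice components share PEs 0 to 95.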

@minghangli-uni
Contributor Author

Right, thanks @dougiesquire

@anton-seaice
Contributor

I can't remember where we got those block sizes from. We should get better performance if we can reduce max_blocks (say to 10?) by setting the block sizes differently.

Sorry, I was wrong last week - we did put in a patch for max_blocks, so you can remove it from the namelist. It's still good to check the logs to get it closer to 10.

The process would be: pick the number of procs, then set block_size_x & block_size_y such that the blocks are close to square and there are around 10 per PE (ideally nx_global is also divisible by block_size_x and ny_global by block_size_y).

We can also remove debug_blocks - but whilst setting the block size it provides useful information
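
The block-size process described above can be sketched roughly as follows. Both functions are hypothetical helpers written for this thread, not CICE code, and the aspect-ratio limit of 1.25 and the minimum block edge of 8 cells are illustrative assumptions. The count ignores any blocks CICE drops (for example all-land blocks), so treat it as an upper bound on blocks per PE.

```python
import math


def blocks_per_pe(nx_global, ny_global, block_size_x, block_size_y, nprocs):
    """Upper bound on blocks per PE for a given block size."""
    nblocks_x = math.ceil(nx_global / block_size_x)
    nblocks_y = math.ceil(ny_global / block_size_y)
    return math.ceil(nblocks_x * nblocks_y / nprocs)


def candidate_block_sizes(nx_global, ny_global, nprocs,
                          max_per_pe=10, max_aspect=1.25, min_edge=8):
    """List (block_size_x, block_size_y, blocks_per_pe) candidates.

    Only block sizes that divide the grid evenly and are close to
    square (aspect ratio <= max_aspect, an assumed threshold) are kept.
    """
    candidates = []
    for bx in range(min_edge, nx_global + 1):
        if nx_global % bx:
            continue
        for by in range(min_edge, ny_global + 1):
            if ny_global % by:
                continue
            if max(bx, by) / min(bx, by) > max_aspect:
                continue
            n = blocks_per_pe(nx_global, ny_global, bx, by, nprocs)
            if n <= max_per_pe:
                candidates.append((bx, by, n))
    return candidates


# The 0.25 deg grid in this issue with ice_ntasks = 96.
print(blocks_per_pe(1440, 1080, 30, 27, 96))
print(blocks_per_pe(1440, 1080, 60, 54, 96))
```

For the 1440 x 1080 grid on 96 PEs, 30x27 blocks give 20 blocks per PE and 60x54 blocks give 5, which matches the max_blocks values quoted in this thread.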

@COSIMA COSIMA deleted a comment Aug 28, 2024
@minghangli-uni
Contributor Author

I ran into issue #156 again: I forgot to adjust the pio settings after changing the CICE layout.

     pio_numiotasks = 5
     pio_rearranger = 1
     pio_root = 1
     pio_stride = 48

@anton-seaice I understand the calculations, but could you please clarify why the ICE pio settings are configured this way? Will this improve the performance?

The error message isn’t very intuitive, making it difficult for users to realise that they need to modify these parameters when changing the layout.

Can we revert it to the settings used in the 1deg configuration, here https://github.com/ACCESS-NRI/access-om3-configs/blob/2bc6107ef1b195aa62485a5d87c4ba834996d8cc/nuopc.runconfig#L364-L373?

ICE_modelio::
     diro = ./log
     logfile = ice.log
     pio_async_interface = .false.
     pio_netcdf_format = nothing
     pio_numiotasks = 1
     pio_rearranger = 1
     pio_root = 0
     pio_stride = 48
     pio_typename = netcdf4p
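
A minimal sketch of the kind of check that would catch this mismatch before the run starts. It assumes the constraint is that the last I/O task, at `pio_root + (pio_numiotasks - 1) * pio_stride`, must fall inside the component's own tasks; that condition is my reading of the failure, not something confirmed in this thread, and `pio_layout_fits` is a hypothetical helper.

```python
def pio_layout_fits(ntasks, pio_numiotasks, pio_root, pio_stride):
    """Return True if every I/O task index lies within 0..ntasks-1.

    Assumed rule: I/O tasks sit at pio_root, pio_root + pio_stride, ...
    and all of them must be valid task indices of the component.
    """
    if pio_numiotasks < 1 or pio_stride < 1 or pio_root < 0:
        return False
    last_io_task = pio_root + (pio_numiotasks - 1) * pio_stride
    return last_io_task < ntasks


ICE_NTASKS = 96  # ice_ntasks from the PE layout in this issue

# Settings left over from the previous layout: last I/O task would be 193.
print(pio_layout_fits(ICE_NTASKS, pio_numiotasks=5, pio_root=1, pio_stride=48))

# Settings from the 1deg configuration: a single I/O task at task 0.
print(pio_layout_fits(ICE_NTASKS, pio_numiotasks=1, pio_root=0, pio_stride=48))
```

Under that assumption, the first call returns False for 96 ice tasks and the second returns True.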

@minghangli-uni
Contributor Author

> I can't remember where we got those block sizes from

The block sizes were adopted from the OM2 report, which specifies a CICE5 block size of 30x27, with a square-ice processor shape and roundrobin distribution type.

> It's still good to check the logs to get it closer to 10.

I can't remember: why do we want the number of blocks to be close to 10?

@anton-seaice
Contributor

> @anton-seaice I understand the calculations, but could you please clarify why the ICE pio settings are configured this way? Will this improve the performance?

In the old COSIMA TWG minutes from OM2 development (on the COSIMA website), the recommendation from NCI was to use one task per node. I think the Yang 2019 paper on Parallel I/O in MOM5 makes a similar suggestion? I guess there is a hardware benefit to one task per node. There are so many options that it's hard to know what the best combination is without lots of work, e.g. we could also test having a dedicated IO PE, or changing the PIO_rearranger.

I think one IO task per node is a good start. We could try just one IO task, it might not make much difference at this resolution.
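
To make "one IO task per node" concrete, here is a hedged sketch that derives the pio settings from the number of tasks. `CORES_PER_NODE = 48` is an assumption taken from the `pio_stride = 48` used in this thread, the helper name is hypothetical, and the sketch assumes the component's first task sits at the start of a node.

```python
import math

CORES_PER_NODE = 48  # assumption, taken from pio_stride = 48 in this thread


def one_io_task_per_node(ntasks, cores_per_node=CORES_PER_NODE, pio_root=0):
    """Return pio settings that place one I/O task on each node.

    Assumes the component's task 0 is the first core of a node, so a
    stride of cores_per_node lands exactly one I/O task on every node.
    """
    numiotasks = max(1, math.ceil((ntasks - pio_root) / cores_per_node))
    return {
        "pio_numiotasks": numiotasks,
        "pio_root": pio_root,
        "pio_stride": cores_per_node,
    }


# ice_ntasks = 96 spans two 48-core nodes, so two I/O tasks.
print(one_io_task_per_node(96))
```

By construction the last I/O task, `pio_root + (pio_numiotasks - 1) * pio_stride`, stays below `ntasks`, so the settings scale with the layout instead of needing a manual update.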

> The error message isn’t very intuitive, making it difficult for users to realise that they need to modify these parameters when changing the layout.

I agree. Does it make a separate ESMF log file? I think they have names something like PETXX.ESMF...

  • It's possible there are options in the ESMF build to change how the logging is done.
  • A good thing to do would be to check in payu whether the PE layout fits within the requested compute resources.

> The block sizes were adopted from OM2 report, which specifies a CICE5 block size of 30x27, with a square-ice processor shape and roundrobin distribution type.

Ok thanks!

> I cant remember why having the number of blocks close to 10?

From the CICE docs:

Smaller, more numerous blocks provides an opportunity for better load balance by allocating each processor both ice-covered and ice-free blocks. But smaller, more numerous blocks becomes less efficient due to MPI communication associated with halo updates. In practice, blocks should probably not have fewer than about 8 to 10 grid cells in each direction, and more square blocks tend to optimize the volume-to-surface ratio important for communication cost. Often 3 to 8 blocks per processor provide the decompositions flexibility to create reasonable load balance configurations.

So we should actually aim for number of blocks of 8 or less by the sounds of it :)

@minghangli-uni
Contributor Author

minghangli-uni commented Aug 29, 2024

> I think one IO task per node is a good start. We could try just one IO task, it might not make much difference at this resolution.

I agree for the current phase. I will do a test on the I/O tasks to verify the optimal configuration.

> does it make a seperate ESMF log file ? I think they have names something like PETXX.ESMF...

This can be enabled by setting create_esmf_pet_files to true in drv_in, but this should be used mostly for debugging purposes, not in production runs.
Would it also be helpful to add a comment after the PE setup for ice_ntasks to reference this issue? For example:

     ice_ntasks = 96 #NB: Parallel I/O github.com/COSIMA/access-om3/issues/214

This would inform users who hit the issue about the current setup, and we can remove the comment once the I/O is optimised.

> So we should actually aim for number of blocks of 8 or less by the sounds of it :)

The updated settings result in a max_blocks of 5, within the range of 3-8 blocks per processor that the CICE docs suggest.

&domain_nml
  block_size_x = 60
  block_size_y = 54
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = -1
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
/

@minghangli-uni
Contributor Author

When setting max_blocks = -1 with the roundrobin distribution type, the max_blocks prescribed by CICE does not always match the actual number of ice blocks. E.g., with the above configuration, max_blocks is set to 6, but the log shows a warning:

  block_size_x,_y       =     60    54
  max_blocks            =      6
  Number of ghost cells =      1

 (ice_read_global_nc) min, max, sum =   -1.41413909065909
   1.57079632679490        154674.873407807      ulat
 (ice_read_global_nc) min, max, sum =   0.000000000000000E+000
   1.00000000000000        969809.000000000      kmt
 ice_domain work_unit, max_work_unit =        28035          10
 ice_domain nocn =            0      280343    44787740
 ice_domain work_per_block =            0          11        2204
 ice: total number of blocks is         391
  ********WARNING***********
 (init_domain_distribution)
  WARNING: ice no. blocks too large: decrease max to           5

Despite this warning, I don’t believe it will impact overall performance since MOM typically has a much higher computational load than CICE.
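
For anyone checking their own logs, a small sketch that pulls the allocated and suggested max_blocks values out of log text like the excerpt above. The regular expressions are written against that excerpt only, so other CICE versions may word these lines differently, and `parse_max_blocks` is a hypothetical helper.

```python
import re

# Two relevant lines copied from the log excerpt in this comment.
SAMPLE_LOG = """\
  max_blocks            =      6
  WARNING: ice no. blocks too large: decrease max to           5
"""


def parse_max_blocks(log_text):
    """Return (allocated, suggested) max_blocks; None where not found."""
    allocated = re.search(r"max_blocks\s*=\s*(\d+)", log_text)
    suggested = re.search(r"decrease max to\s+(\d+)", log_text)
    return (
        int(allocated.group(1)) if allocated else None,
        int(suggested.group(1)) if suggested else None,
    )


def overallocation(allocated, needed):
    """Fraction of extra blocks allocated relative to what is needed."""
    return (allocated - needed) / needed


allocated, suggested = parse_max_blocks(SAMPLE_LOG)
print(allocated, suggested, overallocation(allocated, suggested))
```

With 6 blocks allocated against 5 needed, the extra allocation is one block in five, i.e. 20%.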

NB:
max_blocks = -1 with the rake distribution type fails.

@anton-seaice
Contributor

Why do you think max_blocks shouldn't be 5?

@minghangli-uni
Contributor Author

It can be 5, but we have to set it to 5 manually.

@anton-seaice
Contributor

anton-seaice commented Aug 29, 2024

Oh sorry, I see now. That's something about the patch we put into access-om3 0.3.x for removing max_blocks, and the max_blocks calculation being approximate. When we update the cice version it should go away (after CICE-Consortium/CICE#954)

It will allocate ~20% more memory than it uses, but it uses a small enough amount of memory that there probably isn't a performance impact.

@anton-seaice
Contributor

I created payu-org/payu#496 to add checks for the iolayout numbers

@anton-seaice
Contributor

Closed through ACCESS-NRI/access-om3-configs#114
