
parallelization for mmf groundwater scheme #139

Open
wants to merge 1 commit into base: develop

Conversation

CharlesZheZhang
Collaborator

Adding parallelization capability for the MMF groundwater scheme.
Code modifications were done by Gonzalo (gonzalo.miguez@usc.es) and Prasanth (prasanth@iisertvm.ac.in),
and tested and implemented by Zhe (zhezhang@ucar.edu).

Two files are changed under the noahmp directory:
/src/GroundWaterMmfMod.F90
/drivers/hrldas/NoahmpGroundwaterInitMod.F90

Essentially, mpp_land is used in the LATERALFLOW subroutine and the groundwater init subroutine:
https://github.com/CharlesZheZhang/noahmp/blob/30d661ab921372103dbd73da44a23d373cbb2618/drivers/hrldas/NoahmpGroundwaterInitMod.F90#L22
These lines are moved under subroutine NoahmpGroundwaterInitMain(grid, NoahmpIO).

Also, all distributed tiles now communicate when checking for equilibrium within the 500-iteration loop of LATERALFLOW calls:
https://github.com/CharlesZheZhang/noahmp/blob/30d661ab921372103dbd73da44a23d373cbb2618/drivers/hrldas/NoahmpGroundwaterInitMod.F90#L123
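
For illustration, here is a minimal sketch of that cross-tile convergence check, assuming the variable and routine names that appear in the diff excerpts quoted later in this thread (ncount, rcount, sum_real1) and assuming integer niter, ncount and a single-precision rcount are declared; the exact loop structure in NoahmpGroundwaterInitMod.F90 may differ:

do niter = 1, 500
   ! call LATERALFLOW(...) here: per-tile lateral flow update of the water table
   ! after the update, set ncount = number of local cells still adjusting toward equilibrium
#ifdef MPP_LAND
   rcount = float(ncount)    ! sum_real1 operates on a single-precision scalar
   call sum_real1(rcount)    ! reduce the non-converged count across all MPI tasks
   ncount = nint(rcount)
#endif
   if (ncount == 0) exit     ! every tile has reached equilibrium
end do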

@cenlinhe
Collaborator

cenlinhe commented Aug 16, 2024

Thank you, Zhe, for testing.

  1. Are the results from this parallelized run with multiple CPUs the same as the results from a single-CPU run?
  2. How much faster is the parallelized run compared to the single-core run? (to make sure the parallelization code is indeed working)

@CharlesZheZhang
Collaborator Author

Thank you, Zhe, for testing.

  1. Are the results from this parallelized run with multiple CPUs the same as the results from a single-CPU run?
  2. How much faster is the parallelized run compared to the single-core run? (to make sure the parallelization code is indeed working)

The results are identical: I ran ncdiff and the differences between the two runs are 0.
For speed, it depends on the domain size and how many CPUs are used.
For a 600 x 600 domain with 16 CPUs, the MPI run time is 3/8 of the single-CPU run time.

@barlage
Collaborator

barlage commented Aug 16, 2024

@CharlesZheZhang I suggest using something like nccmp to do the comparison; I think this is a more robust check, and it is standard in regression testing. With options like:

nccmp -dsSqf file1.nc file2.nc

it should report: Files "file1.nc" and "file2.nc" are identical.

If there are any data differences, it will report stats on them.

#endif
#ifdef MPP_LAND
rcount=float(ncount)
call sum_real1(rcount)
Collaborator

Is there any issue here with using different precision, since sum_real1 is a single-precision function?

Collaborator

why not just write a new sum_integer function?
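
A minimal sketch of what such a helper could look like, assuming an MPI_Allreduce over the model communicator (MPI_COMM_WORLD and the name sum_integer1 are illustrative assumptions; module_mpp_land may use its own communicator):

subroutine sum_integer1(icount)
   use mpi
   implicit none
   integer, intent(inout) :: icount
   integer :: itmp, ierr
   ! sum the local integer count across all tasks, avoiding the round trip through real(4)
   call MPI_Allreduce(icount, itmp, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
   icount = itmp
end subroutine sum_integer1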

@@ -225,8 +225,7 @@ subroutine LATERALFLOW (NoahmpIO, ISLTYP,WTD,QLAT,FDEPTH,TOPO,LANDMASK,DELTAT,A
! USE NOAHMP_TABLES, ONLY : DKSAT_TABLE

#ifdef MPP_LAND
! MPP_LAND only for HRLDAS Noah-MP/WRF-Hydro - Prasanth Valayamkunnath (06/10/2022)
use module_mpp_land, only: mpp_land_com_real, mpp_land_com_integer, global_nx, global_ny, my_id
use module_mpp_land !, only: mpp_land_lr_com, mpp_land_ub_com, mpp_land_sync, global_nx, global_ny, my_id
Collaborator

remove the comment if not needed

jtsh=max(jts,jds+1)
jteh=min(jte,jde-2)
jteh=min(jte,jde-1)
Collaborator

these don't change answers for non-parallel runs?

itsh=its
iteh=ite
jtsh=jts
jteh=jte
Collaborator

these don't change answers for non-parallel runs?

Collaborator

good question. I am also curious about this.

Collaborator

In the LIS group, we actually did something similar with these indices when adding the MMF parallelization into Noah-MP-4.0.1.

See this code change:
https://github.com/NASA-LIS/LISF/pull/1418/files#diff-7737fa95c763c3f4c44635b9bf558d8a063138b94eb0e208fc2209d2ed65ef3e

Scroll down to Lines 329 to 338 in the new code.

We haven't merged this PR yet, but we will be doing so soon.

@CharlesZheZhang
Collaborator Author

Thanks @barlage, good suggestion.
I used the nccmp tool, and it reports that the outputs from the single and MPI runs are identical:
Files "output_single/1980010501.LDASOUT_DOMAIN1" and "output_mpi/1980010501.LDASOUT_DOMAIN1" are identical.

@barlage
Collaborator

barlage commented Aug 16, 2024

OK, maybe I misunderstood something here. Some of my comments are based on whether the new code reproduces the old code, i.e., are these non-answer-changing modifications? So, there should probably be four tests:

  1. old serial compared to old mpi (presumably this was the same, but good to check)
  2. new serial compared to new mpi (this seems to be what you checked already)
  3. old serial compared to new serial (answer changing for non-mpi?)
  4. old mpi compared to new mpi (answer changing for mpi?)

@cenlinhe
Collaborator

OK, maybe I misunderstood something here. Some of my comments are based on whether the new code reproduces the old code, i.e., are these non-answer-changing modifications? So, there should probably be four tests:

  1. old serial compared to old mpi (presumably this was the same, but good to check)
  2. new serial compared to new mpi (this seems to be what you checked already)
  3. old serial compared to new serial (answer changing for non-mpi?)
  4. old mpi compared to new mpi (answer changing for mpi?)

There is no old MPI run because originally MMF in HRLDAS could not be run in MPI mode.

@cenlinhe
Collaborator

I agree that you should also compare the old code run in serial against the new code run in serial.

@barlage
Collaborator

barlage commented Aug 16, 2024

@cenlinhe OK, that's what I thought, but that makes me a little confused about why there is MPP_LAND in the current code. For example, here.

@barlage
Collaborator

barlage commented Aug 16, 2024

Given the removal of DM_PARALLEL, do there also need to be WRF and MPAS tests?

@cenlinhe
Collaborator

@cenlinhe OK, that's what I thought, but that makes me a little confused about why there is MPP_LAND in the current code. For example, here.

This was added by Prasanth recently, but it does not work correctly. That is why we asked Gonzalo to fix the issue based on Prasanth's version.

@cenlinhe
Collaborator

Given the removal of DM_PARALLEL, do there also need to be WRF and MPAS tests?

Good point, I hadn't thought about this. Maybe we should use both DM_PARALLEL and MPP_LAND if we do not want to have two MMF code versions in HRLDAS and WRF/MPAS? I am not even sure whether the current MPAS can run with MMF in parallel, given the unstructured grids (I am not familiar with the parallelization of MPAS).

@barlage
Collaborator

barlage commented Aug 16, 2024

Given the removal of DM_PARALLEL, do there also need to be WRF and MPAS tests?

Good point, I hadn't thought about this. Maybe we should use both DM_PARALLEL and MPP_LAND if we do not want to have two MMF code versions in HRLDAS and WRF/MPAS? I am not even sure whether the current MPAS can run with MMF in parallel, given the unstructured grids (I am not familiar with the parallelization of MPAS).

OK, I assume this probably doesn't work in MPAS yet. The use of HRLDAS-specific (or any parent-system-specific) code in the noahmp code is generally not a good idea. This MPP_LAND directive should probably be changed to something like MPI_HRLDAS or DM_PARALLEL_HRLDAS, or something more clearly associated with HRLDAS.

Now I'm questioning whether these changes should be in the general noahmp code. First, I really hate #ifdef directives since they make the code difficult to read/understand. Second, how will this parallelization fit into other parent models? Thinking of how this fits into @dmocko's comment, do additional parent models just keep adding more ifdefs?

I know you are just trying to get parallel code committed, but there are some bigger questions here for these non-column model processes.
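
As a purely hypothetical illustration of the renaming suggestion (the current diff uses MPP_LAND; DM_PARALLEL_HRLDAS is not an existing flag), the guard around the HRLDAS-specific import could read something like:

! hypothetical guard name, shown only to illustrate the suggestion above
#ifdef DM_PARALLEL_HRLDAS
   use module_mpp_land, only: mpp_land_com_real, mpp_land_com_integer, global_nx, global_ny, my_id
#endif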

@CharlesZheZhang
Collaborator Author

Given the removal of DM_PARALLEL, do there also need to be WRF and MPAS tests?

Good point, I hadn't thought about this. Maybe we should use both DM_PARALLEL and MPP_LAND if we do not want to have two MMF code versions in HRLDAS and WRF/MPAS? I am not even sure whether the current MPAS can run with MMF in parallel, given the unstructured grids (I am not familiar with the parallelization of MPAS).

OK, I assume this probably doesn't work in MPAS yet. The use of HRLDAS-specific (or any parent-system-specific) code in the noahmp code is generally not a good idea. This MPP_LAND directive should probably be changed to something like MPI_HRLDAS or DM_PARALLEL_HRLDAS, or something more clearly associated with HRLDAS.

Now I'm questioning whether these changes should be in the general noahmp code. First, I really hate #ifdef directives since they make the code difficult to read/understand. Second, how will this parallelization fit into other parent models? Thinking of how this fits into @dmocko's comment, do additional parent models just keep adding more ifdefs?

I know you are just trying to get parallel code committed, but there are some bigger questions here for these non-column model processes.

Hi Mike and Cenlin,
I will do the single and MPI runs to compare with the old code.
MPAS-A can run with Noah-MP in v8.2.1, but I haven't run it with MMF groundwater yet.
I will let you know soon with some answers.

@cenlinhe
Collaborator

Yes, we need to find a way to make sure these changes work with all different parent models.

@CharlesZheZhang
Collaborator Author

OK, maybe I misunderstood something here. Some of my comments are based on whether the new code reproduces the old code, i.e., are these non-answer-changing modifications? So, there should probably be four tests:

  1. old serial compared to old mpi (presumably this was the same, but good to check)
  2. new serial compared to new mpi (this seems to be what you checked already)
  3. old serial compared to new serial (answer changing for non-mpi?)
  4. old mpi compared to new mpi (answer changing for mpi?)

Hi Mike and Cenlin, thank you for your suggestions. I have conducted some tests with different hrldas/noahmp versions, in both single and MPI runs (v4.5, master, and the new MPI code in this commit).

The short summary of the tests: the master code has very minor differences from v4.5, only on the upper domain boundary, and the new MPI code is identical to the master code in both the single run and the MPI run. The MPI run with 16 CPUs is much faster, only 3/8 of the single-CPU run time.

Three versions are used:
1: hrldas/noahmp v4.5 (this is the pre-refactor code, consistent with WRFv4.5; no MPI capability in hrldas)
2: hrldas/noahmp master (this is the current version; MPI code implemented by Prasanth, but it cannot get past the groundwater init subroutine)
3: hrldas/noahmp new MPI (this is the new MPI version implemented by Gonzalo this summer)

Four simulations are run:
1: v4.5 single run
2: Master single run
3: New MPI code, run with a single CPU
4: New MPI code, run with 16-CPU parallelization

The v4.5 single-CPU run shows minor differences from the master version and the new MPI code; the differences are all on the upper boundary of the domain. Between the master version and the new MPI version, the results are identical.

Please see my notes in the shared Google Drive:
https://docs.google.com/document/d/1BB8V6ncdxU5XfGrQc48MyjkSYoTdO-9uZz8RHSMLg9o/edit?usp=sharing

@cenlinhe
Collaborator

Thank you, Zhe! This means the MPI fix implemented by Gonzalo is correct. We will then need to think about an effective way to handle the MPI flag so that it is compatible with different host models.

@barlage
Collaborator

barlage commented Aug 20, 2024

@CharlesZheZhang Are the differences between versions 1 and 2 expected? This could indicate that there is a problem in the new implementation (or that there was a problem in the old code).
