New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

MPI #141

Open

valeriaRaffuzzi wants to merge 28 commits into CambridgeNuclear:main from valeriaRaffuzzi:mpi

Member

valeriaRaffuzzi commented Nov 15, 2024 •

edited

Loading

Finally! I think this is ready to go.

This PR includes MPI support. The main differences are in the tallies and in the dungeon for population normalisation between cycles and load balancing (from Paul Romano's PhD).

In the tallies, one key change is that a report was added: closeCycle other than reportCycleEnd. With MPI, there are two options: mpiSync 1 means that the tallies are synchronised each cycle; mpiSync 0 means that the tallies from each rank are collected at the end of the calculation. All calculations give identical results when mpiSync 1; they are within statistics when mpiSync 0. NOTE that this option applies to all tallies included in a tally admin rather to individual clerks. Splitting reportCycleEnd into two procedures (i.e., adding closeCycle), makes reproducibility easier for most tallyClerks.

In the dungeon, population normalisation was implemented using a new data structure, heapQueue. Note that to ensure reproducibility, particles have to be sorted before and after sampling. Then, load balancing is performed transferring particles between processes.

Results seem to be reproducible for all tallies, and all the tests pass successfully.

@Mikolaj-A-Kowalski However the github tests seem to fail during compilation because the MPI library is not available. Do you know if there's an easy solution to this? Otherwise I'll have a look.

Mikolaj-A-Kowalski and others added 26 commits

September 30, 2024 12:11


          Add id to track the source particle index

f6d93af


          Initial implementation of the dungeon sort

b1f12f8


          Add range checks to the sorting of particleDungeon

04b37e1

Determine the maximum value of the key without implicit assumptions.


          Rename showerID to broodID

1ad1ab8


          Apply PR comments

39610a1


          Remove reuse of random numbers in first flights

265a872

We were reusing first few random numbers following the source generation
(state of RNG was the same at the beginning of sampling the source
particle and its first flight). This commit moves the RNG back before
the source generation is performed, thus preventing the reuse.


          Add MPI to the compilation

c328a21

No communication takes place at this stage.


          Explicitly reduce bins before they are closed

fa1702d


          Add MPI reduction of score bins

3f1a34f


          Initial MPI support in fixed source calculation

911c2b9

Will be reproducable if fixes to source_init are merged.


          Add display_func module

d21f4ab

Allows to limit the console I/O to the master process only. Applied
to fixed source calculation only at the moment.


          Add heap queue data structure

4acb140

It will be used for sampling without replacement.


          Remove circular dependencies

6ff2f01


          Allow to get the current state of the RNG

0392e76


          Fix reproducibility of MPI FS simulation

ee97c70


          Fix MPI free compilation

adc1e62


          Make the MPI sync optional for scoreMemory

a233e0c

Results from not master processes are not combined, hence they are lost
at the moment.


          Add default normalisation for results

b3bd9c1

Fixes a bug for synchronised scoreMemory. The buffer value after
transfer in parallelBin was not properly set to 0 again.


          Add option to collect distrubuted tally results

12cbdb6


          Use statusMsg for display output of tallyClerks

35ef86d


          Make eigenPP work in MPI calculations

e02b5d0

It is not reproducable at the moment


          Adding MPI to work with eigenPP

c03350c


          Adjust tallies to be compatible with MPI

d5d1853


          Fix shannon entropy clerk with MPI

b4fc455


          Update MPI branch to main latest and allow tests to run with MPI

2b05044


          Making tallies reproducible with mpi synchronisation

6a1eb86

valeriaRaffuzzi requested review from ChasingNeutrons and Mikolaj-A-Kowalski

November 15, 2024 19:07

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

+                use mpi_func,                       only : isMPIMaster, getWorkshare, getOffset, getMPIRank
+              #ifdef MPI
+                use mpi_func,                       only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD, &
+                                                           MPI_DOUBLE, mpi_reduce, MPI_SUM

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                                                         MPI_DOUBLE, mpi_reduce, MPI_SUM
          
                                                         MPI_DOUBLE, MPI_REDUCE, MPI_SUM

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

@@ @@ -277,22 +308,29 @@ subroutine cycles(self, tally, tallyAtch, N_cycles) @@
                     end_T = real(N_cycles,defReal) * elapsed_T / i
                     T_toEnd = max(ZERO, end_T - elapsed_T)
+              #ifdef MPI
+                    ! Print the population numbers referred to all processes to screen
+                    call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                  call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
          
                  call MPI_REDUCE(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

+                                                           printSectionEnd, printSeparatorLine
+                use mpi_func,                       only : isMPIMaster, getWorkshare, getOffset, getMPIRank
+              #ifdef MPI
+                use mpi_func,                       only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD, &

Collaborator

ChasingNeutrons Dec 5, 2024

Are we consistent with how we write MPI commands? Should they all be all caps or not? If they should then we should have

Suggested change

      
              use mpi_func,                       only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD, &
          
              use mpi_func,                       only : MASTER_RANK, MPI_BCAST, MPI_INT, MPI_COMM_WORLD, &

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

+                    ! Print the population numbers referred to all processes to screen
+                    call mpi_reduce(nStart, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
+                    nStart = nTemp
+                    call mpi_reduce(nEnd, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                  call mpi_reduce(nEnd, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)
          
                  call MPI_REDUCE(nEnd, nTemp, 1, MPI_INT, MPI_SUM, MASTER_RANK, MPI_COMM_WORLD, error)

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

@@ @@ -262,6 +288,11 @@ subroutine cycles(self, tally, tallyAtch, N_cycles) @@
                     end select
+              #ifdef MPI
+                    ! Broadcast k_eff obtained in the master to all processes
+                    call MPI_Bcast(k_new, 1, MPI_DOUBLE, MASTER_RANK, MPI_COMM_WORLD)

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                  call MPI_Bcast(k_new, 1, MPI_DOUBLE, MASTER_RANK, MPI_COMM_WORLD)
          
                  call MPI_BCAST(k_new, 1, MPI_DOUBLE, MASTER_RANK, MPI_COMM_WORLD)

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/eigenPhysicsPackage_class.f90 Outdated

+                  ! Broadcast seed to all processes
+              #ifdef MPI
+                  call MPI_Bcast(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                call MPI_Bcast(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)
          
                call MPI_BCAST(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/fixedSourcePhysicsPackage_class.f90 Outdated

+                                                           printSeparatorLine
+                use mpi_func,                       only : isMPIMaster, getWorkshare, getOffset, getMPIRank
+              #ifdef MPI
+                use mpi_func,                       only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
              use mpi_func,                       only : MASTER_RANK, MPI_Bcast, MPI_INT, MPI_COMM_WORLD
          
              use mpi_func,                       only : MASTER_RANK, MPI_BCAST, MPI_INT, MPI_COMM_WORLD

ChasingNeutrons reviewed

View reviewed changes

PhysicsPackages/fixedSourcePhysicsPackage_class.f90 Outdated

+                  ! Broadcast seed to all processes
+              #ifdef MPI
+                  call MPI_Bcast(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)

Collaborator

ChasingNeutrons Dec 5, 2024

Suggested change

      
                call MPI_Bcast(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)
          
                call MPI_BCAST(seed_temp, 1, MPI_INT, MASTER_RANK, MPI_COMM_WORLD)

ChasingNeutrons reviewed

View reviewed changes

SharedModules/errors_mod.f90 Outdated

@@ @@ -9,6 +9,7 @@ module errors_mod @@
                 use numPrecision
                 use universalVariables, only : MAX_COL
+                !use mpi_func,           only : getMPIRank

Collaborator

ChasingNeutrons Dec 5, 2024

Delete this?

ChasingNeutrons reviewed

View reviewed changes

SharedModules/errors_mod.f90 Outdated

@@ @@ -36,7 +37,9 @@ subroutine fatalError(where, why) @@
                   ! Upper frame
                   write(error_unit, *) repeat('<>', MAX_COL / 2)
+              ! #ifdef MPI

Collaborator

ChasingNeutrons Dec 5, 2024

Delete?

ChasingNeutrons reviewed

View reviewed changes

SharedModules/mpi_func.f90 Outdated

+                !! It is necessary for load balancing in the dungeon: particles have to be
+                !! transferred betwen processes, and MPI doesn't allow to transfer types with
+                !! type-bound procedures
+                type, public :: particleStateDummy

Collaborator

ChasingNeutrons Dec 5, 2024

This is so painful :( Are there no alternatives around this? It feels like a nasty bug waiting to happen!

Member Author

valeriaRaffuzzi Dec 8, 2024

I don't think so. We need to pass particle states between ranks. The alternative is to pass individual properties but I don't see how that's less painful.

ChasingNeutrons reviewed

View reviewed changes

SharedModules/mpi_func.f90

+              #ifdef MPI
+                  integer(shortInt)        :: ierr
+                  call mpi_init(ierr)

Collaborator

ChasingNeutrons Dec 5, 2024

As before, I think we need to be consistent with how we call MPI functions. Maybe I have missed something.

Member Author

valeriaRaffuzzi Dec 8, 2024

We should. I am not sure if @Mikolaj-A-Kowalski had a plan when starting MPI. We could use full caps for the in-build MPI functions, like MPI_REDUCE and MPI_BCAST, and normal SCONE capitalisation for the SCONE procedures inside mpi_func.

Collaborator

Mikolaj-A-Kowalski Dec 17, 2024

Yep... massive delay here.
I am generally fully in the team small_letters with regards to all function calls. Capitals should just indicate constants etc.
I would just argue that the ALL CAPS is a bit of a legacy thing from the 80s. We should just treat MPI procedures like we do the intrinsic procedures and keep them small.

Member Author

valeriaRaffuzzi Dec 17, 2024

I think all the procedures should now be in small letters. The types however are all caps, as well as MPI inbuilt stuff like MPI_COMM_WORLD. I could make it all small if we prefer, that's not a problem.

ChasingNeutrons reviewed

View reviewed changes

docs/Running.rst

+              After installation of all the dependencies and the compilation of SCONE,
+              one can finally run the code. The executable ``scone.out`` is in the ``build``
+              folder; one can run it from any folder by specifying the path to the executables
+              and to the input file::

Collaborator

ChasingNeutrons Dec 5, 2024

Perhaps it would be good to clearly describe the interaction between OpenMP and MPI?

Collaborator

ChasingNeutrons commented Dec 5, 2024

To get the tests passing, I think it's a case of including MPI installation in the docker image? mikolajkowalski/scone-test

valeriaRaffuzzi force-pushed the mpi branch 2 times, most recently from a978577 to 3fb146e Compare

December 17, 2024 14:49

Member Author

valeriaRaffuzzi commented Dec 17, 2024

After a long time spent fighting with the compiler..... he is still winning. The problem is related to the Docker container though, rather than to SCONE. OMPI doesn't like it when we try to run the tests as root-user in the container (this refers to the unitTests). I managed to force it with gfortran9 and gfortran10 (which, as you can see, run ok). However I couldn't find a way to get gfortran8 to agree to run!
The latest is that I commented out gfortran8 to make sure that everything else was working correctly, which it is. If we want to reinstate gfortran8, I welcome ideas!

Member Author

valeriaRaffuzzi commented Dec 17, 2024

Another thing to notice is that in the particleDungeon, the previous sampling w/o replacement algorithm is in a new procedure called samplingWithoutReplacement, and it isn't used.
I didn't want to delete it completely because it's nice code, and it might be useful in other contexts in the future? If you both agree I can however delete it since now it is redundant.

Collaborator

Mikolaj-A-Kowalski commented Dec 17, 2024

After a long time spent fighting with the compiler..... he is still winning. The problem is related to the Docker container though, rather than to SCONE. OMPI doesn't like it when we try to run the tests as root-user in the container (this refers to the unitTests). I managed to force it with gfortran9 and gfortran10 (which, as you can see, run ok). However I couldn't find a way to get gfortran8 to agree to run! The latest is that I commented out gfortran8 to make sure that everything else was working correctly, which it is. If we want to reinstate gfortran8, I welcome ideas!

The reason here is most probably that the gfotran-8 image is based on the older Debian (buster), which has version of OpenMPI (3.1) before they added the environmental variables option in 4.0.

In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/
I would prefer it done in a separate PR for clarity. And in the same PR the newer compilers should be added.

If we want to keep gfotran-8 we could just try MPICH?

Collaborator

Mikolaj-A-Kowalski commented Dec 17, 2024

Another thing to notice is that in the particleDungeon, the previous sampling w/o replacement algorithm is in a new procedure called samplingWithoutReplacement, and it isn't used. I didn't want to delete it completely because it's nice code, and it might be useful in other contexts in the future? If you both agree I can however delete it since now it is redundant.

No reason to keep dead code. It will be preserved in the Git history if anyone ever wants to inspect it.

Member Author

valeriaRaffuzzi commented Dec 17, 2024

The reason here is most probably that the gfotran-8 image is based on the older Debian (buster), which has version of OpenMPI (3.1) before they added the environmental variables option in 4.0.

In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/ I would prefer it done in a separate PR for clarity. And in the same PR the newer compilers should be added.

If we want to keep gfotran-8 we could just try MPICH?

Yes that makes sense, about the older OMPI version (annoying!). I agree about dropping gfortran 8 and adding newer versions (all this in a separate PR). But in this case, the problem remains that I can't get the tests to pass in this PR.... I will surely manage with more work but I wonder if it's worth spending time on this.
Could try MPICH, but just to then drop it in the next PR?

Collaborator

Mikolaj-A-Kowalski commented Dec 17, 2024

The reason here is most probably that the gfotran-8 image is based on the older Debian (buster), which has version of OpenMPI (3.1) before they added the environmental variables option in 4.0.
In general we should probably push up the compiler versions for CI and maybe drop gfortran-8 :-/ I would prefer it done in a separate PR for clarity. And in the same PR the newer compilers should be added.
If we want to keep gfotran-8 we could just try MPICH?

Yes that makes sense, about the older OMPI version (annoying!). I agree about dropping gfortran 8 and adding newer versions (all this in a separate PR). But in this case, the problem remains that I can't get the tests to pass in this PR.... I will surely manage with more work but I wonder if it's worth spending time on this. Could try MPICH, but just to then drop it in the next PR?

We can just make the new PR quick (yes... I know) and then rebase this one on the main with new compilers when it is merged. This one can be left without gfotran-8 for now.

valeriaRaffuzzi marked this pull request as draft

December 23, 2024 14:03


          Addressing reviews and fixing github tests

7b36bcc

valeriaRaffuzzi force-pushed the mpi branch from 35fff13 to 7b36bcc Compare

December 23, 2024 15:06

valeriaRaffuzzi marked this pull request as ready for review

December 23, 2024 15:07


          Merge branch 'main' into mpi

74af031

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet