Piernik
This page illustrates the MUSCLization of the legacy astrophysics code [http://piernik.astri.umk.pl/ PIERNIK] (which means "Gingerbread" in Polish), developed outside of the MAPPER project. Its main characteristics are:
- single binary image
- FORTRAN2008
- MPI/HDF5 dependency
- high scalability of the main MHD module
- prototype Monte Carlo particle simulation module
- coupling done via global variables
The next sections describe the steps needed to move from the existing coupling, done via shared memory, to one that exploits the MUSCLE framework. Here we briefly list the expected benefits:
- Both MHD and MC can run concurrently, exchanging data at the beginning of each time step. This has the potential to introduce a new level of parallelism and thus shorten walltimes. In the current code the MHD and MC simulations are called sequentially, one after another.
- Previous tests show that the MC code is much more resource demanding than the MHD part, while having potential for greater scalability. Using MUSCLE it is possible to assign a different amount of resources to each kernel (e.g. 12 cores for the MHD code, 120 cores for the MC one).
- The MC code is in the process of being GPU-enabled, so one may want to run the two modules on different, heterogeneous resources (e.g. MC on a GPU cluster, the base MHD code on an Intel Nehalem cluster).
In the end we will try to verify the above hypotheses in production runs.
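The first expected benefit can be made concrete with a rough walltime model. The sketch below is only an illustration: the per-step costs and the `exchange_cost` parameter are hypothetical numbers, not PIERNIK measurements, and the model assumes the two kernels synchronise exactly once per time step.

```python
def sequential_walltime(mhd_times, mc_times):
    """Both modules are called one after another within each time step."""
    return sum(a + b for a, b in zip(mhd_times, mc_times))


def concurrent_walltime(mhd_times, mc_times, exchange_cost=0.0):
    """Both kernels run side by side; each step costs as much as the slower one."""
    return sum(max(a, b) + exchange_cost for a, b in zip(mhd_times, mc_times))


# Hypothetical per-step costs in seconds (not measured data).
mhd = [1.0, 1.0, 1.0, 1.0]
mc = [3.0, 2.0, 4.0, 3.0]

t_seq = sequential_walltime(mhd, mc)
t_con = concurrent_walltime(mhd, mc, exchange_cost=0.5)
print(t_seq, t_con)
```

Under this model the gain is bounded by the cost of the cheaper kernel, which is why assigning more cores to the more expensive MC part matters as well.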
PIERNIK uses its own build system based on Python scripts. All compilation flags are stored in plain configuration files like this one:
PROG = piernik
USE_GNUCPP = yes
F90 = mpif90
F90FLAGS = -ggdb -fdefault-real-8 -ffree-form -std=gnu -fimplicit-none -ffree-line-length-none
F90FLAGS += -Ofast -funroll-loops
F90FLAGS += -I/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/include
LDFLAGS = -Wl,--as-needed -Wl,-O1 -L/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/lib
In order to link with MUSCLE 2.0 we have to alter the last lines of the file:
...
LDFLAGS = -Wl,--as-needed -Wl,-O1 -L/software/local/libs/hdf5/1.8.9-pre1/gnu-4.7.2-ompi/lib -L/mnt/lustre/scratch/groups/plggmuscle/2.0/devel-debug/lib
LIBS = -lmuscle2
Then we build the code with the following command:
# load MUSCLE
module load muscle2/devel-debug
# load PIERNIK dependencies (HDF5, OpenMPI, newest GNU compiler)
module load plgrid/libs/hdf5/1.8.9-gnu-4.7.2-ompi
./setup mc_collisions_test -c gnufast -d HDF5,MUSCLE,MHD_KERNEL
MUSCLE and MHD_KERNEL are preprocessor defines. We want to keep the MUSCLE dependency conditional, so we will use them later.
Using the [Fortran API|MUSCLE Fortran tutorial] as a reference, it was relatively easy to add the following code to the main PIERNIK file (piernik.F90):
#ifdef MUSCLE
call muscle_fortran_init
#endif
call init_piernik
...
call cleanup_piernik
#ifdef MUSCLE
call MUSCLE_Finalize
#endif
...
#ifdef MUSCLE
subroutine muscle_fortran_init()
   implicit none
   integer :: argc, i, prevlen, newlen
   character(len=25600) :: argv   ! all arguments, each terminated by char(0)
   character(len=255) :: arg      ! a single argument (longer ones are truncated)

   prevlen = 0
   argc = command_argument_count()
   ! index 0 is the program name, so argc + 1 strings are packed
   do i = 0, argc
      call get_command_argument(i, arg)
      newlen = len_trim(arg)
      argv = argv(1:prevlen) // arg(1:newlen) // char(0)
      prevlen = prevlen + newlen + 1
   end do

   call MUSCLE_Init(argc, argv(1:prevlen))
end subroutine muscle_fortran_init
#endif
end program piernik
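The packing done by muscle_fortran_init can be easier to follow in a few lines of Python. This is a minimal sketch of the same idea (each argument stripped of trailing blanks and terminated by a NUL character), not part of PIERNIK or MUSCLE; the example command line and the `unpack_arguments` helper are hypothetical.

```python
def pack_arguments(args):
    """Join arguments into one buffer, each terminated by a NUL character."""
    buffer = ""
    for arg in args:
        # Like len_trim in the Fortran code: drop trailing blanks only.
        buffer += arg.rstrip(" ") + "\0"
    return buffer


def unpack_arguments(buffer):
    """Split a NUL-terminated buffer back into the individual arguments."""
    return buffer.split("\0")[:-1]


# Hypothetical command line; index 0 is the program name, as in the Fortran loop.
command_line = ["./piernik", "--", "hypothetical-flag"]
packed = pack_arguments(command_line)
print(repr(packed))
```

The buffer length equals the sum of the argument lengths plus one terminator per argument, which is what the prevlen counter tracks in the Fortran routine.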
MUSCLE_Init assumes that the application is started within the MUSCLE environment, so it will always fail if the binary is called directly:
$./piernik
(12:29:26 ) MUSCLE port not given. Starting new MUSCLE instance.
(12:29:26 ) ERROR: Could not instantiate MUSCLE: no command line arguments given.
(12:29:26 ) Program finished
First we need to prepare a simple CxA file that describes the simulation. We start with a single kernel and no conduits:
# configure cxa properties
cxa = Cxa.LAST
# declare kernels and their params
cxa.add_kernel('mhd', 'muscle.core.standalone.NativeKernel')
cxa.env["mhd:command"] = "./piernik"
cxa.env["mhd:dt"] = 1
# global params
cxa.env["max_timesteps"] = 4
cxa.env["cxa_path"] = File.dirname(__FILE__)
# configure connection scheme
cs = cxa.cs
Now we are ready to run the PIERNIK MHD module in MUSCLE:
$muscle2 --main --cxa piernik.cxa.rb mhd
Running both MUSCLE2 Simulation Manager and the Simulation
### Running MUSCLE2 Simulation Manager
[12:39:05 muscle] Started the connection handler, listening on 10.3.1.22:5000
### Running MUSCLE2 Simulation
[12:39:06 muscle] Using directory </scratch/26934481.batch.grid.cyf-kr.edu.pl/n3-1-22.local_2013-03-09_12-39-05_23596>
[12:39:06 muscle] mhd: connecting...
[12:39:06 muscle] Registered ID mhd
[12:39:06 muscle] mhd conduit entrances (out): []
mhd conduit exits (in): []
[12:39:06 muscle] mhd: executing
(12:39:06 mhd) Spawning standalone kernel: [./piernik]
[n3-1-22.local:23649] mca: base: component_find: unable to open /software/local/OpenMPI/1.6.3/ib/gnu/4.1.2/lib/openmpi/mca_mtl_psm: libpsm_infinipath.so.1: cannot open shared object file: No such file or directory (ignored)
Start of the PIERNIK code. No. of procs = 1
Warning @ 0: [units:init_units] PIERNIK will use 'cm', 'sek', 'gram' defined in problem.par
[units:init_units] cm = 1.3459000E-11 [user unit]
[units:init_units] sek = 3.1688088E-08 [user unit]
[units:init_units] gram = 1.0000000E-22 [user unit]
Starting problem : mctest :: tst
Info @ 0: Working with 2 fluid.
Info @ 0: Number of cells: 1
Info @ 0: Cell volume: 4.1016785411997372E+032
Info @ 0: Monomer mass: 4.2893211697012652E-013
Info @ 0: Temperature: 200.02221228956296
Info @ 0: Number of monomers per one representative particle: 1.3865676291650172E+030
Info @ 0: Dust density [g/cm3]: 2.9000000001269637E-013
Warning @ 0: [initfluids:sanitize_smallx_checks] adjusted smalld to 1.1895E-04
Warning @ 0: [initfluids:sanitize_smallx_checks] adjusted smallp to 1.7705E-01
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 1 dt = 1.5778800002006178E+09 s t = 9.9998257934644172E+01 yr dWallClock = 0.04 s
[MC] Writing output 1 time = 9.9998257934644172E+01 yr = 3.1557600004012356E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 2 dt = 1.5778800002006178E+09 s t = 1.9999651586928834E+02 yr dWallClock = 0.04 s
[MC] Writing output 2 time = 1.9999651586928834E+02 yr = 6.3115200008024712E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 3 dt = 1.5778800002006178E+09 s t = 2.9999477380393250E+02 yr dWallClock = 0.11 s
[MC] Writing output 3 time = 2.9999477380393250E+02 yr = 9.4672800012037067E+09 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 4 dt = 1.5778800002006178E+09 s t = 3.9999303173857669E+02 yr dWallClock = 0.17 s
[MC] Writing output 4 time = 3.9999303173857669E+02 yr = 1.2623040001604942E+10 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 5 dt = 1.5778800002006178E+09 s t = 4.9999128967322088E+02 yr dWallClock = 0.18 s
[MC] Writing output 5 time = 4.9999128967322088E+02 yr = 1.5778800002006178E+10 s
Info @ 0: Timesteps: 2.0416940798311732E+038 50.000000000000000
Info @ 0: Timesteps: 50.000000000000000 50.000000000000000
[MC] nstep = 6 dt = 1.5778800002006178E+09 s t = 5.9998954760786501E+02 yr dWallClock = 0.77 s
[MC] Writing output 6 time = 5.9998954760786501E+02 yr = 1.8934560002407413E+10 s
Info @ 0: Simulation has reached final time t = 600.000
Finishing ..........
(12:39:08 mhd) Program finished.
(12:39:08 mhd) Command [./piernik] finished.
[12:39:08 muscle] mhd: finished
[12:39:08 muscle] All ID's have finished, quitting MUSCLE now.
[12:39:08 muscle] All local submodels have finished; exiting.
Executed in </scratch/26934481.batch.grid.cyf-kr.edu.pl/n3-1-22.local_2013-03-09_12-39-05_23596>
First we need to create a separate PIERNIK build for the MC kernel:
./setup -o mc mc_collisions_test -c gnufast -d HDF5,MUSCLE,MC_KERNEL
Please note that we use -o mc (a suffix for the obj directory) and MC_KERNEL instead of MHD_KERNEL. This will create another build in ./obj_mc/piernik. Now we are ready to add another kernel definition to the CxA file:
cxa.add_kernel('mc', 'muscle.core.standalone.NativeKernel')
cxa.env["mc:command"] = "./piernik" #here we assume that the other kernel is started in obj_mc
cxa.env["mc:dt"] = 1;
Because every kernel needs to run in a separate directory, we need to start two MUSCLE instances:
cd obj
time muscle2 --main --bindaddr 127.0.0.1 --bindport 1234 --cxa ../scripts/piernik.cxa.rb mhd &
cd ..
cd obj_mc
time muscle2 --manager 127.0.0.1:1234 --cxa ../scripts/piernik.cxa.rb mc &
wait
Both kernels are now running with MUSCLE, but to get scientifically relevant results we must finish the coupling: exchanging gas density and momentum for each cell. First we have to define the conduits in the CxA file:
cs.attach('mhd' => 'mc') {
tie('rho_gas', 'rho_gas')
tie('m_gas', 'm_gas')
}
In the original code the rho_gas and m_gas multidimensional arrays were recomputed before every MC step. In the MUSCLE variant we replace that code with MUSCLE_Receive calls:
call MUSCLE_Receive("rho_gas", rho_gas, size(rho_gas), MUSCLE_DOUBLE) !here we assume that real is always real*8
call MUSCLE_Receive("m_gas", m_gas, size(m_gas), MUSCLE_DOUBLE) !here we assume that real is always real*8
As mentioned before, the arrays are multidimensional (4- and 3-dimensional, to be precise). Since both kernels are Fortran codes and Fortran multidimensional arrays have a contiguous memory layout, we can safely treat them as one-dimensional arrays.
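To see why treating the arrays as one-dimensional is safe, it helps to write down the Fortran (column-major) storage order explicitly. The following is a small sketch with a hypothetical 2x3x4 array and 0-based indices; it only illustrates the layout and assumes that sender and receiver declare the array with the same shape.

```python
from itertools import product


def column_major_offset(index, shape):
    """0-based position of a multi-index in Fortran (column-major) storage."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride   # the first index varies fastest
        stride *= n
    return offset


shape = (2, 3, 4)              # hypothetical array shape
flat = [0] * (2 * 3 * 4)       # what would travel through the conduit

# "Sender": store a recognisable value for every cell.
for i, j, k in product(range(2), range(3), range(4)):
    flat[column_major_offset((i, j, k), shape)] = 100 * i + 10 * j + k

# "Receiver": the same formula recovers any cell from the flat buffer.
value = flat[column_major_offset((1, 2, 3), shape)]
print(value)
```

Because both kernels use the same layout, no reordering is needed; only the total number of elements has to be passed to the send and receive calls.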
Now we need to add the corresponding MUSCLE_Send calls to the MHD kernel, after the gas density and momentum values are recomputed:
call MUSCLE_Send("rho_gas", rho_gas, size(rho_gas), MUSCLE_DOUBLE) !here we assume that real is always real*8
call MUSCLE_Send("m_gas", m_gas, size(m_gas), MUSCLE_DOUBLE) !here we assume that real is always real*8
Until now we have silently omitted one important aspect of the PIERNIK simulations: the timestep is not constant and depends on the current state of both modules, MHD and MC. In the original code the minimum of the dt,,mc,, and dt,,mhd,, values was always chosen. To do the same in the MUSCLE flavor we must define two new conduits: one for sending dt,,mc,, from MC to MHD and a second one for sending back the final dt. The conduit definitions now look as follows:
cs.attach('mhd' => 'mc') {
tie('rho_gas', 'rho_gas')
tie('m_gas', 'm_gas')
tie('dt_final', 'dt_final')
}
cs.attach('mc' => 'mhd') {
tie('dt_mc', 'dt_mc')
}
And the corresponding code:
#ifdef MHD_KERNEL
call time_step(dt)
call MUSCLE_Send("dt_final", dt, dt_len, MUSCLE_DOUBLE)
#else /*MC_KERNEL*/
dt = timestep_mc()
call MUSCLE_Send("dt_mc", dt, dt_len, MUSCLE_DOUBLE)
call MUSCLE_Receive("dt_final", dt, dt_len, MUSCLE_DOUBLE)
#endif
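The timestep negotiation can be tried out without MUSCLE by letting two threads play the roles of the kernels and two queues play the roles of the conduits. This is only a sketch under stated assumptions: the proposed timesteps are hypothetical, the queues stand in for the dt_mc and dt_final conduits, and the MHD side is assumed to receive dt,,mc,, before it sends the final value.

```python
import queue
import threading


def mhd_kernel(dt_mhd_values, dt_mc_in, dt_final_out, used):
    """MHD side: receive dt_mc, pick the minimum, send it back as dt_final."""
    for dt_mhd in dt_mhd_values:
        dt_mc = dt_mc_in.get()      # stands in for MUSCLE_Receive("dt_mc", ...)
        dt = min(dt_mhd, dt_mc)
        dt_final_out.put(dt)        # stands in for MUSCLE_Send("dt_final", ...)
        used.append(dt)


def mc_kernel(dt_mc_values, dt_mc_out, dt_final_in, used):
    """MC side: send the own proposal first, then wait for the final value."""
    for dt_mc in dt_mc_values:
        dt_mc_out.put(dt_mc)            # stands in for MUSCLE_Send("dt_mc", ...)
        used.append(dt_final_in.get())  # stands in for MUSCLE_Receive("dt_final", ...)


dt_mc_conduit = queue.Queue()
dt_final_conduit = queue.Queue()
mhd_used, mc_used = [], []

# Hypothetical timestep proposals, not PIERNIK output.
workers = [
    threading.Thread(target=mhd_kernel,
                     args=([50.0, 50.0, 40.0], dt_mc_conduit, dt_final_conduit, mhd_used)),
    threading.Thread(target=mc_kernel,
                     args=([60.0, 30.0, 45.0], dt_mc_conduit, dt_final_conduit, mc_used)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(mhd_used, mc_used)
```

The order of operations matters: one side must send before it receives. If both kernels started with a blocking receive, each would wait for the other forever.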
As mentioned, PIERNIK is an MPI code, and there is one MUSCLE caveat that must be taken into account when MUSCLEizing legacy MPI applications: the MUSCLE send and receive routines must be called only by the rank zero process. Until now we spawned only one process per kernel, so no extra effort was needed. Now we add some additional logic using the MPI_Bcast, MPI_Gather and MPI_Scatter calls. We need to do this wherever MUSCLE send or receive operations are used:
- timestep.F90 time_step routine
if (master) then
   ts = set_timer("dt_mc_receive")
   call MUSCLE_Receive("dt_mc", dt_mc, dt_mc_len, MUSCLE_DOUBLE)
   ts = set_timer("dt_mc_receive")
   write (*,"('[MHD] Waiting for dt_mc took: ', f8.3, ' s dt = ', f8.4, ' dt_mc = ', f8.4)") ts, dt, dt_mc
endif
!broadcast dt_mc value
call MPI_Bcast(dt_mc, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
- piernik.F90 main routine
#ifdef MHD_KERNEL
call time_step(dt)
if (master) then
   call MUSCLE_Send("dt_final", dt, dt_len, MUSCLE_DOUBLE)
endif
#else /*MC_KERNEL*/
dt = timestep_mc()
if (master) then
   write (*,"('[MC] sending dt_mc:', es23.16 )") dt
   call MUSCLE_Send("dt_mc", dt, dt_len, MUSCLE_DOUBLE)
   call MUSCLE_Receive("dt_final", dt, dt_len, MUSCLE_DOUBLE)
endif
!broadcast dt final value
call MPI_Bcast(dt, 1, MPI_DOUBLE_PRECISION, 0, comm, ierr)
#endif
- piernik.F90 send_gas_state routine
call MPI_Gather(rho_gas, size(rho_gas), MPI_DOUBLE_PRECISION, rho_gas_global, size(rho_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
call MPI_Gather(m_gas, size(m_gas), MPI_DOUBLE_PRECISION, m_gas_global, size(m_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
if (master) then
   call MUSCLE_Send("rho_gas", rho_gas_global, %REF(size(rho_gas_global)), MUSCLE_DOUBLE) !here we assume that real is always real*8
   call MUSCLE_Send("m_gas", m_gas_global, %REF(size(m_gas_global)), MUSCLE_DOUBLE) !here we assume that real is always real*8
endif
- mc.F90 get_gas_state routine
if (master) then
   call MUSCLE_Receive("rho_gas", rho_gas_global, %REF(size(rho_gas_global)), MUSCLE_DOUBLE) !here we assume that real is always real*8
   call MUSCLE_Receive("m_gas", m_gas_global, %REF(size(m_gas_global)), MUSCLE_DOUBLE) !here we assume that real is always real*8
endif
call MPI_Scatter(rho_gas_global, size(rho_gas), MPI_DOUBLE_PRECISION, rho_gas, size(rho_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
call MPI_Scatter(m_gas_global, size(m_gas), MPI_DOUBLE_PRECISION, m_gas, size(m_gas), MPI_DOUBLE_PRECISION, 0, comm, ierr)
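The gather, send, receive, scatter chain above can be mimicked in plain Python to see what actually travels through the conduit. This is a sketch only: the `gather` and `scatter` helpers are hypothetical stand-ins for the behaviour of MPI_Gather and MPI_Scatter with equal chunk sizes, the rank count and data are made up, and the conduit is replaced by a simple copy.

```python
def gather(local_chunks):
    """Mimic MPI_Gather on the root: concatenate equal-sized chunks in rank order."""
    size = len(local_chunks[0])
    if any(len(chunk) != size for chunk in local_chunks):
        raise ValueError("MPI_Gather expects the same count from every rank")
    return [x for chunk in local_chunks for x in chunk]


def scatter(global_buffer, n_ranks):
    """Mimic MPI_Scatter from the root: cut the buffer into n_ranks equal pieces."""
    if len(global_buffer) % n_ranks != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")
    size = len(global_buffer) // n_ranks
    return [global_buffer[r * size:(r + 1) * size] for r in range(n_ranks)]


# Three hypothetical MHD ranks, each holding two values of rho_gas.
mhd_local = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

sent = gather(mhd_local)        # only rank 0 would pass this to MUSCLE_Send
received = list(sent)           # stand-in for the MUSCLE conduit
mc_local = scatter(received, 3) # rank 0 of MC distributes the data again
print(mc_local)
```

With the same number of ranks on both sides every MC rank gets exactly the chunk of the matching MHD rank. If the kernels run on different numbers of processes, as suggested in the list of expected benefits, the chunk boundaries no longer coincide (for example `scatter(received, 2)` yields two chunks of three values), so an additional remapping step would probably be needed.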
We also need to change the kernel implementations in the CxA file:
cxa.add_kernel('mhd', 'muscle.core.standalone.MPIKernel')
...
cxa.add_kernel('mc', 'muscle.core.standalone.MPIKernel')