How to Check Bit Wise Identical Results
Running WRF with different numbers of processors should yield bit-wise identical results. A brief description of how to check this is given below.
First, WRF must be compiled with no optimization. Optimization combined with parallelization can change the order or precision of floating-point operations, which may lead to results that are not bit-wise identical.
- In the top-level WRF directory, type ./clean -a, then ./configure -d, and select the dmpar mode to compile WRF. The ‘-d’ option removes optimization in the configure file.
- Repeat the above procedure, but select the smpar mode to compile WRF.
- The executable files (i.e., wrf.exe) produced by the different compile modes must be saved separately (for example, as wrf_mpi.exe and wrf_openmp.exe).
- Create a small test case and run WRF for 10 time steps in each of the serial, OpenMP and MPI modes.
- Running WRF in serial mode: mpirun -np 1 ./wrf.exe
- Running WRF in MPI mode: mpirun -np 4 ./wrf.exe (this command may vary between machines.)
- Running WRF in OpenMP mode: export OMP_NUM_THREADS=2 (no spaces around the equals sign), then ./wrf.exe
- The history output file at the last time step (the 10th time step) must be saved for each individual run (for example, the wrfout files can be renamed wrfout_serial, wrfout_mpi, and wrfout_openMP).
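The bookkeeping in the steps above (saving each executable, then keeping the last history file of each run) could be scripted roughly as follows. This is only a sketch: the helper names save_wrf_exe and keep_last_wrfout are hypothetical, the main/wrf.exe location and the wrfout_d01_* file naming are assumptions, and ./configure -d is interactive, so the build itself is shown only as comments.

```shell
#!/bin/sh
# Hypothetical helper: copy the freshly built executable to a mode-specific
# name so that the next build does not overwrite it.
# Usage: save_wrf_exe <top_level_wrf_dir> <label>
save_wrf_exe() {
    wrf_dir=$1
    label=$2
    src="$wrf_dir/main/wrf.exe"   # assumed location of the built executable
    if [ ! -f "$src" ]; then
        echo "error: $src not found; did the compile succeed?" >&2
        return 1
    fi
    cp "$src" "$wrf_dir/main/wrf_${label}.exe"
}

# Hypothetical helper: rename the last history file in the current directory
# to wrfout_<label>. Assumes files are named wrfout_d01_* and that the glob's
# sorted order matches chronological order.
keep_last_wrfout() {
    label=$1
    last=
    for f in wrfout_d01_*; do
        if [ -e "$f" ]; then
            last=$f
        fi
    done
    if [ -z "$last" ]; then
        echo "error: no wrfout_d01_* files found" >&2
        return 1
    fi
    mv "$last" "wrfout_${label}"
}

# Possible sequence, typed by hand (launcher syntax varies by machine, and
# the paths to the saved executables are assumptions):
#   ./clean -a; ./configure -d      # choose dmpar, then compile
#   save_wrf_exe . mpi
#   ./clean -a; ./configure -d      # choose smpar, then compile
#   save_wrf_exe . openmp
#   mpirun -np 1 ./wrf_mpi.exe          && keep_last_wrfout serial
#   mpirun -np 4 ./wrf_mpi.exe          && keep_last_wrfout mpi
#   OMP_NUM_THREADS=2 ./wrf_openmp.exe  && keep_last_wrfout openMP
```

Renaming the file immediately after each run matters because the next run writes history files with the same names.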
WRF provides a utility for the bit-wise check, diffwrf, which is built when WRF compiles successfully. It is located in /WRF/external/io_netcdf/. To check whether the results from the serial, MPI and OpenMP modes are bit-wise identical, issue the following commands:
- /WRF/external/io_netcdf/diffwrf wrfout_serial wrfout_mpi
- /WRF/external/io_netcdf/diffwrf wrfout_serial wrfout_openMP
If the results are bit-wise identical, the two commands above should print only the header lines shown below, with no fields listed beneath them:
Just plot F
Diffing wrfout_serial wrfout_mpi
Next Time 2017-02-01_01:00:00
Field Ndifs Dims RMS (1) RMS (2) DIGITS RMSE pntwise max
And
Just plot F
Diffing wrfout_serial wrfout_openMP
Next Time 2017-02-01_01:00:00
Field Ndifs Dims RMS (1) RMS (2) DIGITS RMSE pntwise max
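For scripted checks, the diffwrf output can be scanned for any lines following the header. The sketch below assumes, as the sample output above suggests, that diffwrf prints one line per differing field beneath the "Field Ndifs ..." header and nothing there when the files match; the function name no_diffs_reported is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read diffwrf output on stdin and succeed only if the
# "Field Ndifs ..." header was seen and no field lines follow it.
# Assumption: differing fields are printed one per line beneath that header.
no_diffs_reported() {
    awk '
        /^[ \t]*Field[ \t]+Ndifs/                { seen = 1; in_table = 1; next }
        /^[ \t]*(Next Time|Diffing|Just plot)/   { in_table = 0; next }
        in_table && NF > 0                       { ndiff++ }
        END {
            if (seen && ndiff == 0) exit 0
            exit 1
        }
    '
}

# Example use:
#   /WRF/external/io_netcdf/diffwrf wrfout_serial wrfout_mpi | no_diffs_reported \
#       && echo "bit-wise identical" || echo "differences found (or no header seen)"
```

Requiring the header guards against treating an error message, or empty output, as a pass.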
If the results are not bit-wise identical, first determine at which time step, and in which variable, the difference first appears. This often involves rerunning the case, saving a history output file at every time step, and checking each file with diffwrf. Then trace back through the code to find the likely error and fix it. Some common issues that lead to non-bit-wise-identical results are:
- Uninitialized variables: if any local variable is used before it is initialized, the results may not be bit-wise identical. Recompiling the code with the ‘configure -D’ option may help detect uninitialized variables.
- Inappropriate HALO specification: this often produces differences distributed along the boundaries of the decomposed domains. It can happen when the new code needs values from neighboring grid points.
- An error in the newly introduced physics. This is scheme-dependent and requires an in-depth understanding of the physics itself.
- For code run in OpenMP mode, watch for race conditions.
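The per-time-step comparison described above can be automated by walking the history files of two runs in order and stopping at the first one where diffwrf lists a field. This is a sketch under stated assumptions: the two runs are kept in separate directories with identically named wrfout_d01_* files, diffwrf prints differing fields beneath its "Field Ndifs ..." header, and the function name first_differing_file and the DIFFWRF variable are hypothetical.

```shell
#!/bin/sh
# Path to diffwrf; override by setting DIFFWRF before calling the function.
DIFFWRF=${DIFFWRF:-/WRF/external/io_netcdf/diffwrf}

# Hypothetical helper: print the name of the first history file (in sorted,
# assumed chronological, order) for which diffwrf lists a differing field.
# Returns 1 if no file differs.
# Usage: first_differing_file <run_dir_a> <run_dir_b>
first_differing_file() {
    dir_a=$1
    dir_b=$2
    for file_a in "$dir_a"/wrfout_d01_*; do
        [ -e "$file_a" ] || continue
        name=${file_a##*/}
        # Count non-blank lines after the header, skipping repeated headers.
        ndiff=$("$DIFFWRF" "$file_a" "$dir_b/$name" |
            awk 'found && NF > 0 && !/Next Time|Field[ \t]+Ndifs/ { n++ }
                 /Field[ \t]+Ndifs/ { found = 1 }
                 END { print n + 0 }')
        if [ "$ndiff" -gt 0 ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Example use:
#   first_differing_file run_serial run_mpi
```

The field lines printed by diffwrf for the reported file then show which variables to trace back in the code.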