Performances
The benchmark cases are meant to be run with 1, 4, or 32 MPI tasks.
The run times below, in seconds, were measured on different systems and with different Kokkos backends.
CPU: Intel(R) Xeon(R) CPU E5-4640 v4 @ 2.10GHz, 4 sockets, 12 cores per socket, 2 threads per core.
1 MPI task (Serial):
benchmark_NE_1: 4.95291s
benchmark_NE_2: 1.63282s
benchmark_NE_3: 62.3336s
benchmark_NE_5: 3.1134s
benchmark_PSI_1: 439.509s
benchmark_PSI_2: 944.674s
benchmark_PSI_8: 10.5877s
benchmark_PSI_9: 16.2795s
benchmark_PSI_10: 159.21s
4 MPI tasks (Serial):
benchmark_NE_3: 19.1416s
benchmark_PSI_1: 283.659s
benchmark_PSI_7: 233.093s
benchmark_PSI_8: 3.4179s
benchmark_PSI_9: 5.23894s
benchmark_PSI_10: 48.3755s
32 MPI tasks (Serial):
benchmark_NE_3: 4.28244s
benchmark_NE_4: 136.635s
benchmark_PSI_1: 43.3344s
benchmark_PSI_3: 176.356s
benchmark_PSI_4: 14.5344s
benchmark_PSI_5: 1460.94s
benchmark_PSI_6: 922.155s
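To read the Serial numbers above as scaling results, one option is to compute the speedup and parallel efficiency relative to the 1-task run. The following is a minimal sketch, not part of the benchmark suite: the `times` dictionary and the `scaling` helper are hypothetical names, and the values are copied from the lists above for the two benchmarks that appear at all three task counts.

```python
# Serial-backend run times in seconds, copied from the lists above.
# Keys of the inner dicts are MPI task counts. The names `times` and
# `scaling` are illustrative only and do not come from the benchmark code.
times = {
    "benchmark_NE_3": {1: 62.3336, 4: 19.1416, 32: 4.28244},
    "benchmark_PSI_1": {1: 439.509, 4: 283.659, 32: 43.3344},
}


def scaling(runs):
    """Return {tasks: (speedup, efficiency)} relative to the 1-task run."""
    base = runs[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(runs.items())}


for name, runs in times.items():
    for n, (speedup, eff) in scaling(runs).items():
        print(f"{name}: {n:2d} tasks, speedup {speedup:5.2f}, efficiency {eff:4.2f}")
```

Speedup here is the 1-task time divided by the n-task time, and efficiency is the speedup divided by n. An efficiency near 1 would indicate close to ideal strong scaling.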
1 MPI task (OMP_NUM_THREADS=48):
benchmark_NE_1: 4.74335s
benchmark_NE_2: 1.62829s
benchmark_NE_3: 119.906s
benchmark_NE_5: 3.84555s
benchmark_PSI_1: 329.444s
benchmark_PSI_2: 704.83s
benchmark_PSI_8: 14.2629s
benchmark_PSI_9: 20.8359s
benchmark_PSI_10: 270.26s
4 MPI tasks (OMP_NUM_THREADS=24):
benchmark_NE_3: 20.0956s
benchmark_PSI_1: 168.699s
benchmark_PSI_7: 116.909s
benchmark_PSI_8: 3.10345s
benchmark_PSI_9: 4.82322s
benchmark_PSI_10: 33.6607s
32 MPI tasks (OMP_NUM_THREADS=2):
benchmark_NE_3: 5.64602s
benchmark_NE_4: 154.724s
benchmark_PSI_1: 35.7453s
benchmark_PSI_3: 174.279s
benchmark_PSI_4: 13.1064s
benchmark_PSI_5: 1199.72s
benchmark_PSI_6: 879.221s
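The Serial and OMP_NUM_THREADS listings on this CPU can be compared benchmark by benchmark. The sketch below does this for the 1 MPI task runs, assuming the two lists are paired by benchmark name. The names `serial`, `openmp`, and `ratio` are hypothetical, and the values are copied from the lists above.

```python
# 1 MPI task run times in seconds on the Xeon system, copied from the lists above.
# `serial`, `openmp` and `ratio` are illustrative names, not part of the benchmarks.
serial = {
    "benchmark_NE_1": 4.95291, "benchmark_NE_2": 1.63282,
    "benchmark_NE_3": 62.3336, "benchmark_NE_5": 3.1134,
    "benchmark_PSI_1": 439.509, "benchmark_PSI_2": 944.674,
    "benchmark_PSI_8": 10.5877, "benchmark_PSI_9": 16.2795,
    "benchmark_PSI_10": 159.21,
}
openmp = {  # OMP_NUM_THREADS=48
    "benchmark_NE_1": 4.74335, "benchmark_NE_2": 1.62829,
    "benchmark_NE_3": 119.906, "benchmark_NE_5": 3.84555,
    "benchmark_PSI_1": 329.444, "benchmark_PSI_2": 704.83,
    "benchmark_PSI_8": 14.2629, "benchmark_PSI_9": 20.8359,
    "benchmark_PSI_10": 270.26,
}

# Ratio > 1 means the OpenMP run was faster than the Serial run.
ratio = {name: serial[name] / openmp[name] for name in serial if name in openmp}

for name, r in ratio.items():
    verdict = "OpenMP faster" if r > 1 else "Serial faster"
    print(f"{name}: Serial/OpenMP = {r:.2f} ({verdict})")
```

The same pairing can be repeated for the 4 and 32 MPI task lists by swapping in the corresponding values.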
GPU: NVIDIA TITAN V (5120 CUDA cores, 640 tensor cores, 1455 MHz, 12 GB HBM2), CUDA Version: 11.5, Kokkos_ARCH_VOLTA70.
1 MPI task (CUDA):
benchmark_NE_1: 2.90882s
benchmark_NE_2: 1.08391s
benchmark_NE_3: 137.718s
benchmark_NE_5: 2.40302s
benchmark_PSI_1: 293.999s
benchmark_PSI_2: 410.681s
benchmark_PSI_8: 16.2475s
benchmark_PSI_9: 23.9108s
benchmark_PSI_10: 132.685s
4 MPI tasks (CUDA):
benchmark_NE_3: 285.823s
benchmark_PSI_1: 171.391s
benchmark_PSI_7: 110.069s
benchmark_PSI_8: 20.2436s
benchmark_PSI_9: 29.4551s
benchmark_PSI_10: 298.828s
CPU: 2 IBM POWER9. GPU: 6 NVIDIA Tesla V100, CUDA Version: 11.0.3, Kokkos_ARCH_VOLTA70.
1 MPI task (CUDA):
benchmark_NE_1: 7.29145s
benchmark_NE_2: 1.87785s
benchmark_NE_3: 276.962s
benchmark_NE_5: 4.89861s
benchmark_PSI_1: 1202.79s
benchmark_PSI_2: 1327.91s
benchmark_PSI_8: 40.311s
benchmark_PSI_9: 64.3427s
benchmark_PSI_10: 198.552s
4 MPI tasks (CUDA):
benchmark_NE_3: 62.8108s
benchmark_PSI_1: 411.963s
benchmark_PSI_7: 261.677s
benchmark_PSI_8: 9.50936s
benchmark_PSI_9: 15.2026s
benchmark_PSI_10: 48.4837s
32 MPI tasks (CUDA):
benchmark_NE_3: 12.9853s
benchmark_NE_4: 63.6345s
benchmark_PSI_1: 57.1283s
benchmark_PSI_3: 438.623s
benchmark_PSI_4: 12.4112s
benchmark_PSI_5: 531.49s
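To work with these listings programmatically, the `name: time` lines can be parsed into a dictionary. The following is a small sketch that assumes the exact `benchmark_<name>: <seconds>s` layout used on this page. The `parse_times` function and the `LINE` pattern are hypothetical helpers, and the sample text is a subset of the last list above.

```python
import re

# Matches lines such as "benchmark_NE_3: 12.9853s".
# Header lines like "32 MPI tasks (CUDA):" do not match and are skipped.
LINE = re.compile(r"^(benchmark_\w+):\s*([0-9.]+)s$")


def parse_times(text):
    """Return {benchmark name: run time in seconds} for one listing."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


sample = """\
32 MPI tasks (CUDA):
benchmark_NE_3: 12.9853s
benchmark_NE_4: 63.6345s
benchmark_PSI_5: 531.49s
"""

parsed = parse_times(sample)
print(parsed)
```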