Here we run several benchmarks on a university cluster (Heidelberg) to evaluate the performance of the HPX runtime.
Our final results are placed in the HPX_Programs folder.
The following improvements should be implemented for the respective programs:
- Adjusted performance measurements with cache warm-up (not measuring the entire program runtime):
Has been implemented.
- Execution policy “par_simd”:
Not provided with the current HPX installation. According to the documentation, a new datapar backend (SVE) must be installed to use par_simd.
See the documentation:
https://hpx-docs.stellar-group.org/branches/master/html/releases/whats_new_1_9_0.html?highlight=simd
https://github.com/STEllAR-GROUP/hpx/blob/master/cmake/HPX_SetupDatapar.cmake
- Determine how the data of a partitioned_vector is divided (large blocks or many small ones):
The documentation states that it is a "Dynamic segmented contiguous array".
See the documentation:
https://hpx-docs.stellar-group.org/branches/master/html/manual/writing_distributed_hpx_applications.html
- Execute the second inclusive_scan() with the respective starting value from the sums_per_locality vector (eliminating the need for the final Transform in the old Scan implementation):
This has been implemented and resulted in a significant performance improvement when running on multiple nodes.
- Use the sums_per_locality vector as a shared vector instead of a partitioned_vector, so that all localities have access to the entire vector:
We did not find explicit information about this in the documentation. We therefore tried several approaches of our own, but without success.
- We had to use hpx::distributed::barrier instead of hpx::distributed::latch for the “Reduction” and “Scan” programs, because the revised performance measurement executes them in a loop.
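
The inclusive_scan item in the list above (seeding the second scan with a starting value so that no trailing transform is needed) can be illustrated with a minimal single-process sketch. It is not the code from the HPX_Programs folder: plain `std::` algorithms stand in for the HPX ones, contiguous chunks of a `std::vector` stand in for the per-locality segments, and the names `chunked_inclusive_scan`, `sums_per_chunk` and `offsets` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: scans `data` as if it were split into `num_chunks`
// contiguous blocks, one block per locality.
std::vector<long> chunked_inclusive_scan(const std::vector<long>& data,
                                         std::size_t num_chunks)
{
    std::vector<long> result(data.size());
    num_chunks = std::max<std::size_t>(num_chunks, 1);
    const std::size_t chunk_size = (data.size() + num_chunks - 1) / num_chunks;

    // Iterator to the first element of chunk `c` (clamped to the end of data).
    auto chunk_begin = [&](std::size_t c) {
        const std::size_t index = std::min(c * chunk_size, data.size());
        return data.begin() + static_cast<std::ptrdiff_t>(index);
    };

    // Step 1: one partial sum per chunk (the role of sums_per_locality).
    std::vector<long> sums_per_chunk(num_chunks, 0);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        sums_per_chunk[c] = std::accumulate(chunk_begin(c), chunk_begin(c + 1), 0L);
    }

    // Step 2: an exclusive scan gives every chunk the sum of all earlier chunks.
    std::vector<long> offsets(num_chunks, 0);
    std::exclusive_scan(sums_per_chunk.begin(), sums_per_chunk.end(),
                        offsets.begin(), 0L);

    // Step 3: scan each chunk with its offset as the starting value, so the
    // results are already global and no follow-up transform pass is needed.
    for (std::size_t c = 0; c < num_chunks; ++c) {
        auto out = result.begin() + (chunk_begin(c) - data.begin());
        std::inclusive_scan(chunk_begin(c), chunk_begin(c + 1), out,
                            std::plus<>{}, offsets[c]);
    }
    return result;
}
```

Calling `chunked_inclusive_scan(values, 3)` gives the same result as a single `std::inclusive_scan` over `values`. In a distributed setting, steps 1 and 3 would run on each locality and only the small vector of partial sums would have to be exchanged.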
Performance measurements are run on the qdr-partition.
We did not repeat the performance tests on the rome-partition because problems on this partition currently lead to program crashes.
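
The measurement scheme mentioned in the list above (warm up the caches first, then time only the kernel in a loop instead of the whole program) could look roughly like the following sketch. It assumes standard C++ with `std::chrono::steady_clock` rather than any HPX timer, and the helper name `mean_seconds_after_warmup` and its parameters are hypothetical.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `kernel` untimed `warmup_runs` times to warm the
// caches, then returns the mean wall-clock time in seconds per run over
// `timed_runs` timed executions.
template <typename Kernel>
double mean_seconds_after_warmup(Kernel&& kernel,
                                 std::size_t warmup_runs,
                                 std::size_t timed_runs)
{
    // Warm-up phase: results and timings are discarded.
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        kernel();
    }
    if (timed_runs == 0) {
        return 0.0;
    }

    // Timed phase: only the repeated kernel executions are measured.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_runs; ++i) {
        kernel();
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timed_runs);
}
```

A call such as `mean_seconds_after_warmup([&] { run_scan(); }, 3, 10)` (with `run_scan` as a placeholder for the benchmarked kernel) would execute the kernel 13 times and report the average of the last 10. In the distributed programs, every iteration would also need a synchronization point such as the barrier mentioned above.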