HPX_Benchmarks

Here we try out several different benchmarks that are run on a university cluster (Heidelberg) to evaluate the performance of the HPX runtime.

Our final results can be found in the HPX_Programs folder.

Improvements after the Discussion on October 19, 2023

The following improvements should be implemented for the respective programs:

For "Transform", "Reduction" and "Scan":

  1. Adjusted performance measurements with cache warm-up (measuring only the algorithm itself, not the entire program runtime):

    This has been implemented; a sketch of the measurement scheme follows after this list.

  2. Execution policy “par_simd”:

    Not provided with the current HPX installation. The documentation states that the new SVE datapar backend must be installed in order to execute par_simd.

    See the documentation:

    https://hpx-docs.stellar-group.org/branches/master/html/releases/whats_new_1_9_0.html?highlight=simd

    https://github.com/STEllAR-GROUP/hpx/blob/master/cmake/HPX_SetupDatapar.cmake
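
To illustrate the first point, here is a minimal sketch of such a warm-up-then-measure loop. The container size, iteration count, and the use of hpx::transform are placeholder choices for the example, not the actual benchmark code; the comment at the end notes where par_simd would be used if a datapar backend were available.

```cpp
// Sketch of the revised measurement: one untimed warm-up pass to populate
// the caches, then a timed loop over the algorithm only. All sizes, names
// and the iteration count are illustrative.
#include <hpx/hpx_main.hpp>
#include <hpx/algorithm.hpp>
#include <hpx/execution.hpp>

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<double> in(1 << 24, 1.0), out(in.size());
    auto op = [](double x) { return 2.0 * x; };

    // Warm-up pass: executed once, not measured.
    hpx::transform(hpx::execution::par, in.begin(), in.end(), out.begin(), op);

    // Timed passes: only the algorithm itself is measured.
    constexpr std::size_t iterations = 10;
    auto const start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i != iterations; ++i)
    {
        hpx::transform(
            hpx::execution::par, in.begin(), in.end(), out.begin(), op);
    }
    std::chrono::duration<double> const elapsed =
        std::chrono::steady_clock::now() - start;

    std::cout << "avg. time per transform: " << elapsed.count() / iterations
              << " s\n";

    // With a datapar backend built into HPX, hpx::execution::par_simd could
    // be substituted for hpx::execution::par in the calls above.
    return 0;
}
```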

Additionally, for "Reduction" and "Scan":

  1. Determine how the data of a partitioned_vector is divided (large blocks or many small ones):

    The documentation states that it is a "Dynamic segmented contiguous array"; an illustrative sketch of how the layout can be chosen explicitly follows below.

    See documentation:

    https://hpx-docs.stellar-group.org/branches/master/html/manual/writing_distributed_hpx_applications.html
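As a rough illustration, the following sketch shows how the segmentation of a partitioned_vector could be controlled explicitly with hpx::container_layout instead of relying on the default layout. The headers, sizes, and the factor of four segments per locality are assumptions for the example, not values taken from the benchmarks.

```cpp
// Sketch of choosing the segmentation of a partitioned_vector explicitly
// via hpx::container_layout; sizes and segment counts are illustrative.
#include <hpx/hpx_main.hpp>
#include <hpx/include/partitioned_vector.hpp>
#include <hpx/include/runtime.hpp>

#include <cstddef>
#include <vector>

// The element type of a partitioned_vector has to be registered once.
HPX_REGISTER_PARTITIONED_VECTOR(double)

int main()
{
    std::vector<hpx::id_type> const localities = hpx::find_all_localities();
    std::size_t const size = 1 << 24;

    // One large block per locality ...
    hpx::partitioned_vector<double> coarse(
        size, hpx::container_layout(localities));

    // ... or several smaller blocks spread over the same localities.
    hpx::partitioned_vector<double> fine(
        size, hpx::container_layout(4 * localities.size(), localities));

    return 0;
}
```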

Additionally, for "Scan":

  1. Execute the second inclusive_scan() with the respective starting value from the sums_per_locality vector (eliminating the need for the last Transform in the old Scan implementation):

    This has been implemented and resulted in a significant performance improvement with multiple nodes; see the sketch after this list.

  2. Use the sums_per_locality vector as a shared vector instead of a partitioned_vector, so that all localities have access to the entire vector:

    We did not find explicit information about this in the documentation. We therefore conducted various experiments based on our own ideas, but without success.
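
The following is a minimal sketch of the idea behind the first point, with made-up names and values (local_chunk, sums_per_locality, my_rank, and the data are placeholders); the starting value is passed directly to hpx::inclusive_scan, so no trailing transform pass over the result is needed.

```cpp
// Sketch of the seeded second scan; all names and values are illustrative.
// The initial value passed to hpx::inclusive_scan replaces the trailing
// Transform of the old Scan implementation.
#include <hpx/hpx_main.hpp>
#include <hpx/execution.hpp>
#include <hpx/numeric.hpp>

#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

int main()
{
    // Data of this locality and the partial sums gathered from all localities.
    std::vector<double> local_chunk(1 << 20, 1.0);
    std::vector<double> result(local_chunk.size());
    std::vector<double> sums_per_locality{10.0, 20.0, 30.0};
    std::size_t const my_rank = 1;    // e.g. hpx::get_locality_id()

    // Starting value: sum of the partial sums of all preceding localities.
    double const init = std::accumulate(sums_per_locality.begin(),
        sums_per_locality.begin() + my_rank, 0.0);

    // Second scan, seeded with `init`; no extra transform pass needed.
    hpx::inclusive_scan(hpx::execution::par, local_chunk.begin(),
        local_chunk.end(), result.begin(), std::plus<double>{}, init);

    return 0;
}
```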

In addition to the requested changes, the following adjustment was made:

  1. We had to use hpx::distributed::barrier instead of hpx::distributed::latch for the “Reduction” and “Scan” benchmarks because the revised performance measurement process executes the algorithms in a loop; see the sketch below.
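
A rough sketch of how such a reusable barrier fits into the measurement loop (the barrier name, header, and loop body are placeholders, not the actual benchmark code):

```cpp
// Sketch of the reusable barrier in the measurement loop; the code is
// assumed to run on every locality, as the benchmarks do. Unlike a latch,
// which can only be counted down once, the barrier can be reused per
// iteration.
#include <hpx/hpx_main.hpp>
#include <hpx/barrier.hpp>

#include <cstddef>

int main()
{
    // One distributed barrier shared by all localities, found by its name.
    hpx::distributed::barrier b("benchmark_iteration_barrier");

    constexpr std::size_t iterations = 10;
    for (std::size_t i = 0; i != iterations; ++i)
    {
        b.wait();    // all localities start this iteration together
        // ... run and time the distributed Reduction/Scan here ...
    }
    return 0;
}
```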

General Information:

Performance measurements are performed on the qdr-partition.

We did not repeat the performance tests on the rome-partition because there are currently problems on this partition that lead to program crashes.
