- Defines two large matrices (A and B) and fills them with random double-precision floating-point numbers.
- Performs the matrix multiplication `C = A * B`.
- Uses OpenMP to parallelize the computation so it can use the many cores of AMD EPYC Genoa (Zen 4) and Milan (Zen 3) CPUs.
- Repeats the multiplication several times and measures the average execution time.
- Calculates and reports the performance in GFLOPS (Giga Floating-Point Operations Per Second).
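The steps above can be sketched in C++ roughly as follows. This is an illustration, not the original `cpu-benchmark-openmp.cpp`; the function names `fillRandom`, `matmul`, `averageSeconds` and `gflops` are hypothetical. The kernel uses the i-k-j loop order so the innermost loop walks rows of `B` and `C` contiguously. The GFLOPS figure assumes the usual count of 2N³ floating-point operations per N×N multiplication (one multiply and one add per inner step).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Fill an n x n row-major matrix with uniform random doubles in [0, 1).
void fillRandom(std::vector<double>& m, std::size_t n, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    m.resize(n * n);
    for (double& x : m) x = dist(gen);
}

// C = A * B for n x n row-major matrices. OpenMP splits the rows of C across
// threads; each thread writes only its own rows, so no locking is needed.
void matmul(const std::vector<double>& A, const std::vector<double>& B,
            std::vector<double>& C, std::size_t n) {
    C.assign(n * n, 0.0);
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
        const std::size_t row = static_cast<std::size_t>(i) * n;
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[row + k];
            for (std::size_t j = 0; j < n; ++j) C[row + j] += a * B[k * n + j];
        }
    }
}

// Average wall-clock time of `repeats` multiplications, in seconds.
double averageSeconds(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, std::size_t n, int repeats) {
    assert(repeats > 0);
    double total = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        matmul(A, B, C, n);
        const auto t1 = std::chrono::steady_clock::now();
        total += std::chrono::duration<double>(t1 - t0).count();
    }
    return total / repeats;
}

// An n x n by n x n product does n^3 multiply-add pairs, i.e. 2 * n^3 FLOPs.
double gflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    const double nd = static_cast<double>(n);
    return 2.0 * nd * nd * nd / seconds / 1e9;
}
```

If the code is compiled without `-fopenmp`, the pragma is ignored and the kernel runs on one thread, which gives a convenient serial baseline.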
- Adjust the `MATRIX_SIZE` constant to fit the system's available memory. Larger sizes stress the memory subsystem more.
- Compile with optimizations enabled (the `-O3` flag).
- To target a specific instruction set, add a flag such as `-march=znver3` for Zen 3 (Milan) or `-march=znver4` for Zen 4 (Genoa).
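As a rough guide for choosing `MATRIX_SIZE`, three N×N matrices of doubles (A, B and the result C) need 3 · 8 · N² bytes. For N = 4096 that is 402,653,184 bytes (384 MiB). The helpers below compute that footprint and the largest N that fits a memory budget. `matrixBytes` and `largestMatrixSize` are hypothetical names, and the budget is an assumption you choose (for example, a comfortable fraction of free RAM).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes needed for A, B and C, assuming all three are n x n matrices of doubles.
std::uint64_t matrixBytes(std::uint64_t n) {
    return 3ULL * n * n * sizeof(double);
}

// Largest n whose three matrices fit in budgetBytes (a budget you choose).
std::uint64_t largestMatrixSize(std::uint64_t budgetBytes) {
    std::uint64_t n = static_cast<std::uint64_t>(
        std::sqrt(static_cast<double>(budgetBytes) / (3.0 * sizeof(double))));
    // Nudge n to correct any floating-point rounding in the square root.
    while (n > 0 && matrixBytes(n) > budgetBytes) --n;
    while (matrixBytes(n + 1) <= budgetBytes) ++n;
    assert(matrixBytes(n) <= budgetBytes);
    return n;
}
```

For a 1 GiB budget (`1ULL << 30` bytes) this gives N = 6688.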
```shell
g++ -O3 -fopenmp cpu-benchmark-openmp.cpp -o cpu_benchmark
mpic++ -O3 cpu-benchmark-mpi.cpp -o mpi_cpu_benchmark
```
- Added system information printing that shows the available thread counts.
- Created a separate `runBenchmark` function that can test different thread configurations.
- Added more detailed performance metrics (minimum and maximum times).
- Changed the OpenMP loop scheduling to `schedule(dynamic)`.
- Added parallel initialization of the matrices.
- Automatically runs the tests both with physical cores only and with all logical cores.
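The changes listed above might look roughly like this. It is a sketch, not the program's actual code; `BenchResult`, `makeRandomMatrix` and this `runBenchmark` signature are hypothetical. The matrices are allocated with `new double[]`, which, unlike `std::vector`, does not zero-fill the memory on the calling thread. They are then written inside a parallel loop. Under Linux's default first-touch policy, a page is placed on the NUMA node of the thread that first writes it, so parallel initialization spreads the data across the nodes the threads run on instead of putting all of it next to the main thread. Each row gets its own seed, so the values are identical for any thread count.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <memory>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

using Matrix = std::unique_ptr<double[]>;

struct BenchResult {
    int threads;
    double minSec, avgSec, maxSec;
};

// Allocate without zero-filling, then write rows in parallel so first-touch
// page placement spreads the matrix across the threads' NUMA nodes.
Matrix makeRandomMatrix(std::size_t n, unsigned seed) {
    Matrix m(new double[n * n]);
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
        // Per-row seed: same values regardless of how rows map to threads.
        std::mt19937_64 gen(seed + static_cast<unsigned>(i));
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        double* row = m.get() + static_cast<std::size_t>(i) * n;
        for (std::size_t j = 0; j < n; ++j) row[j] = dist(gen);
    }
    return m;
}

// Time `repeats` multiplications C = A * B with `threads` OpenMP threads and
// report the fastest, average and slowest run in seconds.
BenchResult runBenchmark(const double* A, const double* B, double* C,
                         std::size_t n, int threads, int repeats) {
    assert(threads > 0 && repeats > 0);
#ifdef _OPENMP
    omp_set_num_threads(threads);  // applies to subsequent parallel regions
#endif
    BenchResult r{threads, std::numeric_limits<double>::infinity(), 0.0, 0.0};
    for (int rep = 0; rep < repeats; ++rep) {
        const auto t0 = std::chrono::steady_clock::now();
        #pragma omp parallel for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
            const std::size_t row = static_cast<std::size_t>(i) * n;
            std::fill(C + row, C + row + n, 0.0);
            for (std::size_t k = 0; k < n; ++k) {
                const double a = A[row + k];
                for (std::size_t j = 0; j < n; ++j) C[row + j] += a * B[k * n + j];
            }
        }
        const double s = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
        r.minSec = std::min(r.minSec, s);
        r.maxSec = std::max(r.maxSec, s);
        r.avgSec += s / repeats;
    }
    return r;
}
```

With `schedule(dynamic)`, a row is not guaranteed to run on the thread that initialized it. The NUMA benefit therefore comes mainly from spreading pages across nodes, not from exact per-row locality.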
To use the program for SMT testing on EPYC:
- With SMT enabled:
  - The program automatically detects and uses all available hardware threads.
  - It runs the tests with all logical cores and with half of them. With 2-way SMT, half the logical cores equals the number of physical cores.
- With SMT disabled:
  - The program automatically detects the reduced thread count.
  - The results show performance with physical cores only.
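A sketch of how the thread counts might be chosen and reported follows. It assumes the program tests all logical CPUs and then half of them. `smtActive`, `logicalCpus`, `threadConfigs` and `printSystemInfo` are hypothetical helpers. `/sys/devices/system/cpu/smt/active` is a Linux sysfs file that does not exist on other systems, so a missing file is treated as "unknown". Halving the thread count only approximates "one thread per physical core" when the threads are pinned, for example with `OMP_PLACES=cores OMP_PROC_BIND=spread`. Without pinning, the OS may place two threads on sibling SMT threads of the same core.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <thread>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// 1 = SMT active, 0 = inactive, -1 = unknown (file missing, e.g. not Linux).
int smtActive() {
    std::ifstream f("/sys/devices/system/cpu/smt/active");
    int v = -1;
    if (f >> v) return v;
    return -1;
}

// Logical CPUs reported by the C++ runtime (0 means unknown, mapped to 1).
int logicalCpus() {
    const unsigned hc = std::thread::hardware_concurrency();
    return hc > 0 ? static_cast<int>(hc) : 1;
}

// Thread counts to test: all logical CPUs, then half of them. With 2-way SMT
// enabled, half the logical CPUs equals the number of physical cores.
std::vector<int> threadConfigs(int logical) {
    assert(logical > 0);
    std::vector<int> configs{logical};
    if (logical > 1) configs.push_back(logical / 2);
    return configs;
}

void printSystemInfo() {
    std::cout << "Logical CPUs: " << logicalCpus() << '\n';
#ifdef _OPENMP
    std::cout << "OpenMP max threads: " << omp_get_max_threads() << '\n';
#endif
    const int smt = smtActive();
    std::cout << "SMT active: "
              << (smt == 1 ? "yes" : smt == 0 ? "no" : "unknown") << '\n';
}
```

Running the same binary before and after toggling SMT (in the BIOS, or through `/sys/devices/system/cpu/smt/control` on Linux) then shows how the logical CPU count, and so the tested configurations, change.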