cpu-benchmark

  • Defines two large matrices (A and B) and fills them with random double-precision floating-point numbers.
  • Performs matrix multiplication C = A * B.
  • Uses OpenMP to parallelize the computation across the multiple cores of AMD EPYC Genoa/Milan CPUs.
  • Repeats the multiplication several times and measures the average execution time.
  • Calculates and reports the performance in GFLOPS (Giga Floating-Point Operations Per Second).
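The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual source: the function names (randomMatrix, multiply, gflops, averageSeconds), the row-major storage, and the loop order are all assumptions. The GFLOPS figure uses the standard count of 2·N³ floating-point operations for a dense N×N multiplication.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a row-major n x n matrix with random doubles in [0, 1).
std::vector<double> randomMatrix(std::size_t n, unsigned seed) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> m(n * n);
    for (double& x : m) x = dist(gen);
    return m;
}

// C = A * B for row-major n x n matrices. Rows of C are split across
// OpenMP threads; without -fopenmp the pragma is ignored and the loop runs serially.
void multiply(const std::vector<double>& A, const std::vector<double>& B,
              std::vector<double>& C, std::size_t n) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) C[i * n + j] = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            const double a = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}

// A dense n x n multiply performs n^3 multiplications and n^3 additions.
double gflops(std::size_t n, double seconds) {
    return 2.0 * double(n) * double(n) * double(n) / seconds / 1e9;
}

// Average wall-clock seconds per multiplication over `repeats` runs.
double averageSeconds(std::size_t n, int repeats) {
    const std::vector<double> A = randomMatrix(n, 1);
    const std::vector<double> B = randomMatrix(n, 2);
    std::vector<double> C(n * n);
    double total = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        multiply(A, B, C, n);
        const auto stop = std::chrono::steady_clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / repeats;
}
```

With these helpers, gflops(n, averageSeconds(n, 5)) would give the average throughput over five runs for a chosen n.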

Compilation

  • Adjust the MATRIX_SIZE constant based on the system's memory. Larger sizes stress the memory subsystem more.
  • Make sure to compile with optimizations enabled (-O3 flag).
  • If you want to test specific instruction sets, you can add flags like -march=znver3 for Zen 3 (Milan) or -march=znver4 for Zen 4 (Genoa).
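As a rough guide for choosing MATRIX_SIZE, the sketch below estimates the memory needed for the three matrices. It assumes A, B, and C are stored densely as N×N arrays of double, which may not match the benchmark's actual layout; both function names are hypothetical.

```cpp
#include <cstddef>

// Bytes needed for three dense n x n matrices of double (A, B and C).
constexpr std::size_t matrixFootprintBytes(std::size_t n) {
    return 3 * n * n * sizeof(double);
}

// Largest n whose three matrices fit into `budgetBytes`.
// A simple linear search; fast enough for realistic memory sizes.
std::size_t largestMatrixSize(std::size_t budgetBytes) {
    std::size_t n = 0;
    while (matrixFootprintBytes(n + 1) <= budgetBytes) ++n;
    return n;
}
```

For example, with 8-byte doubles a MATRIX_SIZE of 1000 needs about 24 MB, small enough to fit in the L3 cache of some EPYC parts, so sizes well beyond that are needed to exercise main memory.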

Compiling the OpenMP version

g++ -O3 -fopenmp cpu-benchmark-openmp.cpp -o cpu_benchmark

Compiling the MPI version

mpic++ -O3 cpu-benchmark-mpi.cpp -o mpi_cpu_benchmark

smt-benchmark-openmp

  1. Added system information printing to show thread counts
  2. Created a separate runBenchmark function that can test different thread configurations
  3. Added more detailed performance metrics (min/max times)
  4. Improved OpenMP scheduling with schedule(dynamic)
  5. Added parallel initialization of matrices
  6. Automatically tests both physical cores only and all logical cores
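A function along the lines of runBenchmark might look like the sketch below. Only the name runBenchmark, the use of schedule(dynamic), parallel initialization, and the min/max metrics come from the list above; the signature, the BenchmarkResult struct, the per-row seeding, and the checksum field are assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result record: timing statistics for one thread configuration.
struct BenchmarkResult {
    int threads;
    double minSeconds;
    double maxSeconds;
    double avgSeconds;
    double checksum;  // C[0], kept so the multiplication has an observable result
};

BenchmarkResult runBenchmark(std::size_t n, int threads, int repeats) {
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // built without OpenMP: everything runs on one thread
#endif
    std::vector<double> A(n * n), B(n * n), C(n * n, 0.0);

    // Parallel initialization: each row gets its own generator, seeded by the
    // row index, so threads never share random-number state.
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i) {
        std::mt19937_64 gen(i);
        std::uniform_real_distribution<double> dist(0.0, 1.0);
        for (std::size_t j = 0; j < n; ++j) {
            A[i * n + j] = dist(gen);
            B[i * n + j] = dist(gen);
        }
    }

    double minT = std::numeric_limits<double>::infinity();
    double maxT = 0.0;
    double sum = 0.0;
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        // Rows are handed out dynamically, so a thread that finishes early
        // picks up the next unprocessed row.
        #pragma omp parallel for schedule(dynamic)
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) C[i * n + j] = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                const double a = A[i * n + k];
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += a * B[k * n + j];
            }
        }
        const auto stop = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        minT = std::min(minT, s);
        maxT = std::max(maxT, s);
        sum += s;
    }
    return {threads, minT, maxT, sum / repeats, n > 0 ? C[0] : 0.0};
}
```

Reporting the minimum alongside the average helps separate the best achievable time from run-to-run noise, which tends to matter when comparing SMT-on and SMT-off results.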

To use this for SMT testing on EPYC:

  • With SMT enabled:

    • The program will automatically detect and use all available threads
    • It will run tests using all logical cores and using half of them, which matches the physical core count
  • With SMT disabled:

    • It will automatically detect the reduced thread count
    • The results will show performance with physical cores only
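One way the thread configurations could be chosen is sketched below. The function names are hypothetical, and the "half of the logical CPUs equals the physical core count" shortcut is an assumption that only holds with 2-way SMT enabled, as on EPYC. The thread count alone does not reveal whether SMT is on, so with SMT disabled the half-count run in this sketch would simply use half of the physical cores.

```cpp
#include <thread>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of logical CPUs visible to the process (at least 1).
int logicalThreads() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
#endif
}

// Thread counts to benchmark: half of the logical CPUs (the physical core
// count when 2-way SMT is enabled) followed by all logical CPUs.
std::vector<int> threadConfigurations(int logical) {
    std::vector<int> configs;
    if (logical >= 2) configs.push_back(logical / 2);
    configs.push_back(logical);
    return configs;
}
```

Each value returned by threadConfigurations(logicalThreads()) would then be passed as the thread count to one benchmark run. Note that the operating system decides where the threads land unless placement is pinned, for example with the standard OMP_PLACES and OMP_PROC_BIND environment variables.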