Throughput benchmark

Disclaimer

No benchmark is perfect. Every benchmark targets a specific need and must be well defined before results can be compared. The Coral-2 benchmark is very different from the throughput benchmark I describe here.

Throughput benchmark

The key quantity for the throughput benchmark is the number of samples generated in a given time. Samples spent on equilibration are not part of the measurement.

Figure of merit

The figure of merit (FOM) is defined as workload divided by elapsed time.

Weak scaling

QMCPACK spends little time in MPI communication, so the full-machine FOM can be estimated from a single-node measurement: full-machine FOM = number of nodes x single-node FOM x MPI efficiency. We have always seen MPI efficiency above 95% in the past.

Problem size

The QMCPACK workload depends on the problem size N, the number of electrons. B-spline SPO evaluation and the two- and three-body Jastrow factors scale as O(N^2), but Slater determinants scale as O(N^3). For this reason, it is simpler to compare two FOMs measured at the same problem size. When the problem size is large, the O(N^3) term dominates the cost, so we take the workload to scale as O(N^3) for simplicity.

A simple formula

FOM = N^3 x Nwalker x DMC steps / wall clock time

For example, take the 256-atom NiO problem (N = 3072) on Titan, with 14 walkers per GPU and 19.65 seconds per DMC step. The full-machine FOM is then 18000 nodes x 0.95 x 3072^3 x 14 / 19.65 = 3.53 x 10^14. This run used the CUDA code without the delayed update algorithm.
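The arithmetic above can be reproduced with a short script. This is only a sketch: the function and parameter names (`throughput_fom`, `full_machine_fom`, and so on) are hypothetical and are not part of QMCPACK. It assumes one DMC step per 19.65 s measurement and multiplies the per-GPU figure by the node count exactly as the example does.

```python
def throughput_fom(n_electrons, n_walkers, dmc_steps, wall_seconds):
    """FOM = N^3 x Nwalker x DMC steps / wall clock time."""
    return n_electrons**3 * n_walkers * dmc_steps / wall_seconds


def full_machine_fom(node_fom, n_nodes, mpi_efficiency):
    """Weak-scaling estimate: nodes x single-node FOM x MPI efficiency."""
    return n_nodes * node_fom * mpi_efficiency


# Numbers from the NiO example on Titan (256 atoms, N = 3072 electrons).
node_fom = throughput_fom(
    n_electrons=3072,
    n_walkers=14,        # walkers per GPU
    dmc_steps=1,         # assumption: timing is for a single DMC step
    wall_seconds=19.65,  # seconds per DMC step
)
titan_fom = full_machine_fom(node_fom, n_nodes=18000, mpi_efficiency=0.95)

print(f"single-node FOM:  {node_fom:.2e}")
print(f"full-machine FOM: {titan_fom:.2e}")  # 3.53e+14
```

Because the formula uses N^3 as the workload, FOM values computed this way are only directly comparable between runs of the same problem size.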
