SGEMM and DGEMM (single- and double-precision general matrix multiplication) kernel functions for NVIDIA GPUs.
Efficiency of the SGEMM kernel: 30-40% on GTX Titan Black, 60% on Tesla P4 and Tesla P100, 80% on Tesla V100.
Efficiency of the DGEMM kernel: 40% on GTX Titan Black, 70-80% on Tesla P100 and Tesla V100.
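
The figures above do not state how efficiency is defined. A common convention, assumed here, is achieved floating-point throughput divided by the device's theoretical peak at the same precision. The sketch below shows that arithmetic; the function names, the timing value, and the peak value are all hypothetical and are not taken from this project or from any measurement.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for C = A * B with A (m x k) and B (k x n).

    Counts one multiply and one add per inner-product term, i.e. 2*m*n*k.
    The alpha/beta scaling terms are ignored, as is usual for large sizes.
    """
    return 2 * m * n * k


def gemm_efficiency(m: int, n: int, k: int,
                    seconds: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak reached by one GEMM call.

    Assumption: efficiency = achieved GFLOP/s / peak GFLOP/s.
    `seconds` is the measured kernel time and `peak_gflops` is the
    theoretical peak for the precision in use (FP32 for SGEMM,
    FP64 for DGEMM).
    """
    achieved_gflops = gemm_flops(m, n, k) / seconds / 1e9
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only: a 4096^3 multiplication
    # that took 20 ms on a device with a 10000 GFLOP/s peak.
    eff = gemm_efficiency(4096, 4096, 4096, seconds=0.02, peak_gflops=10000.0)
    print(f"efficiency: {eff:.1%}")
```

Under this definition the result depends on which peak is used (base or boost clock, FP32 or FP64), so percentages are only comparable when the peak is computed the same way for every GPU.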