zkh2016/sgemm

sgemm

The implementation follows the approach used in maxas.

performance

  1. Test environment: Ubuntu 18.04, CUDA 10, GTX 1080 Ti.
  2. The code supports only a limited set of input matrix sizes. It is not a general-purpose implementation and is intended for learning only. The table below lists the measured throughput in GFLOPS for different matrix sizes N.
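| N    | cuBLAS (GFLOPS) | sgemm (GFLOPS) | sgemm/cuBLAS |
|------|-----------------|----------------|--------------|
| 512  | 4451.6069       | 3587.3280      | 80%          |
| 1024 | 7856.5241       | 6640.6945      | 84%          |
| 2048 | 9409.4447       | 8769.9500      | 93%          |
| 4096 | 10180.4288      | 9708.4873      | 95%          |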
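The README does not say how the GFLOPS figures were derived. A common convention, assumed here, is to count 2·N³ floating-point operations for an N×N×N SGEMM (one multiply and one add per inner-product term) and divide by the measured run time. The helpers below are a minimal sketch of that arithmetic. The function names are hypothetical and do not come from this repository.

```cpp
#include <cassert>

// Floating-point operations in an N x N x N SGEMM, counting one multiply
// and one add per inner-product term. The 2*N^3 convention is an
// assumption; the README does not state how its figures were computed.
double sgemm_flops(long long n) {
    return 2.0 * static_cast<double>(n) * n * n;
}

// Throughput in GFLOPS for an N x N x N SGEMM that took `seconds` to run.
// `seconds` would typically be the average over several timed launches.
double sgemm_gflops(long long n, double seconds) {
    assert(seconds > 0.0);
    return sgemm_flops(n) / seconds / 1e9;
}

// Throughput of this kernel relative to cuBLAS, in percent.
double percent_of_cublas(double kernel_gflops, double cublas_gflops) {
    assert(cublas_gflops > 0.0);
    return 100.0 * kernel_gflops / cublas_gflops;
}
```

Under this convention, N = 4096 corresponds to 137,438,953,472 operations, so the reported 9708.4873 GFLOPS would mean a run time of roughly 14 ms per multiplication. The percentage column appears to be truncated rather than rounded to the nearest integer.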

About

CUDA-based matrix multiplication, benchmarked against cuBLAS. For the reference write-up, see https://github.com/NervanaSystems/maxas/wiki/SGEMM
