Production release of COSMA

kabicm released this on 08 Mar 23:05
Commit: 5b63093

This is the first production release of COSMA. It brings many bug fixes and performance improvements. The most important updates are:

  • Faster GPU backend:
    • the cost of pinning and unpinning host memory is amortized
    • improved stream synchronization
    • improved tiling mechanism
  • Faster memory access through huge pages (2 MB)
  • Highly optimized pxgemm (ScaLAPACK) wrapper:
    • optimized layout transformation, using maximum-weight perfect matching
    • when the layout transformation would be too expensive, COSMA can use the initial layout directly
  • Portability:
    • Hybrid version: runs on both NVIDIA and AMD GPUs.
    • CPU-only version: supports MKL, OpenBLAS, Cray LibSci and custom GEMM backends.
  • Usability:
    • Trivial integration: linking against the library is enough; no changes to user code are required.
    • Installable with Spack
  • Bug fixes:
    • correctness tested on up to 1024 nodes of the Piz Daint supercomputer (Cray XC50).
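The "maximum-weight perfect matching" used in the pxgemm layout transformation can be pictured as an assignment problem. The sketch below is our own illustration under stated assumptions, not COSMA's code. It assumes that `weights[i][j]` counts the matrix elements rank `i` already holds that belong to block `j` of the target layout. Assigning each rank one target block so the total weight is maximal keeps the most data local and minimizes what must be communicated. The function name `max_weight_perfect_matching` is hypothetical. For clarity it brute-forces all permutations; a practical implementation would use a polynomial-time method such as the Hungarian algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: weights is an n x n matrix of non-negative values, where
// weights[i][j] = elements rank i already owns that belong to target block j.
// Returns perm with perm[i] = target block assigned to rank i, maximizing the
// total amount of data that stays local. best_total receives that maximum.
// Brute force over all n! permutations: only suitable for tiny n.
std::vector<std::size_t> max_weight_perfect_matching(
    const std::vector<std::vector<long>>& weights, long& best_total) {
    const std::size_t n = weights.size();
    for (const auto& row : weights) {
        assert(row.size() == n);  // the bipartite graph must be square
    }

    std::vector<std::size_t> perm(n);
    std::iota(perm.begin(), perm.end(), 0);  // start from the identity mapping
    std::vector<std::size_t> best = perm;
    best_total = -1;  // below any achievable total, since weights are >= 0

    do {
        long total = 0;
        for (std::size_t i = 0; i < n; ++i) {
            total += weights[i][perm[i]];
        }
        if (total > best_total) {
            best_total = total;
            best = perm;
        }
    } while (std::next_permutation(perm.begin(), perm.end()));

    return best;
}
```

For example, with weights `{{1, 9, 0}, {8, 2, 0}, {0, 0, 5}}`, swapping the blocks of ranks 0 and 1 keeps 22 elements local, versus 8 for the identity mapping.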