Skip to content

Simple project that shows the effect of different matrix multiplication speed-ups for CUDA

Notifications You must be signed in to change notification settings

MarcBrede/cuda_matrix_mult

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

cuda_matrix_mult

Simple project that shows the effect of different matrix multiplication speed-ups for CUDA

Results

This code executed on a RTX3080 yields the following times:

  • CPU Time: 4324.901000 ms. Traditional matrix multiplication using the CPU without any parallelization.
  • CUDA Time: 1.128256 ms. Utilizes CUDA for parallelization of matrix multiplication on the GPU.
  • CUDA Time with Shared Memory: 0.871296 ms. Further optimizes CUDA by utilizing shared memory, reducing access times.
  • CUDA Time with Shared Memory and Prefetching to GPU: 0.866304 ms. Includes both shared memory optimization and prefetching data to the GPU, allowing for concurrent computation and data transfer.

speed-ups

About

Simple project that shows the effect of different matrix multiplication speed-ups for CUDA

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages