NVIDIA CUDA and AMD HIP implementations of matrix multiplication, vector addition, reduction, and layernorm kernels. Each kernel comes in variants for different data types, such as fp64, fp32, fp16 (half), and half2.
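To make the data-type variants concrete, here is a minimal CUDA sketch of a vector add. It is not taken from this repository: the names `vec_add`, `vec_add_half2`, `add_on_gpu`, and `add_half2_on_gpu` are hypothetical, error checking is omitted for brevity, and the half2 path assumes an even element count and a GPU with native fp16 arithmetic (compute capability 5.3 or newer).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Generic element-wise add, one element per thread (used here for fp64 and fp32).
template <typename T>
__global__ void vec_add(const T* a, const T* b, T* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// half2 variant: each thread adds two packed fp16 values at once.
// n2 is the number of half2 pairs, i.e. half the fp16 element count.
__global__ void vec_add_half2(const __half2* a, const __half2* b, __half2* c, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) c[i] = __hadd2(a[i], b[i]);
}

// Hypothetical host helper: copy inputs to the GPU, run vec_add<T>, copy back.
template <typename T>
std::vector<T> add_on_gpu(const std::vector<T>& a, const std::vector<T>& b) {
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(T);
    T *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<T><<<blocks, threads>>>(da, db, dc, n);
    std::vector<T> c(n);
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}

// Hypothetical host helper for the half2 path. Inputs are plain fp16 arrays
// whose length is assumed to be even; on the device they are viewed as half2.
std::vector<__half> add_half2_on_gpu(const std::vector<__half>& a,
                                     const std::vector<__half>& b) {
    int n = static_cast<int>(a.size());
    int n2 = n / 2;
    size_t bytes = n * sizeof(__half);
    __half *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n2 + threads - 1) / threads;
    vec_add_half2<<<blocks, threads>>>(reinterpret_cast<const __half2*>(da),
                                       reinterpret_cast<const __half2*>(db),
                                       reinterpret_cast<__half2*>(dc), n2);
    std::vector<__half> c(n);
    cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return c;
}

int main() {
    const int n = 1024;
    // fp32 and fp64 share the same templated kernel.
    std::vector<float> cf = add_on_gpu(std::vector<float>(n, 1.0f),
                                       std::vector<float>(n, 2.0f));
    std::vector<double> cd = add_on_gpu(std::vector<double>(n, 1.0),
                                        std::vector<double>(n, 2.0));
    // fp16 inputs processed two at a time through the half2 kernel.
    std::vector<__half> ch = add_half2_on_gpu(
        std::vector<__half>(n, __float2half(1.5f)),
        std::vector<__half>(n, __float2half(2.25f)));
    printf("fp32: %f  fp64: %f  half2: %f\n",
           cf[0], cd[0], __half2float(ch[0]));
    return 0;
}
```

The sketch can be compiled with `nvcc`. A HIP port would follow the same structure, using `hip/hip_runtime.h` and `hip/hip_fp16.h` with the `hipMalloc`, `hipMemcpy`, and `hipFree` counterparts of the CUDA runtime calls.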
scxiao/hip_cuda_examples