Multiple NVIDIA CUDA and AMD HIP implementations of matrix multiplication, vector addition, reduction, and layernorm kernels.
The kernels are written for several data types, including fp64, fp32, fp16 (half), and half2.
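
As a rough illustration of how one kernel can cover several of these data types, here is a minimal CUDA sketch of vector addition. It is not taken from this repository: the names `vector_add`, `vector_add_half2`, and `add_on_device` are hypothetical, error checking is omitted for brevity, and the half2 kernel assumes a GPU that supports fp16 arithmetic (compute capability 5.3 or newer).

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <vector>

// Generic element-wise add; can be instantiated for float (fp32) or double (fp64).
// Hypothetical name, for illustration only.
template <typename T>
__global__ void vector_add(const T* a, const T* b, T* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// half2 variant: each thread adds two packed fp16 values with a single __hadd2.
// n2 is the number of half2 elements, i.e. half the number of fp16 values.
__global__ void vector_add_half2(const __half2* a, const __half2* b,
                                 __half2* c, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) c[i] = __hadd2(a[i], b[i]);
}

// Host helper (hypothetical): copies inputs to the GPU, launches vector_add,
// and copies the result back. Assumes a and b are non-empty and equally sized.
template <typename T>
std::vector<T> add_on_device(const std::vector<T>& a, const std::vector<T>& b) {
    const int n = static_cast<int>(a.size());
    const size_t bytes = n * sizeof(T);

    T *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n
    vector_add<T><<<blocks, threads>>>(d_a, d_b, d_c, n);

    std::vector<T> c(n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

Calling `add_on_device<float>(a, b)` or `add_on_device<double>(a, b)` exercises the fp32 and fp64 paths. The half2 kernel would be launched the same way, with the buffers holding packed `__half2` values and the element count halved.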