Are there plans to support matmuls in FP16 with Apex? It seems that you can make low-level CUDA calls that do FP16 GEMMs and accumulate in FP32, but this feature is not exposed in PyTorch. It would be great if we could have mixed-precision matmuls through Apex!
The "FP16 input, internal FP32 accumulate, FP16 output" behavior is a hardware feature of Tensor Cores, far below anything that Apex controls. It is already "exposed" in Pytorch in the sense that any matrix multiplication you call with torch.cuda.HalfTensors as inputs will use Tensor Cores if your GPU has them (ie, if it's Volta or Turing).
Apex is not doing anything weird to ensure that Tensor Cores are invoked. All it does is cast the inputs of torch.mms, etc, to half on the Python side to ensure that the Tensor Core hardware path is eventually taken. You can call .half() on inputs to torch.mm (and other GEMM calls) manually (if you want) and achieve the same result.
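For example, a minimal sketch of that manual cast (assuming a Tensor Core capable GPU, i.e. Volta or newer, is available):

```python
import torch

device = torch.device("cuda")

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# Casting the inputs to half is all that is needed on the Python side;
# cuBLAS then runs the GEMM on Tensor Cores (FP16 inputs, FP32
# accumulation, FP16 output) when the hardware supports it.
c = torch.mm(a.half(), b.half())
print(c.dtype)  # torch.float16
```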
In addition to the inputs being FP16, there are a few other (fairly easy to satisfy) constraints on tensor dimensions for Tensor Cores to be used: #221 (comment). A hypothetical dimension check is sketched below.
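As a rough illustration only (the multiple-of-8 requirement is an assumption here; the linked comment is the authoritative source for the exact constraints), a hypothetical helper to check GEMM dimensions might look like:

```python
import torch

def tensor_core_friendly(m: int, n: int, k: int, multiple: int = 8) -> bool:
    # Hypothetical helper: checks whether an (m x k) @ (k x n) half-precision
    # GEMM has all dimensions divisible by `multiple`, assumed here to be the
    # condition for cuBLAS to take the Tensor Core path.
    return all(d % multiple == 0 for d in (m, n, k))

a = torch.randn(1024, 1000, dtype=torch.float16, device="cuda")
b = torch.randn(1000, 512, dtype=torch.float16, device="cuda")

# 1000 is not a multiple of 8, so this GEMM would fall back to the
# non-Tensor-Core path under the assumed constraint.
print(tensor_core_friendly(a.size(0), b.size(1), a.size(1)))  # False
```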