Are there plans to support matmuls in FP16 with Apex? It seems that you can make low-level CUDA calls that do FP16 GEMMs and accumulate in FP32, but this feature is not exposed in PyTorch. It would be great if we could have mixed-precision matmuls through Apex!
The "FP16 input, internal FP32 accumulate, FP16 output" behavior is a hardware feature of Tensor Cores, far below anything that Apex controls. It is already "exposed" in Pytorch in the sense that any matrix multiplication you call with torch.cuda.HalfTensors as inputs will use Tensor Cores if your GPU has them (ie, if it's Volta or Turing).
Apex is not doing anything weird to ensure that Tensor Cores are invoked. All it does is cast the inputs of torch.mms, etc, to half on the Python side to ensure that the Tensor Core hardware path is eventually taken. You can call .half() on inputs to torch.mm (and other GEMM calls) manually (if you want) and achieve the same result.
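For example, a minimal sketch of that manual cast (assuming a Tensor Core capable GPU, i.e. Volta or newer, is available):

```python
import torch

device = torch.device("cuda")

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# Casting the inputs to half is all that is needed on the Python side;
# cuBLAS then runs the GEMM on Tensor Cores (FP16 inputs, FP32
# accumulation, FP16 output) when the hardware supports it.
c = torch.mm(a.half(), b.half())
print(c.dtype)  # torch.float16
```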
In addition to the inputs being FP16, there are a few other (fairly easy to satisfy) constraints on tensor dimensions for Tensor Cores to be used: #221 (comment). A hypothetical dimension check is sketched below.
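As a rough illustration only (the multiple-of-8 requirement is an assumption here; the linked comment is the authoritative source for the exact constraints), a hypothetical helper to check GEMM dimensions might look like:

```python
import torch

def tensor_core_friendly(m: int, n: int, k: int, multiple: int = 8) -> bool:
    # Hypothetical helper: checks whether an (m x k) @ (k x n) half-precision
    # GEMM has all dimensions divisible by `multiple`, assumed here to be the
    # condition for cuBLAS to take the Tensor Core path.
    return all(d % multiple == 0 for d in (m, n, k))

a = torch.randn(1024, 1000, dtype=torch.float16, device="cuda")
b = torch.randn(1000, 512, dtype=torch.float16, device="cuda")

# 1000 is not a multiple of 8, so this GEMM would fall back to the
# non-Tensor-Core path under the assumed constraint.
print(tensor_core_friendly(a.size(0), b.size(1), a.size(1)))  # False
```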