Tensor Core enabled for mixed precision on GPUs that allow it #206

allaffa · 2023-12-13T02:26:07Z

No description provided.

jychoi-hpc · 2023-12-15T18:53:45Z

I ran a few tests with QM9 on Frontier and found the following about this AMP (Automatic Mixed Precision) feature:

(Almost) no speedup: this may vary from case to case, but at least for QM9 the reduction in wall-clock time was very small.
Memory savings, however, were clear: 114 MB with AMP vs. 180 MB without AMP.
AMP can introduce numerical differences, i.e., runs with and without AMP do not produce the same numbers. Here are the loss plots:
[Figure: training loss curves with and without AMP]
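Treating the two figures as comparable totals (an assumption; the comment does not say whether they are peak or steady-state allocations), the saving is about 37%, well short of the 50% a blanket fp32-to-fp16 cast would give. A plausible reason is that PyTorch autocast keeps parameters (and hence gradients and optimizer state) in fp32 and only runs eligible ops in half precision, so mostly activations shrink. The sketch below checks the arithmetic and uses a hypothetical toy model, `footprint_bytes`, that is not HydraGNN's memory accounting:

```python
# Bytes per element for the dtypes discussed in this PR.
BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}


def relative_saving(baseline_mb: float, amp_mb: float) -> float:
    """Fraction of memory saved with AMP relative to the fp32 baseline."""
    return (baseline_mb - amp_mb) / baseline_mb


def footprint_bytes(param_numel: int, activation_numel: int, amp: bool) -> int:
    """Hypothetical toy model: parameters stay fp32 under autocast,
    while activations are stored in half precision when AMP is on."""
    act_dtype = "float16" if amp else "float32"
    return param_numel * BYTES["float32"] + activation_numel * BYTES[act_dtype]


print(f"{relative_saving(180, 114):.1%}")  # 36.7%

# With activations 3x the parameter count, the toy model saves 37.5%, not 50%.
full = footprint_bytes(1_000_000, 3_000_000, amp=False)
mixed = footprint_bytes(1_000_000, 3_000_000, amp=True)
print(full, mixed)
```

The larger the activation share of total memory, the closer the saving gets to 50%; a model dominated by parameters would see much less.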

jychoi-hpc · 2023-12-18T20:45:36Z

We may need to make this feature optional, since it changes the accuracy and the loss values.
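One way to make AMP opt-in is to resolve a single config value into the settings PyTorch needs: the `enabled` and `dtype` arguments of `torch.autocast`, and the `enabled` argument of `GradScaler`. The sketch below is pure Python and framework-free; the `precision` config key, its values, and `plan_precision` are hypothetical names, not existing HydraGNN options. It defaults to full fp32 so results stay comparable to earlier runs:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrecisionPlan:
    autocast_enabled: bool         # would feed torch.autocast(..., enabled=...)
    autocast_dtype: Optional[str]  # "float16", "bfloat16", or None for plain fp32
    use_grad_scaler: bool          # would feed GradScaler(enabled=...)


def plan_precision(requested: str = "fp32", bf16_supported: bool = False) -> PrecisionPlan:
    """Map a hypothetical 'precision' config value to AMP settings.

    Defaults to fp32 so AMP is strictly opt-in.
    """
    requested = requested.lower()
    if requested == "fp32":
        return PrecisionPlan(False, None, False)
    if requested == "fp16":
        # fp16 has a narrow exponent range, so small gradients need loss scaling.
        return PrecisionPlan(True, "float16", True)
    if requested == "bf16":
        if not bf16_supported:
            raise ValueError("bf16 requested but the device does not support it")
        # bf16 shares fp32's exponent range, so loss scaling is usually unnecessary.
        return PrecisionPlan(True, "bfloat16", False)
    raise ValueError(f"unknown precision {requested!r}; expected fp32, fp16 or bf16")


print(plan_precision())
print(plan_precision("bf16", bf16_supported=True))
```

Raising on an unsupported bf16 request, rather than silently falling back, keeps a run from quietly changing precision, which matters given how much the losses move.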

allaffa · 2023-12-18T20:47:13Z


@jychoi-hpc I agree. The differences between full precision and mixed precision seem drastic, even for a relatively simple dataset like QM9.
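To see where such differences can come from, the sketch below emulates fp16 and bf16 rounding in plain Python with the standard `struct` module (`to_fp16`, `to_bf16` and `naive_sum` are illustrative helpers, not part of any library). It shows the trade-off: fp16 has more mantissa bits but overflows above 65504, while bf16 keeps fp32's range with only about 3 significant decimal digits. The accumulation case is deliberately a worst case: autocast runs reductions in fp32 and Tensor Core matmuls accumulate in fp32, so real AMP training drifts far less than this.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    struct's 'e' format raises OverflowError beyond the fp16 range;
    map that to +/-inf, as IEEE 754 overflow would.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Round x to bfloat16 by keeping the top 16 bits of its float32
    encoding, rounding to nearest even. x is rounded to float32 first
    (a double rounding, acceptable for illustration)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def naive_sum(values, rnd):
    """Accumulate, rounding after every addition (no fp32 accumulator)."""
    total = rnd(0.0)
    for v in values:
        total = rnd(total + rnd(v))
    return total


steps = [0.01] * 10_000  # exact sum is 100
print(to_fp16(0.1), to_bf16(0.1))  # 0.0999755859375 0.10009765625
print(to_fp16(1e5), to_bf16(1e5))  # inf 99840.0
print(naive_sum(steps, to_fp16), naive_sum(steps, to_bf16))  # 32.0 4.0
```

Both half-precision sums stall once the increment drops below half a unit in the last place of the running total (at 32 for fp16, at 4 for bf16), which is why frameworks keep accumulators and master weights in fp32.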

Tensor Core enabled for mixed precision on GPUs that allow it

75df65e

allaffa added the enhancement (New feature or request) label Dec 13, 2023

allaffa requested a review from jychoi-hpc December 13, 2023 02:26

allaffa self-assigned this Dec 13, 2023

pzhanggit added 2 commits March 7, 2024 14:40

bf16 from Jonghyun

f0f38a2

format

c343fa7

allaffa mentioned this pull request Apr 9, 2024

Use 16-bit Floats #26

Closed
