
Commit a255dcb

Author: Sean Naren
Update bfloat16 docs (#10330)
1 parent 412f0a4 commit a255dcb

File tree: 1 file changed, +3 -3 lines changed


docs/source/advanced/mixed_precision.rst

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@ FP16 Mixed Precision

In most cases, mixed precision uses FP16. Supported torch operations are automatically run in FP16, saving memory and improving throughput on GPU and TPU accelerators.

- Since computation happens in FP16, there is a chance of numerical instability. This is handled internally by a dynamic grad scaler which skips steps that are invalid, and adjusts the scaler to ensure subsequent steps fall within a finite range. For more information `see the autocast docs <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`__.
+ Since computation happens in FP16, there is a chance of numerical instability during training. This is handled internally by a dynamic grad scaler which skips steps that are invalid, and adjusts the scaler to ensure subsequent steps fall within a finite range. For more information `see the autocast docs <https://pytorch.org/docs/stable/amp.html#gradient-scaling>`__.

.. note::

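For reference, a minimal sketch of roughly what the dynamic grad scaler described above does in plain PyTorch (the model and optimizer are placeholders; Lightning sets this up for you when `precision=16` is passed to the `Trainer`):

import torch

model = torch.nn.Linear(16, 4).cuda()        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()         # the dynamic grad scaler

for _ in range(10):
    inputs = torch.randn(8, 16, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # supported ops run in FP16
        loss = model(inputs).sum()
    scaler.scale(loss).backward()            # scale the loss so FP16 gradients do not underflow
    scaler.step(optimizer)                   # skips the step if gradients are inf/nan
    scaler.update()                          # adjusts the scale for subsequent steps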

@@ -39,15 +39,15 @@ BFloat16 Mixed Precision

.. warning::

-     BFloat16 requires PyTorch 1.10 or later. Currently this requires installing `PyTorch Nightly <https://pytorch.org/get-started/locally/>`__.
+     BFloat16 requires PyTorch 1.10 or later.

BFloat16 is also experimental and may not provide large speedups or memory improvements, but offers better numerical stability.

Do note that for GPUs, the largest benefits require `Ampere <https://en.wikipedia.org/wiki/Ampere_(microarchitecture)>`__ based GPUs, such as A100s or 3090s.

BFloat16 Mixed precision is similar to FP16 mixed precision, however we maintain more of the "dynamic range" that FP32 has to offer. This means we are able to improve numerical stability, compared to FP16 mixed precision. For more information see `this TPU performance blog post <https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus>`__.

- Since BFloat16 is more stable than FP16 during training, we do not need to worry about any gradient scaling or nan gradient values that comes with using FP16 mixed precision.
+ Under the hood we use `torch.autocast <https://pytorch.org/docs/stable/amp.html>`__ with the dtype set to `bfloat16`, with no gradient scaling.

.. testcode::
    :skipif: not _TORCH_GREATER_EQUAL_1_10 or not torch.cuda.is_available()
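Because BFloat16 shares FP32's exponent range, the plain-PyTorch equivalent of the new wording above drops the scaler entirely; a minimal sketch (placeholder model and optimizer, PyTorch 1.10+):

import torch

model = torch.nn.Linear(16, 4).cuda()        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(10):
    inputs = torch.randn(8, 16, device="cuda")
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):   # ops run in bfloat16
        loss = model(inputs).sum()
    loss.backward()                          # no GradScaler: bfloat16 keeps FP32's dynamic range
    optimizer.step()

In Lightning, this is what selecting the `bf16` precision option on the `Trainer` enables.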
