Add tf32 doc #6770
Merged
Changes from all commits (5 commits)
- 1516ff6 add tf32 doc (qingpeng9802)
- 7ada18b Merge branch 'Project-MONAI:dev' into add-tf32-doc (qingpeng9802)
- 2b595f5 add a doc link (qingpeng9802)
- b0fdccf Merge branch 'add-tf32-doc' of https://github.com/qingpeng9802/MONAI … (qingpeng9802)
- 705d0f0 fix img (qingpeng9802)
@@ -0,0 +1,39 @@

# Precision and Performance
Modern GPU architectures can usually use reduced-precision tensor data or computational operations to save memory and increase throughput. However, in some cases, reduced precision causes numerical stability issues, which in turn cause reproducibility issues. Therefore, please ensure that you are using appropriate precision.
<!-- Maybe adding Automatic Mixed Precision, Float16 or BFloat16 in the future -->
## TensorFloat-32 (TF32)
### Introduction
NVIDIA introduced a new math mode, TensorFloat-32 (TF32), for NVIDIA Ampere GPUs and later; see [Accelerating AI Training with NVIDIA TF32 Tensor Cores](https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/), [Training Neural Networks with Tensor Cores](https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf), [CUDA 11](https://developer.nvidia.com/blog/cuda-11-features-revealed/) and [Ampere architecture](https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/).
TF32 uses 8 exponent bits, 10 mantissa bits, and one sign bit.
|  | ||
|
|
||
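To get a feel for how coarse a 10-bit mantissa is, the sketch below truncates a float32 value to a TF32-like mantissa width by masking its low 13 mantissa bits. This is an illustration only: the helper `round_to_tf32_like` is a made-up name, not a PyTorch or CUDA API, and real TF32 hardware rounds rather than truncates.

```python
# Illustration only: keep the top 10 of float32's 23 mantissa bits by masking,
# to show how much resolution a TF32-like mantissa discards.
import struct

def round_to_tf32_like(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits &= 0xFFFFE000  # zero the low 13 mantissa bits, leaving 10
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(round_to_tf32_like(1.1))  # 1.099609375: the low mantissa bits of 1.1 are dropped
```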
### Potential Impact
Although NVIDIA has shown that TF32 mode can reach the same accuracy and convergence as float32 for most AI workloads, some users still observe a significant effect on their applications; see [PyTorch and TensorFloat32](https://dev-discuss.pytorch.org/t/pytorch-and-tensorfloat32/504). Users who need high-precision matrix operations, such as traditional computer graphics operations and kernel methods, may be affected by TF32 precision.
Note that all operations that use `cuda.matmul` may be affected by TF32 mode, so the impact is very wide.
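As a quick experiment, the following sketch (assuming PyTorch on an NVIDIA Ampere or newer GPU) compares one float32 matrix multiplication with TF32 disabled and enabled; the difference is typically small but nonzero.

```python
# Sketch: compare a single float32 matmul with TF32 off vs. on.
# Requires an Ampere (or newer) GPU; on older GPUs both results match.
import torch

if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    torch.backends.cuda.matmul.allow_tf32 = False
    ref = a @ b  # full float32 multiply

    torch.backends.cuda.matmul.allow_tf32 = True
    approx = a @ b  # inputs rounded to a 10-bit mantissa inside the tensor cores

    print((ref - approx).abs().max())  # typically small but nonzero
```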
### Settings
[PyTorch TF32](https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices) default values:
```python
torch.backends.cuda.matmul.allow_tf32 = False  # in PyTorch 1.12 and later
torch.backends.cudnn.allow_tf32 = True
```
Please note that there are environment variables that can override the flags above, for example, the environment variables mentioned in [Accelerating AI Training with NVIDIA TF32 Tensor Cores](https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/) and `TORCH_ALLOW_TF32_CUBLAS_OVERRIDE` used by PyTorch. Thus, in some cases, the flags may be accidentally changed or overridden.
We recommend that users print out these two flags for confirmation when unsure.
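For example, a minimal check might look like the following sketch; the values noted in the comments are the stock PyTorch defaults, and an NGC container or an override environment variable may change them.

```python
# Print the two TF32 flags and the PyTorch override variable, if set.
import os
import torch

print(torch.backends.cuda.matmul.allow_tf32)   # False by default in PyTorch 1.12+
print(torch.backends.cudnn.allow_tf32)         # True by default
print(os.environ.get("TORCH_ALLOW_TF32_CUBLAS_OVERRIDE"))  # None unless overridden
```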
If you are using an [NGC PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch), the container includes a layer `ENV TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1`, so the default value of `torch.backends.cuda.matmul.allow_tf32` will be overridden to `True`.
If you can confirm through experiments that your model has no accuracy or convergence issues in TF32 mode, and you have NVIDIA Ampere GPUs or later, you can set the two flags above to `True` to speed up your model.
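A minimal sketch of enabling TF32 explicitly, assuming you have already verified accuracy and convergence for your own workload:

```python
# Enable TF32 for both cuBLAS matmuls and cuDNN convolutions.
import torch

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```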