Fix convergence issue caused by TF32 (pytorch#1244)
* Fix convergence issue caused by TF32

* save
zasdfgbnm authored Nov 17, 2020
1 parent e6167a9 commit 9ecc44a
Showing 2 changed files with 18 additions and 2 deletions.
10 changes: 9 additions & 1 deletion beginner_source/examples_autograd/two_layer_net_autograd.py
@@ -18,7 +18,15 @@

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# torch.backends.cuda.matmul.allow_tf32 = False # Uncomment this to disable TF32 on GPU

# Uncommenting the line above disables TensorFloat32 (TF32), a feature that
# lets networks run much faster at the cost of some precision.
# Although TF32 works well on most real models, for the toy model in this
# tutorial the reduced precision causes a convergence issue.
# For more information, see:
# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
@@ -48,7 +48,15 @@ def backward(ctx, grad_output):

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# torch.backends.cuda.matmul.allow_tf32 = False # Uncomment this to disable TF32 on GPU

# Uncommenting the line above disables TensorFloat32 (TF32), a feature that
# lets networks run much faster at the cost of some precision.
# Although TF32 works well on most real models, for the toy model in this
# tutorial the reduced precision causes a convergence issue.
# For more information, see:
# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
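For context on the comments added in both files, the snippet below is a minimal illustrative sketch, not part of this commit, of how the TF32 switch can be toggled and how the precision gap shows up in practice. It assumes PyTorch 1.7 or later and a CUDA-capable Ampere (or newer) GPU; on older GPUs the two results are identical because TF32 is unavailable.

import torch

# Illustrative sketch only; assumes an Ampere (or newer) GPU with CUDA available.
if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    # With TF32 enabled, matmul inputs are rounded to roughly 10 mantissa bits.
    torch.backends.cuda.matmul.allow_tf32 = True
    tf32_result = a @ b

    # Disabling TF32 restores full float32 precision at the cost of speed.
    torch.backends.cuda.matmul.allow_tf32 = False
    fp32_result = a @ b

    # On Ampere GPUs this difference is clearly nonzero, which is why the
    # tutorial's toy model fails to converge with TF32 enabled.
    print((tf32_result - fp32_result).abs().max())

    # cuDNN convolutions have a separate switch:
    # torch.backends.cudnn.allow_tf32 = False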
