
Dropout with prob == 0 doesn't validate consistently #1799

Closed
csarofeen opened this issue Jul 3, 2022 · 3 comments · Fixed by #1804

@csarofeen (Owner)

🐛 Describe the bug

The following script doesn't validate consistently on TOT. It seems we may still be dropping out some values even though probability == 0. I think this may be because of https://github.com/csarofeen/pytorch/blob/devel/torch/csrc/jit/codegen/cuda/ops/composite.cpp#L31, which maybe should use le instead of lt?

import functools
import random
from typing import List

import torch
import torch.nn.functional as F

def composite_definition(
    input1: torch.Tensor,
    input2: torch.Tensor,
    weight: torch.Tensor,
    bias1: torch.Tensor,
    bias2: torch.Tensor,
    normalization_axis: int,
    dropout_prob: float,
) -> torch.Tensor:
    bias1_out = input1 + bias1
    dropout_out = F.dropout(bias1_out, 0.0, True)
    norm_input = dropout_out + input2
    norm_output = F.layer_norm(norm_input, (input1.size(normalization_axis),), weight, bias2)
    return norm_output

# Setup initial tensors and parameters
input_size = [64, 128, 1024]
device = "cuda"
dtype = torch.float32

# Create sample inputs
input1 = torch.randn(*input_size, device=device, dtype=dtype, requires_grad=True)
input2 = torch.rand_like(input1).requires_grad_()
 
# Precompute a grad output tensor, for this example it's the same size as the inputs
grad_output = torch.rand_like(input1)
 
# Randomly initialize the model parameters
weight = torch.nn.Parameter(torch.randn(input_size[2], dtype=dtype, device=device))
bias1 = torch.nn.Parameter(torch.randn(input_size[2], dtype=dtype, device=device))
bias2 = torch.nn.Parameter(torch.randn(input_size[2], dtype=dtype, device=device))

parameters = [input1, input2, weight, bias1, bias2]
ref_composite = composite_definition(input1, input2, weight, bias1, bias2, normalization_axis=2, dropout_prob=0.0)

scripted_composite_definition = torch.jit.script(composite_definition)

for i in range(20):
    scripted = scripted_composite_definition(input1, input2, weight, bias1, bias2, normalization_axis=2, dropout_prob=0.0)
    print("output abs max {}".format((ref_composite - scripted).abs().max()))

Versions

TOT

@IvanYashchuk (Collaborator) commented Jul 4, 2022

Using le instead of lt seems to fix the problem, but I don't think it's correct. It's just an indicator that nvFuser's randlike function produces 1.0, which it shouldn't if it's supposed to correspond to torch.rand_like, which samples from a uniform distribution on the interval [0, 1), i.e. exclusive of 1.0.
curand_uniform (is that what's used in nvFuser?) reverses the interval bounds: it excludes 0.0 and includes 1.0.

torch.native_dropout has the same problem. There's an interesting conditional: torch.native_dropout is used for F.dropout only if p > 0 && p < 1:
https://github.com/pytorch/pytorch/blob/76cff182428fbd165b5725f3de29dbd91a1512fa/aten/src/ATen/native/Dropout.cpp#L28-L30
torch.native_dropout shows the same behavior because < p is used in the CUDA implementation with curand_uniform:
https://github.com/pytorch/pytorch/blob/76cff182428fbd165b5725f3de29dbd91a1512fa/aten/src/ATen/native/cuda/Dropout.cu#L96-L100
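
A minimal sketch of the endpoint problem in plain PyTorch, assuming the mask in composite.cpp is computed roughly as rand_like(x) < (1 - p) (the exact operands are an assumption here, not copied from the file):

import torch

p = 0.0
# 1.0 can only appear if the RNG samples from (0, 1] like curand_uniform;
# torch.rand_like samples from [0, 1) and never returns 1.0.
rand_vals = torch.tensor([0.0, 0.5, 1.0])

keep_lt = rand_vals < (1.0 - p)   # lt: the element that drew exactly 1.0 gets dropped
keep_le = rand_vals <= (1.0 - p)  # le: everything is kept when p == 0 ...

print(keep_lt)  # tensor([ True,  True, False])
print(keep_le)  # tensor([ True,  True,  True])

# ... but le is wrong at the other end: with p == 1 every element should be dropped,
# yet an element that drew exactly 0.0 would still be kept (0.0 <= 0.0 is True).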

@csarofeen (Owner, Author)

O.o any suggestion as to what we should do?

@jjsjann123 (Collaborator)

Tried flipping the dropout prob to 1.0, and it looks like the issue with rand_like is real: we are producing values in [0.0, 1.0]. So that's a separate thing we should look at; I'll open an issue for that.

le doesn't sound right either; we can hook into the logic inside dropout with a bitwise op to create a short-cut mask for p == 0 and p == 1.
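
A rough sketch of that short-cut idea in plain PyTorch rather than nvFuser's C++ ops, using host-side branching instead of a bitwise mask; the helper name and structure are illustrative, not the actual composite.cpp implementation:

import torch

def dropout_sketch(x: torch.Tensor, p: float) -> torch.Tensor:
    # Special-case the boundary probabilities so the RNG's interval endpoints never matter.
    if p == 0.0:
        return x                      # keep everything; no mask, no scaling
    if p == 1.0:
        return torch.zeros_like(x)    # drop everything
    mask = torch.rand_like(x) < (1.0 - p)  # interior p: lt vs. le no longer decides the boundary cases
    return x * mask / (1.0 - p)            # rescale kept values to preserve the expected value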

jjsjann123 added a commit that referenced this issue Jul 7, 2022
Fixes #1799

1. Updates rand_like by changing output==1 to 0 via `where`;
2. Patches codegen float output.
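
A rough PyTorch-level sketch of what change 1 amounts to (the helper is hypothetical; the actual fix lives in nvFuser's rand_like codegen):

import torch

def rand_like_half_open(x: torch.Tensor) -> torch.Tensor:
    # Fold the 1.0 endpoint back to 0.0 so the output matches
    # torch.rand_like's [0, 1) convention even if the RNG samples (0, 1].
    r = torch.rand_like(x)
    return torch.where(r == 1.0, torch.zeros_like(r), r)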