
Fix prelu_backward TensorIterator split #36134

Closed
zasdfgbnm wants to merge 7 commits into pytorch:master from zasdfgbnm:zasdfgbnm-patch-6

Conversation

zasdfgbnm
Collaborator

@zasdfgbnm zasdfgbnm commented Apr 7, 2020

We should have

    for (auto& sub_iter : iter.with_32bit_indexing()) {
      launch_prelu_cuda_backward_share_weights_kernel(sub_iter, weight_data);
    }

But I mistakenly wrote it as

    for (auto& sub_iter : iter.with_32bit_indexing()) {
      launch_prelu_cuda_backward_share_weights_kernel(iter, weight_data);
    }

in my previous PR, which leads to infinite recursion.
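
For context, here is a minimal sketch of the launcher's structure (simplified: the real function is templated on the scalar type and takes more parameters than shown). The recursion is supposed to bottom out once a sub-iterator fits 32-bit indexing, so passing the unsplit `iter` back in never terminates:

    #include <ATen/native/TensorIterator.h>

    // Simplified sketch, not the actual ATen source.
    void launch_prelu_cuda_backward_share_weights_kernel(
        at::TensorIterator& iter, const float* weight_data) {
      if (!iter.can_use_32bit_indexing()) {
        for (auto& sub_iter : iter.with_32bit_indexing()) {
          // Must recurse on `sub_iter`; recursing on `iter` re-enters this
          // branch with the same too-large iterator, forever.
          launch_prelu_cuda_backward_share_weights_kernel(sub_iter, weight_data);
        }
        return;
      }
      // ... launch the CUDA kernel; `iter` is now 32-bit indexable ...
    }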

I found this bug while working on #34004.

I also add a `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` to catch this in debug builds.

Besides, the caller already guarantees contiguous operands, so we don't need to handle non-contiguous tensors.
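
Roughly, the checks described above sit in the non-splitting branch of the launcher; treating the debug-only assert as guarding the 32-bit-indexing condition is my assumption, while the contiguity assert matches the diff shown later in this conversation:

    // Both macros come from c10/util/Exception.h.
    // Debug-only: catches a regressed split without any cost in release builds.
    TORCH_INTERNAL_ASSERT_DEBUG_ONLY(iter.can_use_32bit_indexing());
    // The caller builds the iterator from contiguous tensors, so contiguity is
    // an invariant here rather than a case to handle.
    TORCH_INTERNAL_ASSERT(iter.is_contiguous());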

@dr-ci

dr-ci bot commented Apr 7, 2020

💊 Build failures summary and remediations

As of commit dcdbc28 (more details on the Dr. CI page):


None of the build failures appear to be your fault 💚



❄️ 1 tentatively flaky failure

1 failure tentatively classified as flaky but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (1/1)

Step: "Set Up CI Environment After attach_workspace" (full log | pattern match details | 🔁 rerun) ❄️

WARNING: infoROM is corrupted at gpu 0000:00:1D.0
|   1  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 | 
| N/A   28C    P0    37W / 150W |      0MiB /  7618MiB |     98%      Default | 
+-------------------------------+----------------------+----------------------+ 
                                                                                
+-----------------------------------------------------------------------------+ 
| Processes:                                                       GPU Memory | 
|  GPU       PID   Type   Process name                             Usage      | 
|=============================================================================| 
|  No running processes found                                                 | 
+-----------------------------------------------------------------------------+ 
WARNING: infoROM is corrupted at gpu 0000:00:1D.0 


@zasdfgbnm
Collaborator Author

Wait, please hold on for now. It seems the tests for prelu backward are missing some cases; I will add them later in this PR.

@ailzhang
Contributor

ailzhang commented Apr 7, 2020

@zasdfgbnm If you rebase on top of pytorch/master, the XLA failure should be gone as well. Thanks!

@gchanan gchanan added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 8, 2020
@zasdfgbnm zasdfgbnm changed the title Fix prelu_backward TensorIterator split [WIP] Fix prelu_backward TensorIterator split Apr 14, 2020
@zasdfgbnm zasdfgbnm changed the title [WIP] Fix prelu_backward TensorIterator split Fix prelu_backward TensorIterator split Apr 22, 2020
    OffsetCalculator<2>(iter.ndim(), iter.shape().data(), out_strides.data())
    );
    }
    TORCH_INTERNAL_ASSERT(iter.is_contiguous());
Collaborator Author


The caller of `launch_prelu_cuda_backward_share_weights_kernel` already guarantees contiguous operands.
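
As a hedged illustration of that guarantee (not the actual ATen call site; the tensor names and the `TensorIteratorConfig` usage are only for illustration), the backward entry point can materialize contiguous operands before building the iterator, so the launcher never sees non-contiguous memory:

    // Hypothetical call site; relevant headers would be <ATen/ATen.h> and
    // <ATen/native/TensorIterator.h>.
    at::Tensor input_c = input.contiguous();
    at::Tensor grad_out_c = grad_out.contiguous();
    auto iter = at::TensorIteratorConfig()
                    .add_output(input_grad)   // pre-allocated contiguous output
                    .add_input(input_c)
                    .add_input(grad_out_c)
                    .build();
    // With contiguous operands, TORCH_INTERNAL_ASSERT(iter.is_contiguous())
    // in the kernel launcher cannot fire in normal use.
    launch_prelu_cuda_backward_share_weights_kernel(iter, weight_data);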

@zasdfgbnm
Collaborator Author

@VitalyFedyunin This PR is now ready. This is a bug fix for my previous prelu CUDA_tensor_apply4 PR.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zasdfgbnm zasdfgbnm deleted the zasdfgbnm-patch-6 branch April 23, 2020 17:46
@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 438aed6.

Labels: Merged, open source, triaged
7 participants