
[Re-land] [CUDA graphs] Clear autocast amp cache #81896

Conversation

Aidyn-A
Collaborator

@Aidyn-A Aidyn-A commented Jul 21, 2022

Re-lands #81558, which was reverted due to failing tests.

The failure happened because of a test that I designed poorly. [The loop here](https://github.com/pytorch/pytorch/pull/81558/files#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3837) runs first with `cache_enabled=False` and then with `cache_enabled=True`, so the graph captured in one iteration (the `False` case) conflicts with the one captured in the next (the `True` case). I redesigned the test so that it does not loop: it now makes separate function calls with different argument values.
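The redesigned test presumably follows a structure like the sketch below (a minimal illustration, not the actual PyTorch test; the function and tensor names are hypothetical, and plain CPU autocast stands in for the CUDA-graph capture path so the sketch runs without a GPU). The point is the shape of the fix: one self-contained call per `cache_enabled` value, instead of a loop that lets state from one configuration leak into the next.

```python
import torch

def run_autocast_matmul(cache_enabled):
    # Hypothetical helper: each call creates fresh inputs and its own
    # autocast region, so nothing (autocast weight cache, captured graphs
    # in the real CUDA test) carries over between configurations.
    a = torch.randn(8, 8)
    b = torch.randn(8, 8)
    with torch.autocast("cpu", cache_enabled=cache_enabled):
        return a @ b

# Separate function calls with different argument values -- the structure
# of the fix, replacing the old `for cache_enabled in (False, True)` loop.
out_cached = run_autocast_matmul(cache_enabled=True)
out_uncached = run_autocast_matmul(cache_enabled=False)
```

With this structure, each configuration is exercised in isolation; in the real test the same idea keeps the CUDA graph captured under one cache setting from conflicting with the graph captured under the other.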

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 21, 2022

✅ No Failures (0 Pending)

As of commit 26d7e13 (more details on the Dr. CI page):


💚 Looks good so far! There are no failures yet.


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@soulitzer soulitzer added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jul 22, 2022
@ngimel
Collaborator

ngimel commented Aug 2, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge and created land time checks. See merge status here and land check progress here.

pytorchmergebot pushed a commit that referenced this pull request Aug 2, 2022
Pull Request resolved: #81896
Approved by: https://github.com/ngimel
@pytorchmergebot
Collaborator

Merge failed: some land checks failed: pull, pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)
Raised by https://github.com/pytorch/pytorch/actions/runs/2784337577. If you believe this is an error, you can use the old behavior with @pytorchbot merge -g (optionally with the "ciflow/trunk" label to get land signals) or use @pytorchbot merge -f "some reason here". For more information, see the bot wiki.

@Aidyn-A
Collaborator Author

Aidyn-A commented Aug 2, 2022

@ngimel, it looks like the test failures are unrelated to AMP and CUDA graphs. Should we try to force-merge it?

@ngimel
Collaborator

ngimel commented Aug 2, 2022

@pytorchbot merge -f "test failures are unrelated"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@github-actions
Contributor

github-actions bot commented Aug 2, 2022

Hey @Aidyn-A.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary: Re-lands #81558, which was reverted due to failing tests (full description above).

Pull Request resolved: #81896
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/da0a3fe058de386d569b9fd621bd845d40e0cc39

Reviewed By: kit1980

Differential Revision: D38394874

fbshipit-source-id: e8aeecaa4cff30379b20d852cbf00460983a8615
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Nov 24, 2022
…es to pass

[Re-land] [CUDA graphs] Clear autocast amp cache (pytorch#81896)

Pull Request resolved: pytorch#81896
Approved by: https://github.com/ngimel
jithunnair-amd added a commit to ROCm/pytorch that referenced this pull request Nov 28, 2022
…es to pass (#1144)

[Re-land] [CUDA graphs] Clear autocast amp cache (pytorch#81896)

Pull Request resolved: pytorch#81896
Approved by: https://github.com/ngimel

Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Labels
cla signed · Merged · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
6 participants