
[Re-land] [CUDA graphs] Clear autocast amp cache #81896

Conversation

Aidyn-A
Collaborator

@Aidyn-A Aidyn-A commented Jul 21, 2022

Re-lands #81558, which was reverted due to failing tests.

The failure happened because of a test that I designed poorly. [The loop here](https://github.com/pytorch/pytorch/pull/81558/files#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3837) runs first with `cache_enabled=False` and then with `cache_enabled=True`, so the graph captured in one iteration (the `False` case) conflicts with the one captured in the next (the `True` case). I redesigned the test so that it does not loop: it now makes separate function calls with different argument values.
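The redesigned test presumably follows a structure like the sketch below (a minimal illustration, not the actual PyTorch test; the function and tensor names are hypothetical, and plain CPU autocast stands in for the CUDA-graph capture path so the sketch runs without a GPU). The point is the shape of the fix: one self-contained call per `cache_enabled` value, instead of a loop that lets state from one configuration leak into the next.

```python
import torch

def run_autocast_matmul(cache_enabled):
    # Hypothetical helper: each call creates fresh inputs and its own
    # autocast region, so nothing (autocast weight cache, captured graphs
    # in the real CUDA test) carries over between configurations.
    a = torch.randn(8, 8)
    b = torch.randn(8, 8)
    with torch.autocast("cpu", cache_enabled=cache_enabled):
        return a @ b

# Separate function calls with different argument values -- the structure
# of the fix, replacing the old `for cache_enabled in (False, True)` loop.
out_cached = run_autocast_matmul(cache_enabled=True)
out_uncached = run_autocast_matmul(cache_enabled=False)
```

With this structure, each configuration is exercised in isolation; in the real test the same idea keeps the CUDA graph captured under one cache setting from conflicting with the graph captured under the other.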

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 21, 2022

✅ No Failures (0 Pending)

As of commit 26d7e13 (more details on the Dr. CI page):


💚 Looks good so far! There are no failures yet.


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@soulitzer soulitzer added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jul 22, 2022
@ngimel
Collaborator

ngimel commented Aug 2, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge and created land time checks. See merge status here and land check progress here.

pytorchmergebot pushed a commit that referenced this pull request Aug 2, 2022
Pull Request resolved: #81896
Approved by: https://github.com/ngimel
@pytorchmergebot
Collaborator

Merge failed: some land checks failed: pull, pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge)
Raised by https://github.com/pytorch/pytorch/actions/runs/2784337577. If you believe this is an error, you can use the old behavior with @pytorchbot merge -g (optionally with the "ciflow/trunk" label to get land signals) or use @pytorchbot merge -f "some reason here". For more information, see the bot wiki.

@Aidyn-A
Collaborator Author

Aidyn-A commented Aug 2, 2022

@ngimel, it looks like the test failures are unrelated to AMP and CUDA graphs. Should we try to force-merge it?

@ngimel
Collaborator

ngimel commented Aug 2, 2022

@pytorchbot merge -f "test failures are unrelated"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@github-actions
Contributor

github-actions bot commented Aug 2, 2022

Hey @Aidyn-A.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Aug 4, 2022
Summary: Re-lands #81558, which was reverted due to failing tests (full description above).

Pull Request resolved: #81896
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/da0a3fe058de386d569b9fd621bd845d40e0cc39

Reviewed By: kit1980

Differential Revision: D38394874

fbshipit-source-id: e8aeecaa4cff30379b20d852cbf00460983a8615
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Nov 24, 2022
…es to pass

[Re-land] [CUDA graphs] Clear autocast amp cache (pytorch#81896)

Pull Request resolved: pytorch#81896
Approved by: https://github.com/ngimel
jithunnair-amd added a commit to ROCm/pytorch that referenced this pull request Nov 28, 2022
…es to pass (#1144)

[Re-land] [CUDA graphs] Clear autocast amp cache (pytorch#81896)

Pull Request resolved: pytorch#81896
Approved by: https://github.com/ngimel

Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Labels
cla signed · Merged · open source · triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
6 participants