Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to PyTorch 1.6 in Drone CI #4393

Merged
merged 36 commits into from
Nov 3, 2020
Merged

Switch to PyTorch 1.6 in Drone CI #4393

merged 36 commits into from
Nov 3, 2020

Conversation

ydcjeff
Copy link
Contributor

@ydcjeff ydcjeff commented Oct 27, 2020

What does this PR do?

Follow up of #3658

Fixes #4484

Related: https://app.circleci.com/pipelines/github/pytorch/pytorch-integration-testing/74/workflows/4a7bc96a-c550-4f0e-8c51-eb9d9af6166c/jobs/451/tests

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov
Copy link

codecov bot commented Oct 27, 2020

Codecov Report

Merging #4393 into master will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #4393   +/-   ##
======================================
  Coverage      92%     93%           
======================================
  Files         116     116           
  Lines        8716    8717    +1     
======================================
+ Hits         8060    8082   +22     
+ Misses        656     635   -21     

@ydcjeff
Copy link
Contributor Author

ydcjeff commented Oct 27, 2020

1.7 is causing the failed tests.

@ydcjeff ydcjeff added the ci Continuous Integration label Oct 27, 2020
@ydcjeff ydcjeff added this to the 1.0.x milestone Oct 27, 2020
@ydcjeff ydcjeff changed the title Switch to PyTorch 1.6 in Drone CI Switch to PyTorch 1.6 in Drone CI [ci skip] Oct 28, 2020
@ydcjeff ydcjeff changed the title Switch to PyTorch 1.6 in Drone CI [ci skip] Switch to PyTorch 1.6 in Drone CI Oct 30, 2020
@ydcjeff ydcjeff changed the base branch from master to issue_4311 October 31, 2020 10:13
@ydcjeff ydcjeff changed the base branch from issue_4311 to master October 31, 2020 10:17
@Borda
Copy link
Member

Borda commented Nov 2, 2020

@ydcjeff we should be good to go right? We'll wait for the build to finish to verify. The next step is to get PyTorch 1.7, but we should probably handle this in a separate PR.

well there is one PR to run PT 1.6 and 1.7 by @ydcjeff

@ydcjeff
Copy link
Contributor Author

ydcjeff commented Nov 3, 2020

GPU tests are failing...

@s-rog
Copy link
Contributor

s-rog commented Nov 3, 2020

@Borda Borda requested review from Borda and SeanNaren November 3, 2020 11:49
@ydcjeff
Copy link
Contributor Author

ydcjeff commented Nov 3, 2020

The failing amp test is scaler.update() is never called as manual_backward() only does .scale(), .backward() and .unscale_()
but scaler.update() should be called after all optimizers have called .step() before calling another .unscale_()

@SeanNaren
Copy link
Contributor

SeanNaren commented Nov 3, 2020

Hey @ydcjeff this bug was introduced in #4441, I'm going to cc @tchaton to have a look.

@tchaton it seems that after moving the scaler to the accelerator, we're getting a failing test:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/trainer/optimization/test_manual_optimization.py#L222

Could you have a look? Or let us know how to fix it? I can confirm that reverting this PR fixes this test

@tchaton tchaton assigned tchaton and unassigned SeanNaren and ydcjeff Nov 3, 2020
@tchaton
Copy link
Contributor

tchaton commented Nov 3, 2020

Hey @ydcjeff this bug was introduced in #4441, I'm going to cc @tchaton to have a look.

@tchaton it seems that after moving the scaler to the accelerator, we're getting a failing test:

https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/trainer/optimization/test_manual_optimization.py#L222

Could you have a look? Or let us know how to fix it? I can confirm that reverting this PR fixes this test

Hey @ydcjeff, sorry for creating a bug. I am on it :)

Best regards,
Thomas Chaton

@tchaton tchaton merged commit ee414d2 into master Nov 3, 2020
@tchaton tchaton deleted the ci/pt-1.6 branch November 3, 2020 18:01
Borda pushed a commit that referenced this pull request Nov 3, 2020
* switch to 1.6

* readme

* 1.7

* back to normal [ci skip]

* horovodrun --verbose

* try with apex

* add apex test

* change base

* description

* test with 1.7

* back to 1.6

* no gradient_clip_val

* re-add gradient_clip_val

* no amp

* temp skip torch.cuda.amp + horovod test

* Apply suggestion from code review

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Fix formatting

* ddp

* Moved extended model outside of function to prevent pickling issue for drone

* typo

* resolve bug

* extract automatic_automization

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: chaton <thomas@grid.ai>
(cherry picked from commit ee414d2)
Borda pushed a commit that referenced this pull request Nov 4, 2020
* switch to 1.6

* readme

* 1.7

* back to normal [ci skip]

* horovodrun --verbose

* try with apex

* add apex test

* change base

* description

* test with 1.7

* back to 1.6

* no gradient_clip_val

* re-add gradient_clip_val

* no amp

* temp skip torch.cuda.amp + horovod test

* Apply suggestion from code review

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Fix formatting

* ddp

* Moved extended model outside of function to prevent pickling issue for drone

* typo

* resolve bug

* extract automatic_automization

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: chaton <thomas@grid.ai>
(cherry picked from commit ee414d2)
rohitgr7 pushed a commit that referenced this pull request Nov 21, 2020
* switch to 1.6

* readme

* 1.7

* back to normal [ci skip]

* horovodrun --verbose

* try with apex

* add apex test

* change base

* description

* test with 1.7

* back to 1.6

* no gradient_clip_val

* re-add gradient_clip_val

* no amp

* temp skip torch.cuda.amp + horovod test

* Apply suggestion from code review

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Fix formatting

* ddp

* Moved extended model outside of function to prevent pickling issue for drone

* typo

* resolve bug

* extract automatic_automization

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: chaton <thomas@grid.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update drone to pytorch 1.6
6 participants