Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

signal handling and teardown #3632

Closed
wants to merge 14 commits into from
Closed

signal handling and teardown #3632

wants to merge 14 commits into from

Conversation

awaelchli
Copy link
Contributor

@awaelchli awaelchli commented Sep 23, 2020

What does this PR do?

Continuation of #2165

Fixes #1999

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added feature Is an improvement or enhancement bug Something isn't working labels Sep 23, 2020
@codecov
Copy link

codecov bot commented Sep 23, 2020

Codecov Report

Merging #3632 into master will increase coverage by 4%.
The diff coverage is 90%.

@@           Coverage Diff           @@
##           master   #3632    +/-   ##
=======================================
+ Coverage      83%     87%    +4%     
=======================================
  Files         117     118     +1     
  Lines        9106    9125    +19     
=======================================
+ Hits         7575    7983   +408     
+ Misses       1531    1142   -389     

@awaelchli awaelchli marked this pull request as ready for review September 23, 2020 23:09
@mergify mergify bot requested a review from a team September 23, 2020 23:10
@mergify mergify bot requested a review from a team September 24, 2020 07:41
@Borda Borda self-requested a review September 25, 2020 12:21
@mergify
Copy link
Contributor

mergify bot commented Sep 25, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team September 25, 2020 12:32
@Borda Borda requested a review from justusschock September 29, 2020 14:16
@mergify mergify bot requested a review from a team September 29, 2020 14:19
@mergify
Copy link
Contributor

mergify bot commented Oct 1, 2020

This pull request is now in conflict... :(

@mergify mergify bot requested a review from a team October 1, 2020 14:19
@williamFalcon
Copy link
Contributor

@awaelchli why do we want this again?
last time this caused all sorts of issues, can we skip this until 1.1?

this is also going to break DDP...


DDP

.fit
kill_process_group()

.test() <-------- fail because process group was killed...

the way i currently have it works fine and keeps the progress groups around.

This was that major bug we both spent a long time on.

Copy link
Contributor

@williamFalcon williamFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe this will break DDP.
This is something taht we should revisit after 1.0

@mergify mergify bot requested a review from a team October 5, 2020 01:03
@awaelchli
Copy link
Contributor Author

I'm sorry william, but this PR was before your refactor and fixes.
I have not tested it since.

@awaelchli awaelchli closed this Oct 5, 2020
@williamFalcon
Copy link
Contributor

williamFalcon commented Oct 5, 2020

no worries. i was going to let it run to see if tests did indeed fail.

Then we can decide what to do

@williamFalcon williamFalcon reopened this Oct 5, 2020
@Borda
Copy link
Member

Borda commented Oct 6, 2020

retrigger Drone to see how it runs on multi-gpu...

@mergify
Copy link
Contributor

mergify bot commented Oct 9, 2020

This pull request is now in conflict... :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature Is an improvement or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError on run_training_teardown method on the master branch
7 participants