fix logging on rank 0 only #2425
Codecov Report
@@           Coverage Diff           @@
##           master   #2425   +/-   ##
=======================================
  Coverage      88%     88%
=======================================
  Files          69      69
  Lines        5505    5527   +22
=======================================
+ Hits         4866    4888   +22
  Misses        639     639
there is no way to know... this is why we init the loggers in init and then set the rank in train.
In my current approach I added a decorator on top of
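The pattern discussed in these two comments (create the logger on every rank, assign the rank later, and guard the experiment with a decorator) might look roughly like the sketch below. It is a self-contained illustration, not the code from this PR: `DummyExperiment`, `rank_zero_experiment`, `ToyLogger`, and `log_from_rank` are hypothetical names chosen for the example.

```python
from functools import wraps


class DummyExperiment:
    """Stand-in experiment object: every method call is a no-op."""

    def __getattr__(self, name):
        # Any attribute that is not found normally resolves to a function
        # that ignores its arguments and returns None.
        def noop(*args, **kwargs):
            return None

        return noop


def rank_zero_experiment(fn):
    """Return the real experiment on rank 0 and a dummy on every other rank."""

    @wraps(fn)
    def wrapper(self):
        if self.rank == 0:
            return fn(self)
        return DummyExperiment()

    return wrapper


class ToyLogger:
    """Hypothetical logger: constructed on all ranks, rank assigned later."""

    def __init__(self):
        self.rank = 0      # the real rank is not known yet at construction time
        self.records = []  # stands in for a real experiment backend

    @property
    @rank_zero_experiment
    def experiment(self):
        return self

    def log_metric(self, name, value):
        self.records.append((name, value))


def log_from_rank(rank):
    logger = ToyLogger()  # "init": rank unknown
    logger.rank = rank    # "train": rank is assigned
    logger.experiment.log_metric("loss", 0.5)
    return logger.records
```

With this shape, user code can call `logger.experiment.log_metric(...)` unconditionally; only the process whose rank is 0 records anything.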
Hello @awaelchli! Thanks for updating this PR.
Comment last updated at 2020-06-30 21:47:43 UTC
2fa87f6 to 34f90c7
any thoughts?
@williamFalcon looks like this error is coming from rank 1 (from the stdout capture in the CI logs):
I'll need to dig into the code of this PR a bit to see what's the cause.
ok awesome. maybe there's a property that we should be assigning on all ranks but now only assign on rank 0?
@williamFalcon if you look further down in the errors you will find this
It's trying to os.path.join(..) path parts that are None.
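For reference, this failure mode is easy to reproduce in isolation: `os.path.join` raises a `TypeError` when any component is `None`. The snippet below is a minimal sketch, not the failing test itself; `raises_type_error` and `join_or_none` are hypothetical helpers, and the path parts are made-up placeholders.

```python
import os


def raises_type_error(*parts):
    """Report whether os.path.join rejects the given parts with a TypeError."""
    try:
        os.path.join(*parts)
    except TypeError:
        return True
    return False


def join_or_none(*parts):
    """Join path parts, or return None when any part has not been set."""
    if any(part is None for part in parts):
        return None
    return os.path.join(*parts)
```

A guard like `join_or_none` only hides the symptom; if a rank is not supposed to touch the disk at all, the better fix is usually to keep it from reaching the join in the first place.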
7d7088b to b08cdf6
ummmm, I'm now not sure if this is our bug haha, since only global_rank=0 should be doing anything with disk writes. OK, let's put a TODO to fix these tests in a different PR, @tgaddair @awaelchli? Right now this is blocking our minor release. However, we need to take a look at this ASAP. In the meantime, merging to master to test the overall changes to loggers before releasing. Thank you both for looking into this! Excited to finally add long-awaited tests to move to a stable v1.0.0.
@williamFalcon I think you need to move the pytest.skip to
@williamFalcon Horovod works differently than DDP, in that everything is run in parallel with Horovod (as opposed to only certain portions of the code). So in this case, it's likely that something in this test that is only executed on rank 0 for DDP is also being executed on rank 1 for Horovod. Feel free to assign a ticket to me, and I can dig into this test further.
What does this PR do?
Fixes #2131 and the issue with the WandB logger discussed on Slack.
On DDP, loggers work like this: self.logger.experiment.some_log_method can be called on every rank, but on rank > 0 it is a no-op.
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃