Fixed deadlock issue related with MMDetWandbHook #9476
There was already a similar commit on mmcls, open-mmlab/mmpretrain#1242, which I didn't know about! (That commit was made only 5 days ago, too.) Also, I guess we don't need to separate
Codecov Report
Base: 64.15% // Head: 64.14% // Decreases project coverage by -0.02%

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #9476      +/-   ##
==========================================
- Coverage   64.15%   64.14%   -0.02%
==========================================
  Files         361      361
  Lines       29583    29585       +2
  Branches     5033     5033
==========================================
- Hits        18980    18977       -3
- Misses       9599     9601       +2
- Partials     1004     1007       +3
0e22252 to 68dc5a6
Co-authored-by: WangYudong <yudong.wang@akane.waseda.jp>
68dc5a6 to 36680cc
I've verified my commit and merged the two commits into one. Please check and merge this.
Motivation
Please refer to #8145.
To summarize the problem: when `MMDetWandbHook` is used after `TextLoggerHook`, problems such as a deadlock or `AssertionError: loss log variables are different across GPUs!` occur. Both have the same root cause, which I summarized in #8145 (comment).

Since the same problem occurs in mmseg (open-mmlab/mmsegmentation#2137), I'm going to open an almost identical PR on mmseg after this PR is merged.
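The kind of per-process state divergence behind this can be illustrated with a small toy, assuming the log buffer was previously cleared only in a code path that runs on the GPU 0 process. `ToyLogBuffer` and `simulate` are hypothetical names made up for this sketch; they are not the mmcv or mmdet API, and the real hooks are more involved.

```python
class ToyLogBuffer:
    """Hypothetical stand-in for a runner's log buffer (not the mmcv class)."""

    def __init__(self):
        self.output = {}
        self.ready = False

    def average(self, values):
        # Pretend the averaged values are now available for logging.
        self.output = dict(values)
        self.ready = True

    def clear_output(self):
        self.output.clear()
        self.ready = False


def simulate(num_ranks, clear_on_all_ranks):
    """Run one logging step per rank; return each rank's `ready` flag afterwards."""
    flags = []
    for rank in range(num_ranks):
        buf = ToyLogBuffer()
        buf.average({"loss": 0.5})
        # Assumption: the last logger hook clears the buffer, but its body
        # only runs on rank 0 unless clear_on_all_ranks is set.
        if rank == 0 or clear_on_all_ranks:
            buf.clear_output()
        flags.append(buf.ready)
    return flags


master_only = simulate(num_ranks=4, clear_on_all_ranks=False)
every_rank = simulate(num_ranks=4, clear_on_all_ranks=True)
print(master_only)  # [False, True, True, True]
print(every_rank)   # [False, False, False, False]
```

When processes disagree about whether the buffer is ready, a later hook can take a different path on GPU 0 than on the other GPUs. That is one plausible way for cross-process operations to fall out of step; see #8145 for the actual analysis.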
Modification
Made `MMDetWandbHook` clear `runner.log_buffer` on every process, including the processes for GPUs 1, 2, 3, and so on.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repos?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist