[feat] Add `{,load_}state_dict` to `ResultCollection` 1/n #7948
Conversation
Codecov Report

@@           Coverage Diff            @@
##           master    #7948    +/-  ##
=======================================
- Coverage      92%      91%     -0%
=======================================
  Files         207      207
  Lines       13375    13464     +89
=======================================
+ Hits        12245    12295     +50
- Misses       1130     1169     +39
This PR adds a mechanism to reload from state_dict and restore logged values.
Could you describe in more detail what problem we're trying to solve? Is this the only option for restoring logged values?
Relying on self.log to checkpoint metric states adds even more responsibilities, when I think we should be going in the other direction and moving responsibilities out of self.log. #7183 (comment)
Hey @ananthsub, currently the ResultCollection isn't fault tolerant. To resolve this, this PR adds the following. The restoration isn't implemented yet; it will be done in the 2/n and 3/n PRs. The state dumping / restoration will be added as: PR 2/n: Add …, PR 3/n: Add …. Best,
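To illustrate the mechanism being described, here is a minimal, hypothetical sketch of a result collection exposing `state_dict` / `load_state_dict`. The class and attribute names (`MiniResultCollection`, `value`, `cumulated_batch_size`) are assumptions for illustration, not Lightning's actual implementation:

```python
class ResultMetric:
    """Holds one logged value; attribute names here are illustrative."""
    def __init__(self):
        self.value = 0.0
        self.cumulated_batch_size = 0

    def update(self, value, batch_size=1):
        self.value += value * batch_size
        self.cumulated_batch_size += batch_size

    def compute(self):
        return self.value / max(self.cumulated_batch_size, 1)


class MiniResultCollection:
    def __init__(self):
        self.items = {}

    def log(self, key, value, batch_size=1):
        self.items.setdefault(key, ResultMetric()).update(value, batch_size)

    def state_dict(self):
        # Serialize every metric's internal state so it survives a restart.
        return {
            k: {"value": m.value, "cumulated_batch_size": m.cumulated_batch_size}
            for k, m in self.items.items()
        }

    def load_state_dict(self, state):
        # Rebuild metric objects from the serialized state.
        self.items = {}
        for k, s in state.items():
            m = ResultMetric()
            m.value = s["value"]
            m.cumulated_batch_size = s["cumulated_batch_size"]
            self.items[k] = m


rc = MiniResultCollection()
rc.log("train_loss", 0.5, batch_size=2)
rc.log("train_loss", 1.0, batch_size=2)
state = rc.state_dict()

restored = MiniResultCollection()
restored.load_state_dict(state)
print(restored.items["train_loss"].compute())  # 0.75
```

The key design point is that only plain, picklable data leaves `state_dict`, so the collection can be embedded in a checkpoint without dragging trainer references along.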
Hey Ananth, actually it might be possible to get rid of the attribute name. I will try this on Monday, which will reduce self.log's responsibility. Confirmed: I could remove it.
pytorch_lightning/trainer/connectors/logger_connector/result.py (Outdated)
pytorch_lightning/trainer/connectors/logger_connector/result.py (Outdated)
Apart from the pickling concern, LGTM.
Confidence low; I'm not yet so familiar with the new logger connector, etc.
Again, low confidence, but the plan makes sense and the test for restoration looks good.
This reverts commit 68dac4a.
pytorch_lightning/trainer/connectors/logger_connector/result.py (Outdated)
pytorch_lightning/trainer/connectors/logger_connector/result.py (Outdated)
What does this PR do?
Tracking Issue: #7898
This PR adds a mechanism to reload from state_dict and restore logged values.
Fixes #<issue_number>
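As a hypothetical usage sketch of the restore path this PR enables: the logged-value state can be embedded in a checkpoint and reloaded after a crash. The checkpoint key "result_collection" is an assumption for illustration, not Lightning's actual checkpoint schema, and `pickle` stands in for `torch.save` / `torch.load`:

```python
import pickle

# Assumed state layout, mirroring what a state_dict() of logged values might hold.
logged_state = {"train_loss": {"value": 3.0, "cumulated_batch_size": 4}}
checkpoint = {"epoch": 2, "result_collection": logged_state}

blob = pickle.dumps(checkpoint)   # what torch.save would persist to disk
restored = pickle.loads(blob)     # what torch.load would return on resume

# Logged values survive the simulated restart intact.
print(restored["result_collection"] == logged_state)  # True
```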
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet list:
Did you have fun?
Make sure you had fun coding 🙃