Support checkpoint save and load with Stochastic Weight Averaging #9938
Merged
94 commits
72d0433
Save StochasticWeightAveraging callback data in checkpoints
adamreeve 3d2bf65
Add option to use SWA parameters during validation
adamreeve 1696273
Allow restoring SWA parameters to a model from a checkpoint
adamreeve c8db9d8
Refactor SWA batch norm moment update to work with validation
adamreeve 004959b
Add test for loading a model from a checkpoint with SWA parameters
adamreeve d76528b
Recompute batch norm moments when updating parameters from a checkpoint
adamreeve 0ea22e0
Handle when data batch is a list or tuple
adamreeve 01ca2a7
Save SWA scheduler step count in checkpoints
adamreeve 08d655b
Update SWA documentation and changelog
adamreeve 91ab357
Fix DeepSource code style issues
adamreeve 22e5d51
Revert SWA validation changes
adamreeve ed0a7f8
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve 11963f6
Fix resuming from epoch before SWA start and add extra test
adamreeve 226d8aa
Don't save state derived from constructor parameters into checkpoints
adamreeve 9ecc417
Merge branch 'master' into swa_checkpoint
tchaton 5d03d96
Tidy ups from code review
adamreeve 02a04da
Fix handling of n_averaged checkpoint data with multiple processes
adamreeve 8af5b56
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve db9590c
Merge branch 'master' into swa_checkpoint
tchaton 5763e05
Fix deprecation warning in test
adamreeve d46be83
Remove check for non-empty callback state in checkpoint
adamreeve e0fd0cb
Raise MisconfigurationException when using SWA with sharded models
adamreeve 2a83f05
Fix test failure with torch 1.7
adamreeve 4a8d81c
Fix crash when fairscale isn't installed
adamreeve dab0ef4
Skip segfaulting test under pytorch < 1.8
adamreeve a0d52c8
Changelog merge fix
adamreeve cdf4734
Remove unnecessary intermediate variable
adamreeve ba5b8ab
Fix checking for sharded plugins
adamreeve d2bb0ad
Don't raise an error for DDPSharded and DDPSpawnSharded with SWA
adamreeve 2c35328
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve ffcf011
Fix incorrect multiple context manager syntax for Python < 3.9
adamreeve c278034
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve d2fbe04
Merge branch 'master' into swa_checkpoint
adamreeve f13abf9
Merge branch 'master' into swa_checkpoint
adamreeve 8e848dc
Code review tidy up and fix CHANGELOG merge error
adamreeve 11757d5
Add a warning with initializing SWA after start but without checkpoin…
adamreeve 50d525f
Merge branch 'master' into swa_checkpoint
adamreeve 119f9b9
Merge branch 'master' into swa_checkpoint
adamreeve e332a42
Merge branch 'master' into swa_checkpoint
adamreeve fd59c41
Fixes to account for changes merged from master
adamreeve fe62b55
Merge branch 'master' into swa_checkpoint
adamreeve 440c4b6
Merge branch 'master' into swa_checkpoint
adamreeve b10261e
Fix SWA scheduler not being stepped
adamreeve 5bc9bee
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 8b9c624
Merge branch 'master' into swa_checkpoint
adamreeve 72c0242
Merge branch 'master' into swa_checkpoint
adamreeve 4dfb0df
Merge branch 'master' into swa_checkpoint
awaelchli 9b5fbfc
mark test helper protected
awaelchli 8e0c255
avoid warning for find_unused_parameters
awaelchli 6bb52ba
Merge branch 'master' into swa_checkpoint
adamreeve c44279f
Use _LRScheduler.state_dict/load_state_dict instead of accessing priv…
adamreeve b3eee59
Add test to reproduce crash when resuming with SWA and a custom sched…
adamreeve 0107ff1
Prevent trying to restore scheduler state into the wrong type of sche…
adamreeve 8067144
Merge branch 'master' into swa_checkpoint
adamreeve 81ac195
Add test case where trainer.strategy.restore_checkpoint_after_setup i…
adamreeve 20393b1
Minor test refactoring
carmocca 14f9f20
Fix test_swa_resume_training_from_checkpoint[2]
carmocca c677141
Did not mean to remove this
carmocca 5cf5e1b
Test tidy up from PR review comments
adamreeve fe79d6c
Store most recent update epoch in the SWA checkpoint data
adamreeve d799a62
Merge branch 'master' into swa_checkpoint
adamreeve c7c2818
Fix for master change that broke resuming without validation dataloaders
adamreeve d2ed468
Adjust SWA tests to account for current checkpoint resume behaviour
adamreeve a2143a8
Merge branch 'master' into swa_checkpoint
adamreeve 00328e8
Merge branch 'master' into swa_checkpoint
adamreeve 5dbfc2d
Merge branch 'master' into swa_checkpoint
adamreeve b71b690
Revert workarounds for first epoch after resume having no batches
adamreeve 15e6334
Use state_dict/load_state_dict instead of on_save/load_checkpoint in SWA
adamreeve e3104bc
Remove unnecessary workaround for handling restore_checkpoint_after_s…
adamreeve 6e9fbba
Merge branch 'master' into swa_checkpoint
adamreeve 08eecbb
Merge branch 'master' into swa_checkpoint
krshrimali 1e9dc33
Merge branch 'master' into swa_checkpoint
adamreeve f509178
Fix deprecation warning in tests
adamreeve 0388aea
Merge branch 'master' into swa_checkpoint
adamreeve f7594d6
Merge branch 'master' into swa_checkpoint
Borda cb6ce90
Merge branch 'master' into swa_checkpoint
Borda ddcb607
Merge branch 'master' into swa_checkpoint
awaelchli 77f137c
update runif
awaelchli 324499e
Remove no-longer required minimum torch version from test
adamreeve ab8aca0
Remove redundant None check that could hide a bug
adamreeve 7d6e7a8
Don't save scheduler configs as they will only be overridden
adamreeve 9bf237e
Use state_dict/load_state_dict to save and load average model state
adamreeve a9b6334
Parametrize misconfiguration error tests
adamreeve c24522b
Remove DummyError and match exception message
adamreeve b6b7db9
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve ba7cb5e
Fix state dict key
adamreeve 8bde4f4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] 3ed8ea4
Type checking fixes
adamreeve 085bb4a
Merge remote-tracking branch 'upstream/master' into swa_checkpoint
adamreeve afba59d
Merge branch 'master' into swa_checkpoint
carmocca 807fadf
Merge branch 'master' into swa_checkpoint
awaelchli 15fe88e
fix changelog conflicts
awaelchli dcf5fea
Merge branch 'master' into swa_checkpoint
rohitgr7 ce9bcea
Merge branch 'master' into swa_checkpoint
awaelchli
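Several commits above mention saving the averaged model state and the `n_averaged` count through `state_dict`/`load_state_dict`. As a rough illustration of why that state has to travel with the checkpoint, here is a minimal, framework-free sketch. The class `AveragedWeights` and its fields are hypothetical and are not the code from this PR; plain Python lists stand in for model parameters, and an equal-weight running average is assumed.

```python
class AveragedWeights:
    """Hypothetical helper: equal-weight running average of parameter vectors."""

    def __init__(self):
        self.n_averaged = 0   # how many parameter snapshots have been averaged
        self.average = None   # current running average, or None before the first update

    def update(self, params):
        """Fold one parameter snapshot into the running average."""
        if self.average is None:
            self.average = list(params)
        else:
            n = self.n_averaged
            # Incremental mean: avg_new = avg + (p - avg) / (n + 1)
            self.average = [a + (p - a) / (n + 1) for a, p in zip(self.average, params)]
        self.n_averaged += 1

    def state_dict(self):
        """Return everything needed to continue averaging after a restart."""
        average = None if self.average is None else list(self.average)
        return {"n_averaged": self.n_averaged, "average": average}

    def load_state_dict(self, state):
        """Restore the running average and its count from a saved state."""
        self.n_averaged = state["n_averaged"]
        self.average = None if state["average"] is None else list(state["average"])


snapshots = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Uninterrupted run: average all three snapshots.
full = AveragedWeights()
for snapshot in snapshots:
    full.update(snapshot)

# Interrupted run: average two snapshots, save, restore into a new object, continue.
first = AveragedWeights()
for snapshot in snapshots[:2]:
    first.update(snapshot)
saved = first.state_dict()

resumed = AveragedWeights()
resumed.load_state_dict(saved)
resumed.update(snapshots[2])
```

In this sketch the resumed run ends with the same average as the uninterrupted one only because both the average and `n_averaged` were restored; dropping either would weight the third snapshot incorrectly.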