Disable strict loading in multiprocessing launcher #16365
Conversation
@awaelchli Thank you for quickly implementing this! Is it possible to install a version that already includes this fix?
Try this: `PACKAGE_NAME=pytorch pip install https://github.com/Lightning-AI/lightning/archive/refs/heads/master.zip -U`
This works, thank you!
Although the solution itself doesn't work for my pyro model - the module ends up with no parameters:

```python
module  # nn.Module that contains the pyro model and pyro guide as attributes module.model and module.guide
training_plan = TrainingPlan(pyro_module=module, **plan_kwargs)  # pl.LightningModule
trainer = Trainer(
    max_epochs=max_epochs,
    accelerator=accelerator,
    devices=devices,
    strategy=strategy,
    **trainer_kwargs,
)
trainer.fit(training_plan, data_splitter)
module.state_dict().keys()
# no parameters listed
```

Is there any point downstream of this change where the contents of the state dict are loaded back into the module in the main process?
This is probably because you don't create your layers at the time of instantiation, only later. You will always run into this limitation with the "ddp_spawn" strategy; it's a result of the design. In this case, you should choose the regular "ddp" strategy instead.
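For readers hitting the same thing, here is a minimal hypothetical sketch of the pattern being described (layers created in `setup()` rather than `__init__()`, so the copy held by the main process under "ddp_spawn" never gets them; the class and layer names are illustrative, not the pyro model from the question):

```python
import torch
from torch import nn
import pytorch_lightning as pl


class LazyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # No layers registered here, so this object's state dict starts out empty.

    def setup(self, stage=None):
        # Layers are only created in setup(). Under "ddp_spawn" this runs inside the
        # spawned worker processes; the model object in the main process never runs it.
        self.layer = nn.Linear(4, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = LazyModel()
print(list(model.state_dict().keys()))  # [] -> nothing for the launcher to load weights into
```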
I see, thanks for explaining! And is there no way to modify the strategy - e.g., run the Callback setup before loading the parameters in the main process?
If you want, you can always call the setup() method yourself in the main process:

```python
my_callback.setup("fit")  # call so that layers exist after fit
my_model.setup("fit")  # call so that layers exist after fit
trainer.fit(my_model, ...)
```

But before falling back to this workaround, I suggest just using the regular ddp strategy.
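For completeness, a minimal sketch of the suggested alternative, using Lightning's built-in BoringModel as a stand-in for the real module (a sketch assuming a reasonably recent PyTorch Lightning version):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel

# With the regular "ddp" strategy the launching process is itself rank 0, so the
# model it holds after fit() is the trained one and no weight transfer back into
# the main process (and hence no strict/non-strict loading) is involved.
trainer = Trainer(accelerator="cpu", devices=2, strategy="ddp", max_epochs=1)
trainer.fit(BoringModel())
```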
Thanks @awaelchli! Both solutions work. It appears that …
You can check … Call … For further questions, please consider posting in the forum, or if you find a bug, a new issue would be appreciated (since this topic here is about strict loading of weights).
What does this PR do?
Fixes #14534
Sets `strict=False` when loading the state dict of the model back into the main process. The model in the main process may have a different architecture than the one trained in the worker processes. This is a limitation of this type of training with the "spawn" method. Since we don't know what the user will do with the model after `fit()`, the best we can do is load the weights that match.
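Roughly, the behaviour the change relies on is PyTorch's non-strict state-dict loading. A minimal, hypothetical illustration (not the launcher code itself; the model classes are made up to show the architecture mismatch):

```python
import torch
from torch import nn


class WorkerModel(nn.Module):  # what the spawned worker ends up with
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)
        self.extra = nn.Linear(2, 2)  # created only during training in the worker


class MainModel(nn.Module):  # what the main process still holds
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)


worker_state = WorkerModel().state_dict()
main_model = MainModel()

# With strict=True this would raise a RuntimeError about unexpected keys;
# with strict=False the matching weights are loaded and the rest are reported.
result = main_model.load_state_dict(worker_state, strict=False)
print(result.unexpected_keys)  # ['extra.weight', 'extra.bias']
```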
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
I made sure I had fun coding 🙃
cc @Borda @justusschock @awaelchli