Fix min-epochs and early-stopping triggering too many validation runs #16719
What does this PR do?
Fixes #15708
There is an unfortunate interaction between early stopping and `min_epochs`. If the early-stopping callback signals a stop before `min_epochs` has been reached, training continues. However, validation is then triggered on every subsequent training step because of this condition:
https://github.com/Lightning-AI/lightning/blob/5196eaa5264c7b95316718aa2d173dd42c5d9936/src/lightning/pytorch/loops/epoch/training_epoch_loop.py#L392-L394
This shows up as a large runtime increase in the subsequent epochs.
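The following pure-Python simulation illustrates why the stop flag inflates the number of validation runs. It is a hypothetical, heavily simplified model and not Lightning code. `SimTrainer`, `should_check_val_buggy`, `should_check_val_fixed` and `count_val_runs` are invented names. The assumption is that a stop request is modelled as a `should_stop` flag that stays set while `min_epochs` is unmet. The "fixed" check shows one way to gate the condition. It is not necessarily how this PR implements the fix.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimTrainer:
    """Hypothetical stand-in for a trainer; NOT Lightning's Trainer."""
    min_epochs: int
    val_every_n_steps: int
    should_stop: bool = False  # set once an early-stopping signal fires
    current_epoch: int = 0     # 0-indexed, so it equals the number of completed epochs


def should_check_val_buggy(trainer: SimTrainer, step: int) -> bool:
    # A pending stop request forces validation. Because the flag stays set
    # while min_epochs is unmet, this branch fires on every remaining step.
    if trainer.should_stop:
        return True
    return (step + 1) % trainer.val_every_n_steps == 0


def should_check_val_fixed(trainer: SimTrainer, step: int) -> bool:
    # Only honour the stop request once stopping is actually allowed.
    can_stop = trainer.current_epoch >= trainer.min_epochs
    if trainer.should_stop and can_stop:
        return True
    return (step + 1) % trainer.val_every_n_steps == 0


def count_val_runs(
    check_val: Callable[[SimTrainer, int], bool],
    min_epochs: int,
    steps_per_epoch: int,
    val_every_n_steps: int,
    stop_requested_after_epoch: Optional[int],
    max_epochs: int,
) -> int:
    """Simulate training and count how many times validation would run."""
    trainer = SimTrainer(min_epochs=min_epochs, val_every_n_steps=val_every_n_steps)
    val_runs = 0
    for epoch in range(max_epochs):
        trainer.current_epoch = epoch
        # Stop only if a stop was requested and min_epochs has been completed.
        if trainer.should_stop and epoch >= min_epochs:
            break
        for step in range(steps_per_epoch):
            if check_val(trainer, step):
                val_runs += 1
        if epoch == stop_requested_after_epoch:
            trainer.should_stop = True  # e.g. the early-stopping callback fired
    return val_runs


if __name__ == "__main__":
    settings = dict(min_epochs=5, steps_per_epoch=100, val_every_n_steps=100,
                    stop_requested_after_epoch=0, max_epochs=10)
    print(count_val_runs(should_check_val_buggy, **settings))  # → 401
    print(count_val_runs(should_check_val_fixed, **settings))  # → 5
```

In this toy setup, validation is scheduled once per epoch. With the buggy check, epochs 1 to 4 each validate on all 100 steps, giving 1 + 4 × 100 = 401 runs. With the gated check, validation stays at one run per epoch until `min_epochs` allows the stop.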
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
I made sure I had fun coding 🙃
cc @Borda @carmocca @awaelchli @justusschock