Add function to remove checkpoint to allow override for extended classes #16067

Merged: 2 commits merged into Lightning-AI:master from fix/add_remove_fn on Dec 15, 2022

@SeanNaren (Contributor) commented Dec 15, 2022

What does this PR do?

Related NVIDIA/NeMo#5631.

After speaking with @carmocca offline, we agreed that this small refactor of the ModelCheckpoint class is acceptable: moving checkpoint deletion into its own method lets NeMoModelCheckpoint override how checkpoints are removed without any further intrusion into ModelCheckpoint.
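The refactor can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not Lightning's or NeMo's code: `CheckpointCallback`, `ArchivingCheckpoint`, `save`, and `demo` are hypothetical names, and the files written are placeholders for real model state. The point is that every deletion goes through one protected hook, so a subclass can change what "removal" means without copying the top-k bookkeeping. To our understanding, the hook this PR added to Lightning is `ModelCheckpoint._remove_checkpoint(trainer, filepath)`, which delegates to the trainer's strategy; check the version you use before relying on that signature.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


class CheckpointCallback:
    """Hypothetical, simplified stand-in for a keep-the-latest-k checkpoint callback."""

    def __init__(self, dirpath: str, save_top_k: int = 2) -> None:
        self.dirpath = dirpath
        self.save_top_k = save_top_k
        self.saved: List[str] = []  # oldest first

    def save(self, step: int) -> str:
        filepath = os.path.join(self.dirpath, f"step={step}.ckpt")
        with open(filepath, "w") as f:
            f.write(f"weights at step {step}")  # placeholder for real state
        self.saved.append(filepath)
        # Evict the oldest checkpoints beyond the limit, always via the single hook.
        while len(self.saved) > self.save_top_k:
            self._remove_checkpoint(self.saved.pop(0))
        return filepath

    def _remove_checkpoint(self, filepath: str) -> None:
        """The one place that removes a checkpoint; subclasses override this."""
        os.remove(filepath)


class ArchivingCheckpoint(CheckpointCallback):
    """Hypothetical subclass: archives evicted checkpoints instead of deleting them."""

    def __init__(self, dirpath: str, archive_dir: str, save_top_k: int = 2) -> None:
        super().__init__(dirpath, save_top_k)
        self.archive_dir = archive_dir

    def _remove_checkpoint(self, filepath: str) -> None:
        # Only the removal behaviour changes; the eviction logic is inherited as-is.
        shutil.move(filepath, os.path.join(self.archive_dir, os.path.basename(filepath)))


def demo() -> Tuple[List[str], List[str]]:
    """Save four checkpoints with save_top_k=2; return (kept, archived) file names."""
    with tempfile.TemporaryDirectory() as root:
        ckpt_dir = os.path.join(root, "ckpts")
        archive_dir = os.path.join(root, "archive")
        os.makedirs(ckpt_dir)
        os.makedirs(archive_dir)
        cb = ArchivingCheckpoint(ckpt_dir, archive_dir, save_top_k=2)
        for step in range(4):
            cb.save(step)
        return sorted(os.listdir(ckpt_dir)), sorted(os.listdir(archive_dir))
```

Calling `demo()` keeps `step=2.ckpt` and `step=3.ckpt` in the checkpoint directory and moves `step=0.ckpt` and `step=1.ckpt` to the archive. Because the hook is protected (underscore-prefixed), overriding it relies on an internal API, which is the maintenance risk raised in review.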

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Dec 15, 2022
@carmocca (Contributor) left a comment

Thanks!

@mergify mergify bot added the ready PRs ready to be merged label Dec 15, 2022
@awaelchli (Contributor) left a comment

We might forget about this and end up refactoring it again in the future, since it's a protected method.

@carmocca carmocca merged commit 10cc677 into Lightning-AI:master Dec 15, 2022
@SeanNaren SeanNaren deleted the fix/add_remove_fn branch December 15, 2022 16:51
Borda pushed a commit that referenced this pull request Dec 16, 2022
lantiga pushed a commit that referenced this pull request Dec 16, 2022
* Add function to remove checkpoint to allow override for extended classes (#16067)

(cherry picked from commit 10cc677)

* minor fix: indent spaces in comment-out (#16076)

(cherry picked from commit 385e5e2)

* ci: print existing candidates (#16077)

(cherry picked from commit 9e89aed)

* [App] Fix bug where previously deleted apps cannot be re-run from the CLI (#16082)

(cherry picked from commit 5f7403e)

* Better check for programmatic lightningignore (#16080)

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

(cherry picked from commit b1ce263)

* [App] Removing single quote (#16079)

(cherry picked from commit 005b6f2)

* version 1.8.5.post0

* skip example test that relies on unreleased lite code

The examples use LightningLite syntax without the run method, which is only available in master

* fix can't instantiate abstract class


[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix

* skip bagua

Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Co-authored-by: Qiushi Pan <17402261+qqpann@users.noreply.github.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Sherin Thomas <sherin@lightning.ai>
Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
lexierule pushed a commit that referenced this pull request Dec 20, 2022
* Remove the deprecated profiler imports (#16059)

* Revert "Load app before setting LIGHTNING_DISPATCHED" (#16064)

Revert "Load app before setting LIGHTNING_DISPATCHED (#16057)"

This reverts commit 8d3339a.

* [App] Hot fix: Resolve detection of python debugger (#16068)

Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Load the app before setting `LIGHTNING_DISPATCHED` (#16071)

* fix(cloud): detect and ignore venv (#16056)

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Add function to remove checkpoint to allow override for extended classes (#16067)

* Drop FairScale sharded parity tests (#16069)

* minor fix: indent spaces in comment-out (#16076)

* ci: print existing candidates (#16077)

* [App] Fix bug where previously deleted apps cannot be re-run from the CLI (#16082)

* Better check for programmatic lightningignore (#16080)

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* [App] Removing single quote (#16079)

* [App] PoC: Add support for Request (#16047)

* Have checkgroup pull the latest runs (#16033)

* Update Multinode Warning (#16091)

* [App] Serve datatypes with better client code (#16018)

* docs: add PT version (#16010)

* docs: add PT version

* stable

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* add 1.13.1 to adjust versions (#16099)

* Remove redundant `find_unused_parameters=False` in Lite (#16026)

* [App] Add display name property to the work (#16095)

Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>

* Fix detection of whether app is running in cloud (#16045)

* [App] Add work.delete (#16103)

Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>

* [App] Improve the autoscaler UI (#16063)

[App] Improve the autoscaler UI (#16063)

* Re-enable Lite CLI on Windows + PyTorch 1.13 (#15645)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* [App] Min replica=0 would break autoscaler component (#16092)

* fixing the bug where num_replica=0 would fail

* changelog

* [App] Scale out/in interval for autoscaler (#16093)

* Adding arguments for scale out/in interval

* Tests

* Set the default work start method to spawn on MacOS (#16089)

* [App] Add status endpoint, enable `ready` (#16075)

Co-authored-by: thomas chaton <thomas@grid.ai>

* Clarify `work.stop()` limitation (#16073)

* fix merge errors

* Update torchvision requirement from <=0.14.0,>=0.11.1 to >=0.11.1,<0.15.0 in /requirements (#16108)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>

* CI: settle file names (#16098)

* CI: settle file names

* rename

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix test failing on master due to bad auto-merge (#16118)

* fix merge error

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: thomas <thomas@thomass-MacBook-Pro.local>
Co-authored-by: Yurij Mikhalevich <yurij@grid.ai>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
Co-authored-by: Qiushi Pan <17402261+qqpann@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Sherin Thomas <sherin@lightning.ai>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
carmocca added a commit that referenced this pull request Jan 4, 2023
Labels: callback: model checkpoint, pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged), refactor
Projects
None yet
4 participants