Support setting the trainer reference recursively for ensembles #13638

Merged
merged 26 commits into from
Jul 22, 2022

Conversation

carmocca
Contributor

@carmocca carmocca commented Jul 13, 2022

What does this PR do?

Fixes #13146

Changes:

  • Recursively sets the Trainer reference for LightningModules
  • Uses a weak reference for the Trainer
  • Disambiguates the trainer attribute (optional) from the property getter (non-optional)
  • The same change was done to the Loop.trainer property for consistency (edit: it breaks the spawn queue)
  • Updates codebase accordingly

Does your PR introduce any breaking changes? If yes, please list them.

model.trainer will now raise a RuntimeError if it hasn't been set.
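
The two mechanisms described above (recursive propagation of a weak Trainer reference, plus a getter that is no longer optional) can be sketched as follows. `Module` and `Trainer` here are illustrative stand-ins, not Lightning's actual classes, and the recursion is done over a hypothetical `_children` list rather than `torch.nn.Module` machinery:

```python
import weakref


class Trainer:
    """Illustrative stand-in for pytorch_lightning.Trainer."""


class Module:
    """Illustrative stand-in for a LightningModule with submodules."""

    def __init__(self, *children):
        self._children = list(children)
        self._trainer = None  # the attribute stays optional: weakref.ref or None

    @property
    def trainer(self):
        # the property getter is non-optional: accessing it before the module
        # is attached now raises instead of silently returning None
        if self._trainer is None:
            raise RuntimeError(f"{type(self).__name__} is not attached to a Trainer.")
        return self._trainer()

    @trainer.setter
    def trainer(self, trainer):
        # store only a weak reference so modules do not keep the Trainer
        # alive, and propagate it recursively so every submodule of an
        # ensemble resolves to the same Trainer
        ref = weakref.ref(trainer)
        stack = [self]
        while stack:
            module = stack.pop()
            module._trainer = ref
            stack.extend(module._children)
```

With this sketch, `ensemble.trainer = trainer` makes the nested submodules resolve to the same Trainer, while an unattached module raises `RuntimeError` on access, mirroring the breaking change above.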

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @carmocca @justusschock @awaelchli @ananthsub @ninginthecloud @jjenniferdai @rohitgr7 @akihironitta

@carmocca carmocca added the feature (Is an improvement or enhancement) and lightningmodule (pl.LightningModule) labels Jul 13, 2022
@carmocca carmocca added this to the pl:1.7 milestone Jul 13, 2022
@carmocca carmocca self-assigned this Jul 13, 2022
@carmocca carmocca marked this pull request as ready for review July 13, 2022 15:57
Member

@justusschock justusschock left a comment

I like it!

@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch 3 times, most recently from 405a6d7 to 65f22ad Compare July 13, 2022 19:27
@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch from 10a07fe to 249f29e Compare July 13, 2022 20:15
@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch from 78efd97 to 24eadb5 Compare July 13, 2022 21:00
@mergify mergify bot added the has conflicts label and removed the ready (PRs ready to be merged) label Jul 20, 2022
@mergify mergify bot added the ready (PRs ready to be merged) label and removed the has conflicts label Jul 20, 2022
@github-actions github-actions bot added the pl (Generic label for PyTorch Lightning package) label Jul 20, 2022
@mergify mergify bot added the ready (PRs ready to be merged) label Jul 20, 2022
@codecov

codecov bot commented Jul 20, 2022

Codecov Report

Merging #13638 (f177ac1) into master (9a240b6) will increase coverage by 28%.
The diff coverage is 95%.

@@            Coverage Diff            @@
##           master   #13638     +/-   ##
=========================================
+ Coverage      49%      76%    +28%     
=========================================
  Files         327      327             
  Lines       25492    25547     +55     
=========================================
+ Hits        12452    19509   +7057     
+ Misses      13040     6038   -7002     

@carmocca carmocca merged commit 9f51c07 into master Jul 22, 2022
@carmocca carmocca deleted the feat/recursive-trainer-setter branch July 22, 2022 17:58
@otaj
Contributor

otaj commented Oct 21, 2022

It seems this PR introduced a failing test, which somehow hadn't failed before:
https://github.com/Lightning-AI/lightning/actions/runs/3293585250/jobs/5434973295

The issue in the failing test is that the trainer instance has already been garbage collected, which can happen with weakrefs.

cc @carmocca

@carmocca
Contributor Author

Do you see it failing in different PRs? Do you suggest we remove the weakref, or that we ensure it doesn't get garbage collected in the test?

@otaj
Contributor

otaj commented Oct 21, 2022

I haven't noticed it in other PRs yet. It could be that something we introduced in the last week made it trigger, but I don't have an idea of what it could be.

I think the proper solution is to ensure it doesn't get garbage collected in the test, as I have a hard time imagining a situation in which a Trainer object gets garbage collected in a real-world use case.

@carmocca
Contributor Author

I'm just not sure what I should change to fix it. This variable should already hold the trainer reference: https://github.com/Lightning-AI/lightning/blob/0fb31ed614e73be903f9d8b339247bae24440566/tests/tests_pytorch/utilities/test_parsing.py#L67

@otaj
Contributor

otaj commented Oct 24, 2022

Right, but that variable goes out of scope the moment the function returns and is therefore free to be garbage collected, since the only remaining reference is the weak one. I think what would help is to also return the trainer instances from the model_cases function. I can do it myself.
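
The failure mode described above can be reproduced in a few lines. The names here are illustrative, not the actual test code; the point is that once the only strong reference to the Trainer is a local variable that goes out of scope, the weakref stored by the module goes dead:

```python
import weakref


class Trainer:
    """Illustrative stand-in for pytorch_lightning.Trainer."""


def build_case():
    trainer = Trainer()
    ref = weakref.ref(trainer)  # what the module stores internally
    return ref
    # `trainer` goes out of scope here; with no strong reference left,
    # CPython's reference counting collects it immediately and the
    # weakref starts returning None


dead = build_case()
assert dead() is None  # the Trainer is already gone

# the suggested fix: also return (or otherwise keep) the strong reference
# alongside whatever holds the weakref, so it stays alive for the test
trainer = Trainer()
alive = weakref.ref(trainer)
assert alive() is trainer
```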

Labels
feature (Is an improvement or enhancement), lightningmodule (pl.LightningModule), pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Registering logger/trainer inside nested PL models
6 participants