Support setting the trainer reference recursively for ensembles #13638

Merged
merged 26 commits into from
Jul 22, 2022

Conversation

carmocca
Contributor

@carmocca carmocca commented Jul 13, 2022

What does this PR do?

Fixes #13146

Changes:

  • Recursively sets the Trainer reference for LightningModules
  • Uses a weak reference for the Trainer
  • Disambiguates the trainer attribute (optional) from the property getter (non-optional)
  • The same change was done to the Loop.trainer property for consistency (edit: it breaks the spawn queue)
  • Updates codebase accordingly

Does your PR introduce any breaking changes? If yes, please list them.

model.trainer will now raise a RuntimeError if it hasn't been set.
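
The two mechanisms described above (recursive propagation of a weak Trainer reference, plus a getter that is no longer optional) can be sketched as follows. `Module` and `Trainer` here are illustrative stand-ins, not Lightning's actual classes, and the recursion is done over a hypothetical `_children` list rather than `torch.nn.Module` machinery:

```python
import weakref


class Trainer:
    """Illustrative stand-in for pytorch_lightning.Trainer."""


class Module:
    """Illustrative stand-in for a LightningModule with submodules."""

    def __init__(self, *children):
        self._children = list(children)
        self._trainer = None  # the attribute stays optional: weakref.ref or None

    @property
    def trainer(self):
        # the property getter is non-optional: accessing it before the module
        # is attached now raises instead of silently returning None
        if self._trainer is None:
            raise RuntimeError(f"{type(self).__name__} is not attached to a Trainer.")
        return self._trainer()

    @trainer.setter
    def trainer(self, trainer):
        # store only a weak reference so modules do not keep the Trainer
        # alive, and propagate it recursively so every submodule of an
        # ensemble resolves to the same Trainer
        ref = weakref.ref(trainer)
        stack = [self]
        while stack:
            module = stack.pop()
            module._trainer = ref
            stack.extend(module._children)
```

With this sketch, `ensemble.trainer = trainer` makes the nested submodules resolve to the same Trainer, while an unattached module raises `RuntimeError` on access, mirroring the breaking change above.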

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @carmocca @justusschock @awaelchli @ananthsub @ninginthecloud @jjenniferdai @rohitgr7 @akihironitta

@carmocca carmocca added the feature (Is an improvement or enhancement) and lightningmodule (pl.LightningModule) labels Jul 13, 2022
@carmocca carmocca added this to the pl:1.7 milestone Jul 13, 2022
@carmocca carmocca self-assigned this Jul 13, 2022
@carmocca carmocca marked this pull request as ready for review July 13, 2022 15:57
Member

@justusschock justusschock left a comment

I like it!

@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch 3 times, most recently from 405a6d7 to 65f22ad Compare July 13, 2022 19:27
@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch from 10a07fe to 249f29e Compare July 13, 2022 20:15
@carmocca carmocca force-pushed the feat/recursive-trainer-setter branch from 78efd97 to 24eadb5 Compare July 13, 2022 21:00
@mergify mergify bot added the has conflicts label and removed the ready (PRs ready to be merged) label Jul 20, 2022
@mergify mergify bot added the ready (PRs ready to be merged) label and removed the has conflicts label Jul 20, 2022
@github-actions github-actions bot added the pl (Generic label for PyTorch Lightning package) label Jul 20, 2022
@mergify mergify bot added the ready (PRs ready to be merged) label Jul 20, 2022
@codecov

codecov bot commented Jul 20, 2022

Codecov Report

Merging #13638 (f177ac1) into master (9a240b6) will increase coverage by 28%.
The diff coverage is 95%.

@@            Coverage Diff            @@
##           master   #13638     +/-   ##
=========================================
+ Coverage      49%      76%    +28%     
=========================================
  Files         327      327             
  Lines       25492    25547     +55     
=========================================
+ Hits        12452    19509   +7057     
+ Misses      13040     6038   -7002     

@carmocca carmocca merged commit 9f51c07 into master Jul 22, 2022
@carmocca carmocca deleted the feat/recursive-trainer-setter branch July 22, 2022 17:58
@otaj
Contributor

otaj commented Oct 21, 2022

It seems this PR introduced a failing test, which somehow hadn't failed before:
https://github.com/Lightning-AI/lightning/actions/runs/3293585250/jobs/5434973295

The issue in the failing test is that the trainer instance has already been garbage collected, which can happen with weakrefs.

cc @carmocca

@carmocca
Contributor Author

Do you see it failing in different PRs? Do you suggest we remove the weakref, or that we ensure it doesn't get garbage collected in the test?

@otaj
Contributor

otaj commented Oct 21, 2022

I haven't noticed it in other PRs yet. It could be that something we introduced in the last week made it trigger, but I don't have an idea of what it could be.

I think the proper solution is to ensure it doesn't get garbage collected in the test, as I have a hard time imagining a situation in which a Trainer object gets garbage collected in a real-world use case.

@carmocca
Contributor Author

I'm just not sure what I should change to fix it. This variable should already hold the trainer reference: https://github.com/Lightning-AI/lightning/blob/0fb31ed614e73be903f9d8b339247bae24440566/tests/tests_pytorch/utilities/test_parsing.py#L67

@otaj
Contributor

otaj commented Oct 24, 2022

Right, but that variable goes out of scope the moment the function returns and is therefore free to be garbage collected, since the only remaining reference is the weak one. I think what would help is to also return the trainer instances from the model_cases function. I can do it myself.
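
The failure mode described above can be reproduced in a few lines. The names here are illustrative, not the actual test code; the point is that once the only strong reference to the Trainer is a local variable that goes out of scope, the weakref stored by the module goes dead:

```python
import weakref


class Trainer:
    """Illustrative stand-in for pytorch_lightning.Trainer."""


def build_case():
    trainer = Trainer()
    ref = weakref.ref(trainer)  # what the module stores internally
    return ref
    # `trainer` goes out of scope here; with no strong reference left,
    # CPython's reference counting collects it immediately and the
    # weakref starts returning None


dead = build_case()
assert dead() is None  # the Trainer is already gone

# the suggested fix: also return (or otherwise keep) the strong reference
# alongside whatever holds the weakref, so it stays alive for the test
trainer = Trainer()
alive = weakref.ref(trainer)
assert alive() is trainer
```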

Labels
feature (Is an improvement or enhancement), lightningmodule (pl.LightningModule), pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Registering logger/trainer inside nested PL models
6 participants