Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed extra dataloader bug #1196

Merged
merged 14 commits into from
Apr 2, 2020
Merged

fixed extra dataloader bug #1196

merged 14 commits into from
Apr 2, 2020

Conversation

TevenLeScao
Copy link
Contributor

What does this PR do?

Fixes #1181 (issue).

@codecov
Copy link

codecov bot commented Mar 19, 2020

Codecov Report

Merging #1196 into master will increase coverage by <1%.
The diff coverage is 91%.

@@           Coverage Diff           @@
##           master   #1196    +/-   ##
=======================================
+ Coverage      92%     92%   +<1%     
=======================================
  Files          62      62            
  Lines        3188    3191     +3     
=======================================
+ Hits         2920    2923     +3     
  Misses        268     268

@Borda Borda added the bug Something isn't working label Mar 20, 2020
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add CHNAGELOG note

pytorch_lightning/trainer/training_loop.py Show resolved Hide resolved
@ethanwharris
Copy link
Member

ethanwharris commented Mar 20, 2020

This looks good, it's not clear how the reload_dataloaders_every_epoch argument is supposed to work for val and test since we often don't do an epoch of those. But this fix is good for the train dataloader - perhaps the argument name should be changed to reset_train_dataloader_every_epoch? (and deprecate the old argument)

TevenLeScao and others added 3 commits March 20, 2020 11:29
@TevenLeScao
Copy link
Contributor Author

I've added a note in CHANGELOG.md

self.get_model() => model as it was already defined
@Borda Borda requested a review from ethanwharris March 20, 2020 15:00
CHANGELOG.md Outdated Show resolved Hide resolved
@Borda Borda added this to the 0.7.2 milestone Mar 20, 2020
@Borda Borda added the ready PRs ready to be merged label Mar 20, 2020
Copy link
Member

@ethanwharris ethanwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@TevenLeScao TevenLeScao requested a review from Borda March 20, 2020 15:42
@TevenLeScao
Copy link
Contributor Author

Sorry, I'm not sure what the next steps are at the moment - @williamFalcon , do you want to follow @ethanwharris 's suggestion of changing reset_dataloader_every_epoch to reset_train_dataloader_every_epoch ? I'm not sure I understand your last message.

@williamFalcon
Copy link
Contributor

@TevenLeScao yes, let's rename to reset_train_dataloader_every_epoch

@williamFalcon williamFalcon removed the ready PRs ready to be merged label Mar 24, 2020
@Borda Borda requested review from ethanwharris and Borda March 27, 2020 07:22
…xtra_dataloader_bug-fix

# Conflicts:
#	CHANGELOG.md
#	pytorch_lightning/trainer/training_loop.py
@mergify
Copy link
Contributor

mergify bot commented Mar 30, 2020

This pull request is now in conflict... :(

@@ -338,14 +338,14 @@ def run_evaluation(self, test_mode: bool = False):

# select dataloaders
if test_mode:
if self.reload_dataloaders_every_epoch or self.test_dataloaders is None:
if self.reload_train_dataloader_every_epoch or self.test_dataloaders is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong no? don’t we want to reset the test and val dataloaders with every call to evaluate?

Copy link
Contributor Author

@TevenLeScao TevenLeScao Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I'm not sure what evaluation_loop is used for; why would we want to reload the test and/or val dataloader when it is called ? It doesn't strike me as an "every epoch" kind of thing.

Copy link
Contributor Author

@TevenLeScao TevenLeScao Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to keep that, maybe the compromise is to keep the name reload_dataloaders_every_epoch, and consider that this reloads the train dataloader every training epoch in training_loop, and the val/test dataloaders at every evaluation in evaluation_loop. This would fix the initial bug and keep all functionality the same and I feel like that should be the main objective here. I can just revert the last changes in this case, and the previously-approved PR should be good to go. Sorry if I'm reinventing the wheel here !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the point was just that the reload_train_dataloader_every_epoch was doing stuff with test and val when it wasn't needed - not to revert the change?

Copy link
Contributor Author

@TevenLeScao TevenLeScao Mar 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not sure anymore ! At least here I understand that @williamFalcon wants to reset them and @ethanwharris you want to call it reload_train_dataloader_every_epoch and not reset them.

But in any case I think it's better to have reload_dataloaders_every_epoch stand for everything, as it keeps previous functionality and doesn't split it into an argument for everything (ie train val and test)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluate runs the test and val loop.

Maybe i'm missing something, but if the user wants to reload the validation and test datasets every time evaluation is checked that should be allowed no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I'm confused. Originally tbe point was made that reload_dataloaders_every_epoch didn't previously reload the test and val dataloaders, only train. So then I suggested the name change, but it turns out we do reload them? Anyway, let's make it so that reload_dataloaders_every_epoch does what it says on the tin and applies to val and test aswell, in which case I think the above comment from @tullie still applies. Can then revisit it if someone finds a use case where that doesn't work for them :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethanwharris is this also solved?

Copy link
Contributor Author

@TevenLeScao TevenLeScao Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be fixed now, but the CI says some checks were cancelled and I'm not sure why :/

Edit: seems fine second time around

Copy link
Contributor

@williamFalcon williamFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check comments

@TevenLeScao
Copy link
Contributor Author

Should be good to go now !

@Borda
Copy link
Member

Borda commented Mar 31, 2020

@williamFalcon ^^

@TevenLeScao
Copy link
Contributor Author

So is there anything left to do here before merging ?

@Borda Borda dismissed williamFalcon’s stale review April 2, 2020 09:40

assume it was done...

@Borda Borda merged commit 04935ea into Lightning-AI:master Apr 2, 2020
alexeykarnachev pushed a commit to alexeykarnachev/pytorch-lightning that referenced this pull request Apr 3, 2020
* fixed extra dataloader bug

* Update pytorch_lightning/trainer/training_loop.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* updated CHANGELOG

* Small non-repetition change

self.get_model() => model as it was already defined

* Update CHANGELOG.md

* changed argument name to reload_train_dataloader_every_epoch

* fixed doc underline too short

* reverted to `reload_dataloaders_every_epoch`

* fixed val and test reloading

* fixed val and test reloading

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@Borda Borda modified the milestones: v0.7., v0.7.x Apr 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Additional dataloader created and discarded when training with reload_dataloaders_every_epoch
6 participants