[WIP] Fix the progress bar for the sanity check #2892
Conversation
Hello @manipopopo! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-14 17:35:58 UTC
Codecov Report
@@           Coverage Diff           @@
##           master   #2892   +/-   ##
=======================================
  Coverage      85%     85%
=======================================
  Files          82      82
  Lines        7719    7719
=======================================
  Hits         6550    6550
  Misses       1169    1169
This pull request is now in conflict... :(
good fix!
self.val_progress_bar.total = sum(
    min(trainer.num_sanity_val_steps, len(d) if has_len(d) else float('inf')) for d in trainer.val_dataloaders
)
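As a quick illustration of what that expression computes, here is a stand-alone toy example; the step count and dataloader lengths below are made up, not taken from the PR:

```python
# Toy illustration of the proposed total: each dataloader contributes at most
# `num_sanity_val_steps` batches, and shorter dataloaders contribute only their length.
num_sanity_val_steps = 2
dataloader_lengths = [1, 10, float('inf')]  # inf stands in for an iterable dataset without __len__

total = sum(min(num_sanity_val_steps, n) for n in dataloader_lengths)
print(total)  # 1 + 2 + 2 = 5
```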
@awaelchli num_sanity_val_steps should be independent of limit_val_batches (float)?
If num_sanity_val_steps=2, len(val_dataloader)=10 and limit_val_batches=0.1, should it run for 2 val_steps or 1?
Is this relevant here? I thought this PR is just about displaying the number of sanity steps that the trainer returns.
If limit_val_batches is used, it should just truncate the sanity steps if needed, no? That should happen in the trainer, I think.
Yeah, it still has some issues with limit_val_batches. I think a better fix would be to set up num_sanity_val_steps as a list in the Trainer itself rather than doing it here; then we can simply take a sum to get the total sanity val steps.
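A rough sketch of that idea, with hypothetical names (sanity_val_steps_per_loader is not an actual Trainer attribute): the Trainer resolves one count per validation dataloader, and the progress bar only sums them.

```python
from collections.abc import Sized
from typing import List


def resolve_sanity_val_steps(num_sanity_val_steps: int, val_dataloaders: List) -> List[float]:
    """Clamp the requested sanity steps to each dataloader's length when it has one."""
    # (the num_sanity_val_steps == -1 and limit_val_batches cases discussed below would need extra handling)
    steps = []
    for loader in val_dataloaders:
        length = len(loader) if isinstance(loader, Sized) else float('inf')
        steps.append(min(num_sanity_val_steps, length))
    return steps


# The Trainer would store the result, e.g. trainer.sanity_val_steps_per_loader,
# and the progress bar callback would only reduce it to a single number:
#   self.val_progress_bar.total = sum(trainer.sanity_val_steps_per_loader)
print(resolve_sanity_val_steps(2, [[0], list(range(10))]))  # [1, 2]
```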
Does that mean:
- When num_sanity_val_steps == -1 (https://github.com/PyTorchLightning/pytorch-lightning/blob/0097630a95bddc48d6fb5d3b9a58aef2e8e89b22/tests/trainer/test_trainer.py#L802-L813):
  - If limit_val_batches != 0: run len(val_dataloader) (could be inf) steps. (independent)
  - If limit_val_batches == 0: it shouldn't run any step. (dependent)
- When num_sanity_val_steps >= 0, the number of check steps should be affected by limit_val_batches. (dependent)
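One possible reading of the rules listed above, written out as a small helper. This is an assumption-heavy sketch, not Lightning's actual behaviour; in particular, how a fractional limit_val_batches scales the count for num_sanity_val_steps >= 0 is a guess based on the float-means-fraction convention.

```python
import math


def sanity_steps_for_loader(num_sanity_val_steps, loader_len, limit_val_batches):
    """Expected sanity-check batches for one val dataloader, per the rules sketched above."""
    if limit_val_batches == 0:
        return 0  # validation is disabled entirely, so no sanity steps either
    if num_sanity_val_steps == -1:
        return loader_len  # run through the whole loader (could be inf), independent of the limit
    # float limit -> fraction of the loader, int limit -> absolute number of batches (assumed)
    if isinstance(limit_val_batches, float) and limit_val_batches <= 1.0:
        limited = math.ceil(loader_len * limit_val_batches) if math.isfinite(loader_len) else loader_len
    else:
        limited = limit_val_batches
    return min(num_sanity_val_steps, limited, loader_len)


print(sanity_steps_for_loader(2, 10, 0.1))   # 1 under this reading (2 if the limit were ignored)
print(sanity_steps_for_loader(-1, 10, 0.1))  # 10
print(sanity_steps_for_loader(2, 10, 0))     # 0
```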
I suggest that in the case of num_sanity_val_steps == -1 it should be affected by limit_val_batches too.
@rohitgr7 I like your suggestions. It is true, the trainer should compute these properties and the progress bars should only read them (and maybe sum them).
Should I open another PR or keep this PR going? Should we use the same num_sanity_val_steps to save these values? (#2891 (comment))
I am already working on it :)
Thank you.
Force-pushed from 21a1489 to 8920d69.
This pull request is now in conflict... :(
Force-pushed from 7f8751a to ed25a6b.
Great job! =)
The original progress bar will always show trainer.num_sanity_val_steps even if the length of the validation DataLoader is less than trainer.num_sanity_val_steps. pytorch_lightning.trainer.data_loading._has_len is changed to a public function has_len, which is called by pytorch_lightning/callbacks/progress.py.
Force-pushed from e691bcc to 117c1fe.
This pull request is now in conflict... :(
@@ -293,7 +294,9 @@ def init_test_tqdm(self) -> tqdm:
     def on_sanity_check_start(self, trainer, pl_module):
         super().on_sanity_check_start(trainer, pl_module)
         self.val_progress_bar = self.init_sanity_tqdm()
-        self.val_progress_bar.total = convert_inf(trainer.num_sanity_val_steps * len(trainer.val_dataloaders))
+        self.val_progress_bar.total = sum(
+            min(trainer.num_sanity_val_steps, len(d) if has_len(d) else float('inf')) for d in trainer.val_dataloaders
+        )
This is quite a common case; can't we add a function for it, something like:

def len_or_default(to_be_checked: Any, default_length: float = float('inf')):
    if has_len(to_be_checked):
        return len(to_be_checked)
    return default_length

This may be overhead now, but we need similar things quite often.
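With such a helper, the expression in the diff above could be written more compactly. A sketch only, assuming the has_len introduced by this PR and the suggested helper are importable where the progress bar lives; convert_inf is the existing helper already used in progress.py:

```python
# hypothetical rewrite of the progress-bar total using the suggested helper
self.val_progress_bar.total = convert_inf(
    sum(min(trainer.num_sanity_val_steps, len_or_default(d)) for d in trainer.val_dataloaders)
)
```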
I believe this is repeated code. This is already done in reset_val_dataloader; all we need here is to sum num_sanity_val_steps once #2917 is fixed.
Agreed with both of you. Should we block this PR on #2917 or the other way around? Does it matter which one goes first?
I suggest blocking this one. Once I get answers to the questions I asked there, I'll fix that one tonight and then we can complete this one :)
Great job! =)
This pull request is now in conflict... :(
This pull request is now in conflict... :(
@manipopopo I think now we can finish this :)
@manipopopo closing this then
The original progress bar will always show trainer.num_sanity_val_steps as val_progress_bar.total even if the length of the validation DataLoader is less than trainer.num_sanity_val_steps.

pytorch_lightning.trainer.data_loading._has_len is changed to a public function has_len, which is called by pytorch_lightning.callbacks.progress.ProgressBar. Importing pytorch_lightning.trainer.data_loading from pytorch_lightning.callbacks.progress would lead to circular imports. Maybe we could move pytorch_lightning.trainer.data_loading._has_len to another place. Or we could save the sizes of the validation (and train) DataLoaders as members of Trainer, which may be accessed by pytorch_lightning.callbacks.progress.ProgressBar.
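As a loose illustration of the "move _has_len somewhere else" option: keeping the helper in a leaf utilities module would let both the trainer and the progress bar import it without a cycle. The module path and the exact body are assumptions, not the code in this PR.

```python
# pytorch_lightning/utilities/data.py  (hypothetical location for the shared helper)
def has_len(dataloader) -> bool:
    """Best-effort check: does this dataloader expose a working __len__?"""
    try:
        len(dataloader)
        return True
    except (TypeError, NotImplementedError):
        # e.g. a DataLoader wrapping an IterableDataset has no length
        return False


# Both call sites then depend only on the leaf module, so no import cycle appears:
#   pytorch_lightning/trainer/data_loading.py:  from pytorch_lightning.utilities.data import has_len
#   pytorch_lightning/callbacks/progress.py:    from pytorch_lightning.utilities.data import has_len
```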
What does this PR do?
Fixes #2891
Before submitting